Modern decision trees in Python

scikit-tree

scikit-tree is a scikit-learn compatible API for building state-of-the-art decision trees. These include unsupervised trees, oblique trees, uncertainty trees, quantile trees and causal trees.

Tree models have withstood the test of time and are consistently used in modern data science and machine learning applications. They perform especially well when samples are limited, and they are flexible learners that apply to a wide variety of settings, such as tabular data, images, time series, genomics, and EEG data.

Documentation

Documentation for the development version is available at https://docs.neurodata.io/scikit-tree/dev/index.html

Why oblique trees and why trees beyond those in scikit-learn?

In 2001, Leo Breiman proposed two types of random forests. The first, Forest-RI, is the traditional axis-aligned random forest. The second, Forest-RC, is the oblique random forest, which splits on random linear combinations of features. MORF [1] builds upon Forest-RC by proposing additional functions for combining features. Other modern tree variants, such as Canonical Correlation Forests (CCF), Extended Isolation Forests, Quantile Forests, and unsupervised random forests, are also important for solving real-world problems with robust decision tree models.
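
The difference between the two split types can be sketched in plain Python. This is an illustration only, not scikit-tree code: the helper names (`axis_aligned_split`, `oblique_split`, `errors`) and the toy data are assumptions made for this example. An axis-aligned split thresholds one feature, while an oblique split thresholds a linear combination of features.

```python
def axis_aligned_split(x, feature, threshold):
    """Forest-RI style test: compare a single feature against a threshold."""
    return x[feature] <= threshold


def oblique_split(x, weights, threshold):
    """Forest-RC style test: compare a linear combination of features."""
    projection = sum(w * xi for w, xi in zip(weights, x))
    return projection <= threshold


# Toy data: the label depends on x0 + x1, so no single feature separates it.
points = [(0.1, 0.2), (0.4, 0.3), (0.9, 0.8), (0.6, 0.7), (0.2, 0.9), (0.9, 0.3)]
labels = [x0 + x1 <= 1.0 for x0, x1 in points]


def errors(preds):
    """Count predictions that disagree with the labels."""
    return sum(p != y for p, y in zip(preds, labels))


# One oblique split with weights (1, 1) reproduces every label.
oblique_pred = [oblique_split(p, (1.0, 1.0), 1.0) for p in points]

# Try every axis-aligned threshold taken from the data, on both features,
# predicting True for the "<= threshold" side, and keep the lowest error count.
best_axis_errors = min(
    errors([axis_aligned_split(p, feature, candidate[feature]) for p in points])
    for feature in range(2)
    for candidate in points
)

print("oblique errors:", errors(oblique_pred))
print("best axis-aligned errors:", best_axis_errors)
```

On this toy data a single oblique split is enough, whereas an axis-aligned tree would need further splits to reach the same labels.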

Installation

Our installation process follows scikit-learn's as closely as possible, because this package contains Cython code that subclasses, or is inspired by, the scikit-learn tree submodule.

Dependencies

We minimally require:

* Python (>=3.9)
* numpy
* scipy
* scikit-learn >= 1.3
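
To check an environment against the minimal requirements above, a small standard-library script can help. This is a sketch, not part of scikit-tree: `parse_version` and `check_requirements` are hypothetical helpers written for this example, and the version parsing is deliberately naive (it keeps only the leading digits of each dot-separated part).

```python
import sys
from importlib import metadata


def parse_version(text):
    """Turn '1.3.2' into (1, 3, 2), keeping only leading digits of each part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check_requirements():
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if sys.version_info < (3, 9):
        problems.append("Python >= 3.9 is required")
    # Minimum versions; None means any installed version is accepted.
    minimums = {"numpy": None, "scipy": None, "scikit-learn": (1, 3)}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        if minimum is not None and parse_version(installed) < minimum:
            problems.append(f"{name} {installed} is older than required")
    return problems


print(check_requirements() or "all minimal requirements are satisfied")
```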

Installation with pip (https://pypi-hypernode.com/project/scikit-tree/)

Installing with pip in a conda environment is the recommended route.

pip install scikit-tree

Building locally with Meson (For developers)

Make sure you have the necessary packages installed:

# install build dependencies
pip install -r build_requirements.txt

# you may need these optional dependencies to build scikit-learn locally
conda install -c conda-forge joblib threadpoolctl pytest compilers llvm-openmp

We use the spin CLI to abstract away build details:

# run the build using Meson/Ninja
./spin build

# you can run the following command to see what other options there are
./spin --help
./spin build --help

# For example, you might want to start from a clean build
./spin build --clean

# or build in parallel for faster builds
./spin build -j 2

# you will need to double check the build-install has the proper path
# this might be different from machine to machine
export PYTHONPATH=${PWD}/build-install/usr/lib/python3.9/site-packages

# run specific unit tests
./spin test -- sktree/tree/tests/test_tree.py


You can also do the same thing using Meson/Ninja itself. Run the following to build the local files:

# configure the build directory and generate the Ninja build files
meson setup build --prefix=$PWD/build

# compile
ninja -C build

# install scikit-tree package
meson install -C build

export PYTHONPATH=${PWD}/build/lib/python3.9/site-packages

# to check installation, you need to be in a different directory
cd docs
python -c "from sktree import tree"
python -c "import sklearn; print(sklearn.__version__);"
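
Both `export PYTHONPATH=...` lines above hard-code `python3.9`, which only matches a Python 3.9 interpreter. As a small sketch, the directory for the running interpreter can be derived instead. The helper name `site_packages_dir` is an assumption for illustration, and the `lib/pythonX.Y/site-packages` layout is taken from the paths above; it may differ on your machine.

```python
import sys
from pathlib import Path


def site_packages_dir(prefix, version_info=sys.version_info):
    """Build <prefix>/lib/pythonX.Y/site-packages for the given interpreter version."""
    tag = f"python{version_info[0]}.{version_info[1]}"
    return Path(prefix) / "lib" / tag / "site-packages"


# Meson/Ninja route: the install prefix is ./build
print(site_packages_dir(Path.cwd() / "build"))
# spin route: the install prefix is ./build-install/usr
print(site_packages_dir(Path.cwd() / "build-install" / "usr"))
```

The printed path can then be used as the value of `PYTHONPATH` after confirming that the directory exists.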

After building locally, you can use an editable install (warning: this only registers Python changes locally):

pip install --no-build-isolation --editable .

Or, if you have spin v0.8+ installed, you can run it directly:

spin install

Development

We welcome contributions of modern tree-based algorithms. We use Cython to achieve C/C++-level speed while keeping a tested, scikit-learn-compatible API. Moreover, our Cython internals are easily extensible because they follow the internal Cython API of scikit-learn.

Due to the current state of scikit-learn's internal Cython code for trees, we rely on a fork of scikit-learn at https://github.com/neurodata/scikit-learn when extending the decision tree model API. Specifically, we extend the Python and Cython API of scikit-learn's tree submodule in our own submodule, so that we can introduce the tree models housed in this package. These models extend decision-tree functionality in ways that are not yet possible in scikit-learn itself. As one example, we introduce an abstract API that allows users to implement their own oblique splits. Our plan is to benchmark these features and introduce them upstream to scikit-learn where applicable and where the inclusion criteria are met.
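
To give a feel for what a pluggable oblique-split interface can look like, here is a pure-Python sketch. It is not the package's actual API, which lives in Cython: `ProjectionSampler`, `AxisAlignedSampler`, `SparseObliqueSampler`, and `project` are hypothetical names invented for this illustration. The idea is that each sampler proposes a weight vector, and the tree thresholds the projected values.

```python
import random
from abc import ABC, abstractmethod


class ProjectionSampler(ABC):
    """Hypothetical interface: propose a weight vector for one candidate split."""

    @abstractmethod
    def sample(self, n_features, rng):
        """Return a list of n_features weights."""


class AxisAlignedSampler(ProjectionSampler):
    """Pick a single feature, which recovers the traditional split."""

    def sample(self, n_features, rng):
        weights = [0.0] * n_features
        weights[rng.randrange(n_features)] = 1.0
        return weights


class SparseObliqueSampler(ProjectionSampler):
    """Combine a few randomly chosen features with weights of +1 or -1."""

    def __init__(self, n_nonzero=2):
        self.n_nonzero = n_nonzero

    def sample(self, n_features, rng):
        weights = [0.0] * n_features
        chosen = rng.sample(range(n_features), min(self.n_nonzero, n_features))
        for j in chosen:
            weights[j] = rng.choice([-1.0, 1.0])
        return weights


def project(X, weights):
    """Project each sample onto the candidate split direction."""
    return [sum(w * v for w, v in zip(weights, row)) for row in X]


rng = random.Random(0)
X = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
for sampler in (AxisAlignedSampler(), SparseObliqueSampler(n_nonzero=2)):
    weights = sampler.sample(3, rng)
    print(type(sampler).__name__, weights, project(X, weights))
```

A user-defined split would then be another subclass that overrides `sample`, leaving the rest of the tree-building logic unchanged.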

References

[1]: Li, Adam, et al. "Manifold Oblique Random Forests: Towards Closing the Gap on Convolutional Deep Networks." SIAM Journal on Mathematics of Data Science, 5(1), 77-96, 2023.
