Modern decision trees in Python

scikit-tree

scikit-tree is a scikit-learn compatible API for building state-of-the-art decision trees. These include unsupervised trees, oblique trees, uncertainty trees, quantile trees and causal trees.

Tree models have withstood the test of time and are consistently used in modern data science and machine learning applications. They perform especially well when samples are limited, and they are flexible learners that can be applied to a wide variety of settings, such as tabular data, images, time series, genomics, EEG data and more.
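
As a rough illustration of what "scikit-learn compatible" means in practice, the sketch below fits an oblique random forest through the usual fit/score interface. It assumes that scikit-tree and scikit-learn are installed and that ObliqueRandomForestClassifier can be imported from the top-level sktree package; the function name, the synthetic dataset and the hyperparameters are illustrative choices, not part of the project. The function returns None when the packages are unavailable.

```python
# Hedged usage sketch: assumes scikit-tree and scikit-learn are installed and
# that `sktree` exposes `ObliqueRandomForestClassifier` at the top level.
def oblique_forest_accuracy(seed=0):
    """Fit an oblique forest on synthetic data; return test accuracy or None."""
    try:
        from sklearn.datasets import make_classification
        from sklearn.model_selection import train_test_split
        from sktree import ObliqueRandomForestClassifier  # assumed import path
    except ImportError:
        return None  # the packages are not available in this environment

    # Small synthetic classification problem (illustrative only).
    X, y = make_classification(n_samples=300, n_features=10, random_state=seed)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=seed)

    # Used like any scikit-learn forest estimator: construct, fit, score.
    clf = ObliqueRandomForestClassifier(n_estimators=50, random_state=seed)
    clf.fit(X_train, y_train)
    return clf.score(X_test, y_test)


if __name__ == "__main__":
    print(oblique_forest_accuracy())
```

Because the estimator follows the scikit-learn conventions, it should also be usable with tools such as cross-validation helpers and pipelines, although that is not shown here.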

Documentation

See here for the documentation for our dev version: https://docs.neurodata.io/scikit-tree/dev/index.html

Why oblique trees and why trees beyond those in scikit-learn?

In 2001, Leo Breiman proposed two types of Random Forests. The first, Forest-RI, is the traditional axis-aligned random forest. The second, Forest-RC, is the random oblique forest, which splits on random linear combinations of features. Manifold Oblique Random Forests (MORF) [1] build upon Forest-RC by proposing additional functions to combine features. Other modern tree variants, such as Canonical Correlation Forests (CCF), Extended Isolation Forests, Quantile Forests, and unsupervised random forests, are also important for solving real-world problems with robust decision tree models.
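
The difference between the two split types can be sketched in plain Python. This is a conceptual toy, not the Cython implementation used by scikit-tree: all function names are hypothetical, and the toy data is chosen so that its class boundary is a diagonal line. An axis-aligned split tests one feature against a threshold, while an oblique split tests a weighted sum of features; in Forest-RC the weights are drawn uniformly from [-1, 1].

```python
import random


def axis_aligned_split(x, feature, threshold):
    """Forest-RI style test: compare a single feature to a threshold."""
    return x[feature] <= threshold


def random_projection(n_features, n_nonzero, rng):
    """Sample a sparse weight vector with weights drawn uniformly from [-1, 1]."""
    weights = [0.0] * n_features
    for j in rng.sample(range(n_features), n_nonzero):
        weights[j] = rng.uniform(-1.0, 1.0)
    return weights


def oblique_split(x, weights, threshold):
    """Forest-RC style test: compare a linear combination of features to a threshold."""
    return sum(w * xi for w, xi in zip(weights, x)) <= threshold


def count_errors(goes_left, points, labels):
    """Count mistakes when the left branch predicts False and the right predicts True."""
    return sum(goes_left(p) == label for p, label in zip(points, labels))


# Toy data whose class boundary is the diagonal x0 + x1 = 0.
points = [(-2.0, 1.0), (1.0, -2.0), (-1.0, -1.0), (2.0, -1.0), (-1.0, 2.0), (1.0, 1.0)]
labels = [x0 + x1 > 0 for x0, x1 in points]

# Best single-feature split over both features and all observed thresholds.
best_axis_errors = min(
    count_errors(lambda p, j=j, t=t: axis_aligned_split(p, j, t), points, labels)
    for j in range(2)
    for t in sorted({p[j] for p in points})
)

# Hand-picked weights (1, 1) for illustration; a forest would sample them instead.
diagonal_errors = count_errors(
    lambda p: oblique_split(p, [1.0, 1.0], 0.0), points, labels
)

# How a random combination could be drawn for one candidate split.
rng = random.Random(0)
sampled_weights = random_projection(n_features=2, n_nonzero=2, rng=rng)

if __name__ == "__main__":
    print("best axis-aligned errors:", best_axis_errors)
    print("diagonal oblique errors:", diagonal_errors)
    print("sampled weights:", sampled_weights)
```

On this toy data no single axis-aligned split separates the classes, whereas one oblique split along the diagonal does. An axis-aligned tree would need several levels to approximate the same boundary.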

Installation

Our installation follows the scikit-learn installation as closely as possible, because we include Cython code that subclasses, or is inspired by, the scikit-learn tree submodule.

Dependencies

We minimally require:

* Python (>=3.9)
* numpy
* scipy
* scikit-learn >= 1.3
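
The minimum requirements listed above can be checked from Python with the standard library alone. The following is a minimal sketch: the helper names are hypothetical, and the version parsing is deliberately simplistic (it keeps only the leading numeric components, so a pre-release such as 1.3rc1 is treated like 1.3). A dedicated version parser such as the one in the packaging library would be more robust.

```python
import sys
from importlib import metadata


def parse_version(text):
    """Turn '1.3.2' into (1, 3, 2), ignoring any non-numeric suffix such as 'rc1'."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(dist_name, minimum):
    """Return True/False for an installed distribution, or None if it is missing."""
    try:
        installed = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None
    return parse_version(installed) >= minimum


def check_environment():
    """Check the minimum requirements; an empty tuple means 'any version'."""
    return {
        "python": sys.version_info[:2] >= (3, 9),
        "numpy": meets_minimum("numpy", ()),
        "scipy": meets_minimum("scipy", ()),
        "scikit-learn": meets_minimum("scikit-learn", (1, 3)),
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'missing' if ok is None else ok}")
```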

Installation with Pip (https://pypi-hypernode.com/project/scikit-tree/)

Installing with pip in a conda environment is the recommended route.

pip install scikit-tree

Building locally with Meson (For developers)

Make sure you have the necessary packages installed:

# install build dependencies
pip install numpy scipy meson ninja meson-python Cython scikit-learn scikit-learn-tree

# you may need these optional dependencies to build scikit-learn locally
conda install -c conda-forge joblib threadpoolctl pytest compilers llvm-openmp

We use the spin CLI to abstract away build details:

# run the build using Meson/Ninja
./spin build

# you can run the following command to see what other build options there are
./spin build --help

# For example, you might want to start from a clean build
./spin build --clean

# or build in parallel for faster builds
./spin build -j 2

# you will need to double check the build-install has the proper path
# this might be different from machine to machine
export PYTHONPATH=${PWD}/build-install/usr/lib/python3.9/site-packages

# run specific unit tests
./spin test -- sktree/tree/tests/test_tree.py

# you can bring up the CLI menu
./spin --help
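
Because the PYTHONPATH exported above hard-codes python3.9, it needs adjusting for other interpreter versions. A small helper can derive the directory name from the running interpreter. This is a sketch that assumes the build-install/usr/lib/pythonX.Y/site-packages layout shown above on a POSIX system; the function name is hypothetical, and the actual layout may differ from machine to machine.

```python
import sys
from pathlib import PurePosixPath


def build_install_site_packages(repo_root, version_info=sys.version_info):
    """Return <repo_root>/build-install/usr/lib/pythonX.Y/site-packages."""
    python_dir = f"python{version_info[0]}.{version_info[1]}"
    return (
        PurePosixPath(repo_root)
        / "build-install"
        / "usr"
        / "lib"
        / python_dir
        / "site-packages"
    )


if __name__ == "__main__":
    import os

    # Print a line that can be pasted into the shell from the repository root.
    print(f"export PYTHONPATH={build_install_site_packages(os.getcwd())}")
```

It is still worth confirming that the printed directory exists after the build before relying on it.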

You can also do the same thing using Meson/Ninja itself. Run the following to build the local files:

# generate ninja build files
meson setup build --prefix=$PWD/build

# compile
ninja -C build

# install scikit-tree package
meson install -C build

export PYTHONPATH=${PWD}/build/lib/python3.9/site-packages

# to check installation, you need to be in a different directory
cd docs
python -c "from sktree import tree"
python -c "import sklearn; print(sklearn.__version__);"
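
The installation check above is run from a different directory, presumably so that Python does not pick up the sktree folder of the source tree instead of the built package (this reasoning is an assumption, not stated by the project). A short standard-library sketch can show where a module would be imported from; the function name is hypothetical.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be imported from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    for name in ("sktree", "sklearn"):
        # A path inside build/ or site-packages suggests the built package is
        # used; a path inside the repository checkout suggests the source tree.
        print(name, "->", module_origin(name))
```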

After building locally, you can use editable installs (warning: this only registers Python changes locally):

pip install --no-build-isolation --editable .

Development

We welcome contributions for modern tree-based algorithms. We use Cython to achieve fast C/C++ speeds, while abiding by a scikit-learn compatible (tested) API. Moreover, our Cython internals are easily extensible because they follow the internal Cython API of scikit-learn as well.

Due to the current state of scikit-learn's internal Cython code for trees, we instead rely on a fork of scikit-learn at https://github.com/neurodata/scikit-learn when extending the decision tree model API of scikit-learn. Specifically, we extend the Python and Cython API of scikit-learn's tree submodule in our own submodule so that we can introduce the tree models housed in this package. These models thus extend the functionality of decision-tree-based models in a way that is not yet possible in scikit-learn itself. As one example, we introduce an abstract API that allows users to implement their own oblique splits. Our plan is to benchmark these functionalities and introduce them upstream to scikit-learn where applicable and where the inclusion criteria are met.

References

[1]: Li, Adam, et al. "Manifold Oblique Random Forests: Towards Closing the Gap on Convolutional Deep Networks" SIAM Journal on Mathematics of Data Science, 5(1), 77-96, 2023

Download files

Source Distribution

* scikit_tree-0.3.0.tar.gz (14.4 MB): Source

Built Distributions

* scikit_tree-0.3.0-cp311-cp311-win_amd64.whl (5.0 MB): CPython 3.11, Windows x86-64
* scikit_tree-0.3.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.6 MB): CPython 3.11, manylinux: glibc 2.17+ x86-64
* scikit_tree-0.3.0-cp311-cp311-macosx_11_0_arm64.whl (2.0 MB): CPython 3.11, macOS 11.0+ ARM64
* scikit_tree-0.3.0-cp311-cp311-macosx_10_9_x86_64.whl (2.1 MB): CPython 3.11, macOS 10.9+ x86-64
* scikit_tree-0.3.0-cp310-cp310-win_amd64.whl (5.0 MB): CPython 3.10, Windows x86-64
* scikit_tree-0.3.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.6 MB): CPython 3.10, manylinux: glibc 2.17+ x86-64
* scikit_tree-0.3.0-cp310-cp310-macosx_11_0_arm64.whl (2.0 MB): CPython 3.10, macOS 11.0+ ARM64
* scikit_tree-0.3.0-cp310-cp310-macosx_10_9_x86_64.whl (2.1 MB): CPython 3.10, macOS 10.9+ x86-64
* scikit_tree-0.3.0-cp39-cp39-win_amd64.whl (5.0 MB): CPython 3.9, Windows x86-64
* scikit_tree-0.3.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.6 MB): CPython 3.9, manylinux: glibc 2.17+ x86-64
* scikit_tree-0.3.0-cp39-cp39-macosx_11_0_arm64.whl (2.0 MB): CPython 3.9, macOS 11.0+ ARM64
* scikit_tree-0.3.0-cp39-cp39-macosx_10_9_x86_64.whl (2.1 MB): CPython 3.9, macOS 10.9+ x86-64
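
The wheel filenames above encode the interpreter and platform they target, following the standard wheel naming convention of distribution, version, Python tag, ABI tag and platform tag. The sketch below splits a filename into those parts; the class and function names are hypothetical, and the parser assumes there is no optional build tag, which holds for the files listed here.

```python
from typing import NamedTuple


class WheelTags(NamedTuple):
    distribution: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename):
    """Split a wheel filename without a build tag into its five components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) != 5:
        raise ValueError(f"unexpected number of fields in {filename!r}")
    return WheelTags(*parts)


if __name__ == "__main__":
    tags = parse_wheel_filename("scikit_tree-0.3.0-cp311-cp311-macosx_11_0_arm64.whl")
    print(tags.python_tag, tags.platform_tag)
```

In normal use pip selects the matching wheel automatically, so parsing like this is only needed when inspecting or downloading files by hand.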
