Modern decision trees in Python


scikit-tree

scikit-tree is a scikit-learn compatible API for building state-of-the-art decision trees. These include unsupervised trees, oblique trees, uncertainty trees, quantile trees and causal trees.

Tree models have withstood the test of time and remain widely used in modern data science and machine learning applications. They perform especially well when samples are limited, and they are flexible learners that apply to a wide variety of settings, such as tabular data, images, time series, genomics, EEG data and more.

Documentation

See here for the documentation for our dev version: https://docs.neurodata.io/scikit-tree/dev/index.html

Why oblique trees and why trees beyond those in scikit-learn?

In 2001, Leo Breiman proposed two types of Random Forests. One, known as Forest-RI, is the traditional axis-aligned random forest. The other, known as Forest-RC, is the random oblique forest, which splits on random linear combinations of features. MORF (Manifold Oblique Random Forests) [1] builds upon Forest-RC by proposing additional functions to combine features. Other modern tree variants, such as Canonical Correlation Forests (CCF), Extended Isolation Forests, Quantile Forests, and unsupervised random forests, are also important for solving real-world problems with robust decision tree models.
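
The difference between an axis-aligned split and an oblique split can be sketched in a few lines of plain Python. This is an illustrative sketch only, not scikit-tree's implementation: the helper names `random_combination` and `oblique_split`, the toy data, and the choice of weights drawn uniformly from [-1, 1] are all assumptions made for the example.

```python
import random


def random_combination(n_features, n_combine, rng):
    """Draw a sparse weight vector: n_combine random features, weights in [-1, 1]."""
    weights = [0.0] * n_features
    for j in rng.sample(range(n_features), n_combine):
        weights[j] = rng.uniform(-1.0, 1.0)
    return weights


def oblique_split(X, weights, threshold):
    """Send each sample left or right based on its projection onto the weights."""
    left, right = [], []
    for i, row in enumerate(X):
        projection = sum(w * x for w, x in zip(weights, row))
        (left if projection <= threshold else right).append(i)
    return left, right


# Toy data (hypothetical): the classes are separated by the line x0 + x1 = 1,
# so no threshold on x0 alone or on x1 alone separates them.
X = [[0.0, 0.0], [0.9, 0.0], [0.0, 0.9], [1.0, 1.0], [1.1, 0.1], [0.1, 1.1]]
y = [0, 0, 0, 1, 1, 1]

# An axis-aligned split is the special case with a single non-zero weight.
axis_left, axis_right = oblique_split(X, [1.0, 0.0], 0.5)

# An oblique split combines both features and recovers the two classes.
obl_left, obl_right = oblique_split(X, [1.0, 1.0], 1.0)

# Forest-RC style: the combination itself is drawn at random per candidate split.
candidate = random_combination(n_features=2, n_combine=2, rng=random.Random(0))
```

In this toy case the axis-aligned split mixes the two classes on both sides, while the oblique split with weights `[1.0, 1.0]` places samples 0 to 2 on the left and samples 3 to 5 on the right.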

Installation

Our installation follows the scikit-learn installation process as closely as possible, because this package contains Cython code that subclasses, or is inspired by, the scikit-learn tree submodule.

Dependencies

We minimally require:

* Python (>=3.9)
* numpy
* scipy
* scikit-learn >= 1.3
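
To see how an existing environment compares with the list above, a small standard-library script along these lines may help. It is a sketch: the helper name `installed_versions` is made up for this example, and the script only reports versions, so the comparison against the minimums (for example scikit-learn >= 1.3) is left to the reader.

```python
import sys
from importlib import metadata

# Distribution names as published on PyPI.
REQUIRED = ("numpy", "scipy", "scikit-learn")


def installed_versions(packages=REQUIRED):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


print("Python >= 3.9:", sys.version_info >= (3, 9))
for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```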

Installation with Pip (https://pypi-hypernode.com/project/scikit-tree/)

Installing with pip in a conda environment is the recommended route.

pip install scikit-tree
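
Once the package is installed, a first experiment might look like the sketch below. It assumes that the top-level `sktree` module exposes an `ObliqueRandomForestClassifier` that follows the usual scikit-learn `fit`/`score` conventions; check the documentation for the estimators available in your version. The function name `demo_accuracy` and the synthetic dataset are illustrative choices, not part of the project.

```python
def demo_accuracy(random_state=0):
    """Fit an oblique forest on synthetic data and return held-out accuracy."""
    # Imports live inside the function so this file loads even when the
    # optional packages are missing.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sktree import ObliqueRandomForestClassifier  # assumed top-level export

    X, y = make_classification(
        n_samples=200, n_features=10, random_state=random_state
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, random_state=random_state
    )

    clf = ObliqueRandomForestClassifier(n_estimators=50, random_state=random_state)
    clf.fit(X_train, y_train)
    return clf.score(X_test, y_test)
```

Calling `print(demo_accuracy())` in an environment with scikit-tree installed should print a value between 0 and 1. Because the estimator is intended to be scikit-learn compatible, it should also work inside scikit-learn pipelines and cross-validation utilities.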

Building locally with Meson (For developers)

Make sure you have the necessary packages installed

# install build dependencies
pip install -r build_requirements.txt

# you may need these optional dependencies to build scikit-learn locally
conda install -c conda-forge joblib threadpoolctl pytest compilers llvm-openmp

We use the spin CLI to abstract away build details:

# run the build using Meson/Ninja
./spin build

# you can run the following command to see what other options there are
./spin --help
./spin build --help

# For example, you might want to start from a clean build
./spin build --clean

# or build in parallel for faster builds
./spin build -j 2

# you will need to double check the build-install has the proper path
# this might be different from machine to machine
export PYTHONPATH=${PWD}/build-install/usr/lib/python3.9/site-packages

# run specific unit tests
./spin test -- sktree/tree/tests/test_tree.py

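
The `PYTHONPATH` value in the commands above hard-codes `python3.9`. One way to derive the directory for whichever interpreter is active is sketched below. The helper name `build_site_packages` is made up for this example, and the `lib/pythonX.Y/site-packages` layout is an assumption that matches the paths shown in this README; some platforms use a different layout (for example `lib64` or a Windows-style `Lib` directory), so verify that the printed directory exists.

```python
import sys
from pathlib import Path


def build_site_packages(root, prefix="build-install/usr"):
    """Return <root>/<prefix>/lib/pythonX.Y/site-packages for this interpreter."""
    version = f"python{sys.version_info.major}.{sys.version_info.minor}"
    return Path(root) / prefix / "lib" / version / "site-packages"


# Path for the spin build shown above, relative to the current directory.
print(build_site_packages(Path.cwd()))

# Path for a plain Meson build installed with --prefix=$PWD/build.
print(build_site_packages(Path.cwd(), prefix="build"))
```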

You can also do the same thing using Meson/Ninja itself. Run the following to build the local files:

# generate the Ninja build files
meson setup build --prefix=$PWD/build

# compile
ninja -C build

# install scikit-tree package
meson install -C build

export PYTHONPATH=${PWD}/build/lib/python3.9/site-packages

# to check the installation, run from a different directory
# so that the installed package is imported rather than the source tree
cd docs
python -c "from sktree import tree"
python -c "import sklearn; print(sklearn.__version__)"

After building locally, you can use editable installs (warning: this only registers Python changes locally)

pip install --no-build-isolation --editable .

Or, if you have spin v0.8+ installed, you can run it directly:

spin install

Development

We welcome contributions for modern tree-based algorithms. We use Cython to achieve C/C++-level speed while adhering to a tested, scikit-learn compatible API. Moreover, our Cython internals are easily extensible because they follow the internal Cython API of scikit-learn.

Due to the current state of scikit-learn's internal Cython code for trees, we rely on a fork of scikit-learn at https://github.com/neurodata/scikit-learn when extending scikit-learn's decision tree API. Specifically, our submodule extends the Python and Cython API of scikit-learn's tree submodule so that we can introduce the tree models housed in this package. These models extend decision-tree functionality in ways that are not yet possible in scikit-learn itself. As one example, we introduce an abstract API that lets users implement their own oblique splits. In the future, we plan to benchmark these features and propose them upstream to scikit-learn where applicable and where the inclusion criteria are met.
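
The idea of an abstract API for user-defined oblique splits can be pictured with a pure-Python analogy. Nothing below is scikit-tree's actual interface, which lives in Cython: the class names `ProjectionSampler`, `AxisAlignedSampler` and `SparseObliqueSampler`, and the function `candidate_projections`, are hypothetical and exist only to show the extension pattern, in which a base class fixes the contract and each subclass decides how features are combined.

```python
import random
from abc import ABC, abstractmethod


class ProjectionSampler(ABC):
    """Hypothetical contract: produce one candidate projection per call."""

    @abstractmethod
    def sample(self, n_features, rng):
        """Return a list of n_features weights."""


class AxisAlignedSampler(ProjectionSampler):
    """Classic split: exactly one feature with weight 1."""

    def sample(self, n_features, rng):
        weights = [0.0] * n_features
        weights[rng.randrange(n_features)] = 1.0
        return weights


class SparseObliqueSampler(ProjectionSampler):
    """Oblique split: n_combine features with weights of +1 or -1."""

    def __init__(self, n_combine=2):
        self.n_combine = n_combine

    def sample(self, n_features, rng):
        weights = [0.0] * n_features
        for j in rng.sample(range(n_features), self.n_combine):
            weights[j] = rng.choice((-1.0, 1.0))
        return weights


def candidate_projections(sampler, n_features, n_candidates, seed=0):
    """Ask any sampler for several candidate projections."""
    rng = random.Random(seed)
    return [sampler.sample(n_features, rng) for _ in range(n_candidates)]
```

Code that consumes projections, such as `candidate_projections`, depends only on the base class, so a new way of combining features can be added by writing one more subclass.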

References

[1]: Li, Adam, et al. "Manifold Oblique Random Forests: Towards Closing the Gap on Convolutional Deep Networks." SIAM Journal on Mathematics of Data Science, 5(1), 77-96, 2023.

Download files

Source Distribution

* scikit_tree-0.4.1.tar.gz (16.2 MB): Source

Built Distributions

* scikit_tree-0.4.1-cp311-cp311-win_amd64.whl (5.0 MB): CPython 3.11, Windows x86-64
* scikit_tree-0.4.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.6 MB): CPython 3.11, manylinux: glibc 2.17+ x86-64
* scikit_tree-0.4.1-cp311-cp311-macosx_11_0_arm64.whl (2.0 MB): CPython 3.11, macOS 11.0+ ARM64
* scikit_tree-0.4.1-cp311-cp311-macosx_10_9_x86_64.whl (2.1 MB): CPython 3.11, macOS 10.9+ x86-64
* scikit_tree-0.4.1-cp310-cp310-win_amd64.whl (5.0 MB): CPython 3.10, Windows x86-64
* scikit_tree-0.4.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.6 MB): CPython 3.10, manylinux: glibc 2.17+ x86-64
* scikit_tree-0.4.1-cp310-cp310-macosx_11_0_arm64.whl (2.0 MB): CPython 3.10, macOS 11.0+ ARM64
* scikit_tree-0.4.1-cp310-cp310-macosx_10_9_x86_64.whl (2.1 MB): CPython 3.10, macOS 10.9+ x86-64
* scikit_tree-0.4.1-cp39-cp39-win_amd64.whl (5.0 MB): CPython 3.9, Windows x86-64
* scikit_tree-0.4.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.6 MB): CPython 3.9, manylinux: glibc 2.17+ x86-64
* scikit_tree-0.4.1-cp39-cp39-macosx_11_0_arm64.whl (2.0 MB): CPython 3.9, macOS 11.0+ ARM64
* scikit_tree-0.4.1-cp39-cp39-macosx_10_9_x86_64.whl (2.1 MB): CPython 3.9, macOS 10.9+ x86-64