Modern decision trees in Python
scikit-tree
scikit-tree is a scikit-learn compatible API for building state-of-the-art decision trees. These include unsupervised trees, oblique trees, uncertainty trees, quantile trees and causal trees.
Tree models have withstood the test of time and are consistently used in modern data science and machine learning applications. They perform especially well when samples are limited, and they are flexible learners that can be applied to a wide variety of settings, such as tabular data, images, time series, genomics, EEG data and more.
We welcome contributions for modern tree-based algorithms. We use Cython to achieve fast C/C++ speeds, while abiding by a scikit-learn compatible (tested) API. Moreover, our Cython internals are easily extensible because they follow the internal Cython API of scikit-learn as well.
Submodule dependency on a fork of scikit-learn
Due to the current state of scikit-learn's internal Cython code for trees, we instead rely on a maintained fork of scikit-learn at https://github.com/neurodata/scikit-learn, where specifically the fork branch is used to build and install this repo. We keep that fork well maintained and up to date with the main sklearn repo. The only difference is the refactoring of the tree/ submodule. This fork is used internally under the namespace sktree._lib.sklearn. It is necessary to use this fork for anything related to:
- RandomForest*
- ExtraTrees*
- any importable item from the tree/ submodule, whether it is a Cython or Python object
If you are developing for scikit-tree, note that we always depend on the most up-to-date commit of https://github.com/neurodata/scikit-learn/submodulev2 as a submodule within scikit-tree. This branch is consistently maintained to track upstream changes in the scikit-learn tree submodule, so that our fork stays consistent and robust by picking up upstream bug fixes and improvements.
Documentation
See here for the documentation for our dev version: https://docs.neurodata.io/scikit-tree/dev/index.html
Why oblique trees and why trees beyond those in scikit-learn?
In 2001, Leo Breiman proposed two types of Random Forests. One, known as Forest-RI, is the traditional axis-aligned random forest. The other, known as Forest-RC, is the oblique random forest, which uses random linear combinations of features to perform splits. MORF builds upon Forest-RC by proposing additional functions to combine features. Other modern tree variants, such as Canonical Correlation Forests (CCF) or unsupervised random forests, are also important for solving real-world problems using robust decision tree models.
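To make the difference between the two split types concrete, here is a minimal sketch in plain Python. It is not scikit-tree's implementation: the function names, the toy data and the thresholds are hypothetical, and the weights are drawn uniformly from [-1, 1] as an assumption about how a Forest-RC style combination could be sampled.

```python
import random


def axis_aligned_projection(sample, feature_index):
    """Forest-RI style: the split looks at a single feature value."""
    return sample[feature_index]


def random_oblique_weights(n_features, n_combined, rng):
    """Forest-RC style: pick n_combined features at random and give each
    a weight drawn uniformly from [-1, 1]; all other weights stay zero."""
    weights = [0.0] * n_features
    for idx in rng.sample(range(n_features), n_combined):
        weights[idx] = rng.uniform(-1.0, 1.0)
    return weights


def oblique_projection(sample, weights):
    """The split looks at a weighted sum (linear combination) of features."""
    return sum(w * x for w, x in zip(weights, sample))


def split(samples, project, threshold):
    """Send samples whose projected value is <= threshold to the left child."""
    left = [s for s in samples if project(s) <= threshold]
    right = [s for s in samples if project(s) > threshold]
    return left, right


rng = random.Random(0)  # fixed seed so the sketch is repeatable
samples = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 2.0, 0.0], [3.0, 1.0, 1.0]]

# Axis-aligned: threshold feature 0 at 1.5 (hypothetical threshold).
axis_left, axis_right = split(samples, lambda s: axis_aligned_projection(s, 0), 1.5)

# Oblique: threshold a random combination of 2 of the 3 features at 0.0.
weights = random_oblique_weights(n_features=3, n_combined=2, rng=rng)
obl_left, obl_right = split(samples, lambda s: oblique_projection(s, weights), 0.0)

print("axis-aligned:", axis_left, axis_right)
print("oblique weights:", weights)
print("oblique:", obl_left, obl_right)
```

In both cases the split is a threshold on one number per sample; the only change is whether that number is a single feature or a combination of several.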
Installation
Our installation tries to follow the scikit-learn installation as closely as possible, as we contain Cython code subclassed from, or inspired by, the scikit-learn tree submodule.
As of now, scikit-tree is in the development stage, and installation is still finicky because upstream scikit-learn's refactoring PRs for the tree submodule have stalled. Once those are merged, the installation will be simpler. The current recommended installation is done locally with Meson.
Dependencies
We minimally require:
* Python (>=3.8)
* numpy
* scipy
* scikit-learn >= 1.3
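As a quick sanity check before building, the minimums listed above can be compared against the current environment. The following is a minimal sketch using only the standard library; the helper names (`parse_release`, `find_problems`) are hypothetical, and the version parsing is deliberately simple (it reads leading integers only), so treat it as a convenience rather than a full version resolver.

```python
import sys
from importlib import metadata

# Minimums copied from the dependency list above; None means "any version".
MIN_PYTHON = (3, 8)
MINIMUMS = {"numpy": None, "scipy": None, "scikit-learn": (1, 3)}


def parse_release(version):
    """Return the leading integer parts of a version string as a tuple,
    stopping at the first piece that carries a non-numeric suffix."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if digits:
            parts.append(int(digits))
        if digits != piece:
            break
    return tuple(parts)


def find_problems(minimums, get_version=metadata.version,
                  python_version=sys.version_info[:2]):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if tuple(python_version) < MIN_PYTHON:
        problems.append("python: need >= 3.8")
    for name, minimum in minimums.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed")
            continue
        if minimum is not None and parse_release(installed) < minimum:
            wanted = ".".join(str(n) for n in minimum)
            problems.append(f"{name}: found {installed}, need >= {wanted}")
    return problems


for problem in find_problems(MINIMUMS):
    print(problem)
```

The lookup function and the Python version are parameters so the check can be exercised without touching the real environment.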
Installation with Pip
pip install scikit-tree
Building locally with Meson (RECOMMENDED)
Make sure you have the necessary packages installed
# install build dependencies
pip install numpy scipy meson ninja meson-python Cython scikit-learn scikit-learn-tree
# you may need these optional dependencies to build scikit-learn locally
conda install -c conda-forge joblib threadpoolctl pytest compilers llvm-openmp
We use the spin CLI to abstract away build details:
# run the build using Meson/Ninja
./spin build
# you can run the following command to see what other options there are
./spin --help
./spin build --help
# For example, you might want to start from a clean build
./spin build --clean
# or build in parallel for faster builds
./spin build -j 2
# you will need to double check the build-install has the proper path
# this might be different from machine to machine
export PYTHONPATH=${PWD}/build-install/usr/lib/python3.9/site-packages
# run specific unit tests
./spin test -- sktree/tree/tests/test_tree.py
# you can bring up the CLI menu
./spin --help
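Because the PYTHONPATH shown above hard-codes python3.9, it will be wrong on other interpreter versions. A small sketch for deriving the path from the running interpreter follows. It assumes the build-install/usr/lib/pythonX.Y/site-packages layout from the example above, which may differ from machine to machine; the function name is hypothetical.

```python
import sys
from pathlib import Path


def build_install_site_packages(repo_root, version_info=sys.version_info):
    """Build the site-packages path under build-install for a given
    interpreter version, following the layout shown in the example above."""
    tag = f"python{version_info[0]}.{version_info[1]}"
    return Path(repo_root) / "build-install" / "usr" / "lib" / tag / "site-packages"


# Run from the repository root; prints the value to export as PYTHONPATH.
candidate = build_install_site_packages(Path.cwd())
status = "exists" if candidate.is_dir() else "not found, check the layout on your machine"
print(f"{candidate} ({status})")
```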
You can also do the same thing using Meson/Ninja itself. Run the following to build the local files:
# generate the Ninja build files
meson setup build --prefix=$PWD/build
# compile
ninja -C build
# install scikit-tree package
meson install -C build
export PYTHONPATH=${PWD}/build/lib/python3.9/site-packages
# to check installation, you need to be in a different directory
cd docs;
python -c "from sktree import tree"
python -c "import sklearn; print(sklearn.__version__);"
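The check above is run from a different directory, presumably because the repository root contains an sktree/ source directory that would otherwise be picked up instead of the installed build. The sketch below reports where a module would be loaded from and whether that location sits under the current working directory. It uses only the standard library, and the helper names are hypothetical.

```python
import importlib.util
from pathlib import Path


def locate(module_name):
    """Return the file a top-level module would be imported from,
    or None if the module cannot be found."""
    spec = importlib.util.find_spec(module_name)
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin)


def is_under(path, directory):
    """True if path lies inside directory (after resolving both)."""
    try:
        Path(path).resolve().relative_to(Path(directory).resolve())
    except ValueError:
        return False
    return True


for name in ("sktree", "sklearn"):
    origin = locate(name)
    if origin is None:
        print(f"{name}: not importable from this environment")
    elif is_under(origin, Path.cwd()):
        print(f"{name}: found in the working directory ({origin}); try another directory")
    else:
        print(f"{name}: {origin}")
```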
Alternatively, you can use editable installs
pip install --no-build-isolation --editable .