
Modern decision trees in Python


scikit-tree

scikit-tree is a scikit-learn compatible API for building state-of-the-art decision trees. These include unsupervised trees, oblique trees, uncertainty trees, quantile trees and causal trees.

Tree models have withstood the test of time and are consistently used in modern data science and machine learning applications. They perform especially well when samples are limited, and they are flexible learners that can be applied to a wide variety of settings, such as tabular data, images, time series, genomics, EEG data and more.

Documentation

Documentation for the development version is available at: https://docs.neurodata.io/scikit-tree/dev/index.html

Why oblique trees and why trees beyond those in scikit-learn?

In 2001, Leo Breiman proposed two types of Random Forests. The first, Forest-RI, is the traditional axis-aligned random forest. The second, Forest-RC, is the random oblique forest, which splits on random linear combinations of features. Manifold Oblique Random Forests (MORF) [1] build upon Forest-RC by proposing additional functions to combine features. Other modern tree variants, such as Canonical Correlation Forests (CCF), Extended Isolation Forests, Quantile Forests, and unsupervised random forests, are also important for solving real-world problems with robust decision tree models.
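
To make the difference between axis-aligned and oblique splits concrete, here is a minimal pure-Python sketch. It is not scikit-tree's implementation: the function names (`gini`, `best_threshold`, `project`, `random_oblique_projection`) are hypothetical, and drawing weights uniformly from [-1, 1] is an assumption based on the usual description of Forest-RC, not something stated in this README. The toy data is separable by the sign of `x0 - x1`, which no single-feature threshold can capture.

```python
import random


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_threshold(values, labels):
    """Scan midpoints between sorted unique values.

    Returns (weighted_impurity, threshold); threshold is None if no split helps.
    """
    n = len(labels)
    best = (gini(labels), None)
    uniq = sorted(set(values))
    for lo, hi in zip(uniq, uniq[1:]):
        t = (lo + hi) / 2.0
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best[0]:
            best = (impurity, t)
    return best


def project(X, features, weights):
    """Linear combination of the chosen features for every sample."""
    return [sum(w * row[f] for f, w in zip(features, weights)) for row in X]


def random_oblique_projection(X, n_combine, rng):
    """Pick n_combine features at random and combine them with random weights."""
    features = rng.sample(range(len(X[0])), n_combine)
    weights = [rng.uniform(-1.0, 1.0) for _ in features]
    return project(X, features, weights), features, weights


# Toy data: class 1 whenever x0 > x1.
X = [[0, 1], [1, 2], [2, 3], [1, 0], [2, 1], [3, 2]]
y = [0, 0, 0, 1, 1, 1]

# Axis-aligned: the best split over each single feature.
axis_best = min(best_threshold(project(X, [f], [1.0]), y)[0] for f in range(2))
# Oblique: one hand-picked combination, x0 - x1.
oblique_best = best_threshold(project(X, [0, 1], [1.0, -1.0]), y)[0]
print("axis-aligned impurity:", axis_best)
print("oblique impurity:", oblique_best)

# A randomly drawn combination, as a forest would try at each node.
values, features, weights = random_oblique_projection(X, 2, random.Random(0))
print(features, weights, best_threshold(values, y))
```

In a forest, many such random projections would be drawn at each node and the one with the lowest impurity kept; the hand-picked weights above only show what a good draw looks like.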

Installation

Our installation process follows the scikit-learn installation as closely as possible, because our Cython code either subclasses or is inspired by the scikit-learn tree submodule.

Dependencies

We minimally require:

* Python (>=3.9)
* numpy
* scipy
* scikit-learn >= 1.3
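
As a rough way to check an environment against the minimums listed above, the following sketch uses only the standard library. The helper names (`parse_version`, `meets_minimum`, `check_requirements`) are hypothetical, and the version parser is deliberately simplistic: it keeps only the leading numeric components and is not a full PEP 440 implementation.

```python
import sys
from importlib import metadata

# Minimums taken from the list above; None means "any installed version".
MINIMUMS = {"numpy": None, "scipy": None, "scikit-learn": "1.3"}


def parse_version(text):
    """Turn '1.3.2' into (1, 3, 2), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """True if the installed version string is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def check_requirements():
    """Map each requirement name to True (satisfied) or False."""
    report = {"python": sys.version_info >= (3, 9)}
    for name, minimum in MINIMUMS.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = False
            continue
        report[name] = minimum is None or meets_minimum(installed, minimum)
    return report


for name, ok in check_requirements().items():
    print(f"{name}: {'ok' if ok else 'missing or too old'}")
```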

Installation with Pip (https://pypi-hypernode.com/project/scikit-tree/)

Installing with pip in a conda environment is the recommended route.

pip install scikit-tree

Building locally with Meson (For developers)

Make sure you have the necessary packages installed:

# install build dependencies
pip install numpy scipy meson ninja meson-python Cython scikit-learn scikit-learn-tree

# you may need these optional dependencies to build scikit-learn locally
conda install -c conda-forge joblib threadpoolctl pytest compilers llvm-openmp

We use the spin CLI to abstract away build details:

# run the build using Meson/Ninja
./spin build

# you can run the following command to see what other options there are
./spin --help
./spin build --help

# For example, you might want to start from a clean build
./spin build --clean

# or build in parallel for faster builds
./spin build -j 2

# you will need to double check the build-install has the proper path
# this might be different from machine to machine
export PYTHONPATH=${PWD}/build-install/usr/lib/python3.9/site-packages

# run specific unit tests
./spin test -- sktree/tree/tests/test_tree.py

# you can bring up the CLI menu
./spin --help
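
The `PYTHONPATH` exported above hardcodes `python3.9`, which is why it differs from machine to machine. As a sketch, the snippet below derives the `pythonX.Y` directory name from the running interpreter and looks for a matching `site-packages` folder. The helper names are hypothetical, and the two layouts it tries (`usr/lib/...` and `lib/...`) are simply the ones that appear in this README; other platforms may use different layouts.

```python
import sys
from pathlib import Path


def candidate_site_packages(install_root, version_info=None):
    """List the site-packages paths to try under an install root."""
    if version_info is None:
        version_info = sys.version_info
    tag = f"python{version_info[0]}.{version_info[1]}"
    root = Path(install_root)
    return [
        root / "usr" / "lib" / tag / "site-packages",
        root / "lib" / tag / "site-packages",
    ]


def find_site_packages(install_root, version_info=None):
    """Return the first candidate that exists on disk, or None."""
    for candidate in candidate_site_packages(install_root, version_info):
        if candidate.is_dir():
            return candidate
    return None


if __name__ == "__main__":
    found = find_site_packages(Path.cwd() / "build-install")
    if found is None:
        print("no site-packages found; check the build output directory")
    else:
        print(f"export PYTHONPATH={found}")
```

Run from the repository root after a build, this prints an `export` line that can be pasted into the shell.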

You can also do the same thing using Meson/Ninja itself. Run the following to build the local files:

# configure the build directory and generate Ninja build files
meson setup build --prefix=$PWD/build

# compile
ninja -C build

# install scikit-tree package
meson install -C build

export PYTHONPATH=${PWD}/build/lib/python3.9/site-packages

# to check installation, you need to be in a different directory
cd docs
python -c "from sktree import tree"
python -c "import sklearn; print(sklearn.__version__);"
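
The installation check is run from a different directory presumably so that Python does not pick up the source tree instead of the installed build. A small sketch for confirming where a module would actually be loaded from is shown below; `module_origin` and `is_loaded_from` are hypothetical helper names, and the example uses the standard-library `json` module so that it runs anywhere.

```python
import importlib.util
from pathlib import Path


def module_origin(name):
    """Return the file a top-level module would be loaded from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


def is_loaded_from(name, directory):
    """True if the module's file lives somewhere under the given directory."""
    origin = module_origin(name)
    if origin is None:
        return False
    return Path(directory).resolve() in Path(origin).resolve().parents


print(module_origin("json"))
print(is_loaded_from("json", Path.cwd()))
```

Replacing `"json"` with `"sktree"` shows whether the import resolves to the build directory or to a checkout in the current working directory.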

After building locally, you can use an editable install (warning: this only registers Python changes locally):

pip install --no-build-isolation --editable .

Development

We welcome contributions for modern tree-based algorithms. We use Cython to achieve fast C/C++ speeds, while abiding by a scikit-learn compatible (tested) API. Moreover, our Cython internals are easily extensible because they follow the internal Cython API of scikit-learn as well.

Due to the current state of scikit-learn's internal Cython code for trees, we instead rely on a fork of scikit-learn at https://github.com/neurodata/scikit-learn when extending scikit-learn's decision tree model API. Specifically, our submodule extends the Python and Cython API of scikit-learn's tree submodule so that we can introduce the tree models housed in this package. These models extend the functionality of decision-tree-based models in ways that are not yet possible in scikit-learn itself. As one example, we introduce an abstract API that allows users to implement their own oblique splits. In the future, we plan to benchmark these features and contribute them upstream to scikit-learn where applicable and where the inclusion criteria are met.

References

[1]: Li, Adam, et al. "Manifold Oblique Random Forests: Towards Closing the Gap on Convolutional Deep Networks." SIAM Journal on Mathematics of Data Science, 5(1), 77-96, 2023.
