Experimental, numba-based Gradient Boosting Machines

These details have not been verified by PyPI

Project links

Homepage

Development Status
- 3 - Alpha
Environment
- Console
Intended Audience
- Developers
- Science/Research
License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3.6
- Python :: 3.7
Topic
- Scientific/Engineering

Project description

pygbm

Experimental Gradient Boosting Machines in Python.

The goal of this project is to evaluate whether it is possible to implement an efficient histogram-binning version of Gradient Boosting Trees (possibly with all the LightGBM optimizations) while staying in pure Python 3.6+ using the numba JIT compiler.
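To make the histogram-binning idea concrete, here is a minimal standard-library sketch. It is not pygbm's implementation: the function names (find_bin_thresholds, bin_values, build_histogram) and the toy data are assumptions made for illustration only. Each raw feature value is mapped to a small integer bin index, and the per-bin gradient sums are then accumulated in a single pass, which is the statistic a split search would scan.

```python
from bisect import bisect_right
from statistics import quantiles


def find_bin_thresholds(values, max_bins=256):
    """Return sorted thresholds that split `values` into at most `max_bins` bins."""
    distinct = sorted(set(values))
    if len(distinct) <= max_bins:
        # Few distinct values: put a threshold halfway between neighbours.
        return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    # Otherwise use approximately equal-frequency (quantile) cut points.
    return sorted(set(quantiles(values, n=max_bins)))


def bin_values(values, thresholds):
    """Map each raw value to an integer bin index (fits in uint8 for <= 256 bins)."""
    return [bisect_right(thresholds, v) for v in values]


def build_histogram(binned, gradients, n_bins):
    """Accumulate the gradient sum and the sample count of each bin in one pass."""
    grad_sums = [0.0] * n_bins
    counts = [0] * n_bins
    for b, g in zip(binned, gradients):
        grad_sums[b] += g
        counts[b] += 1
    return grad_sums, counts


# Toy data (hypothetical): one feature column and one gradient per sample.
feature = [0.1, 0.4, 0.4, 0.9, 2.5, 2.5, 7.0]
gradients = [1.0, -0.5, 0.5, 2.0, -1.0, -1.0, 3.0]

thresholds = find_bin_thresholds(feature, max_bins=4)
binned = bin_values(feature, thresholds)
grad_sums, counts = build_histogram(binned, gradients, n_bins=len(thresholds) + 1)
print(binned, counts, grad_sums)
```

Once the features are binned, a split search only needs to scan the handful of bins per feature instead of every sorted sample, which is why the approach lends itself to tight, JIT-compiled loops.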

pygbm provides a set of scikit-learn compatible estimator classes that should play well with the scikit-learn Pipeline and model selection tools (grid search and randomized hyperparameter search).

Longer-term plans include integration with dask and dask-ml for out-of-core and distributed fitting on a cluster.

Installation

The project is available on PyPI and can be installed with pip:

pip install pygbm

Documentation

The API documentation is available at:

https://pygbm.readthedocs.io/

You might also want to have a look at the examples/ folder of this repo.

Status

The project is experimental. The API is subject to change without deprecation notice. Use at your own risk.

We welcome any feedback in the GitHub issue tracker:

https://github.com/ogrisel/pygbm/issues

Running the development version

Use pip to install in "editable" mode:

git clone https://github.com/ogrisel/pygbm.git
cd pygbm
pip install -r requirements.txt
pip install --editable .

Run the tests with pytest:

pip install -r requirements.txt
pytest

Benchmarking

The benchmarks folder contains some scripts to evaluate the computational performance of various parts of pygbm. Keep in mind that numba's JIT compilation takes time, so the first call to a jitted function is much slower than later calls.
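One way to keep compilation time out of a measurement is to time the first call separately from the following ones. The helper below is a hypothetical sketch (time_with_warmup is not part of pygbm or numba), and it uses a plain-Python stand-in function so that it runs without numba installed; with a jitted function, the first call would include the compilation cost.

```python
import time


def time_with_warmup(func, *args, n_repeats=3):
    """Time the first call separately from the best of the following calls."""
    start = time.perf_counter()
    func(*args)  # with a numba-jitted function, this call triggers compilation
    first_call = time.perf_counter() - start

    steady = []
    for _ in range(n_repeats):
        start = time.perf_counter()
        func(*args)
        steady.append(time.perf_counter() - start)
    return first_call, min(steady)


def sum_of_squares(n):
    # Plain-Python stand-in; in practice this would be a jitted function.
    total = 0
    for i in range(n):
        total += i * i
    return total


first, best = time_with_warmup(sum_of_squares, 100_000)
print(f"first call: {first:.4f}s, best of later calls: {best:.4f}s")
```

Reporting both numbers makes it clear how much of a benchmark's wall-clock time is compilation rather than actual computation.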

Profiling

To profile the benchmarks, you can use snakeviz to get an interactive HTML report:

pip install snakeviz
python -m cProfile -o bench_higgs_boson.prof benchmarks/bench_higgs_boson.py
snakeviz bench_higgs_boson.prof
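When a browser is not available, the same kind of .prof file can also be inspected with the standard-library pstats module. The sketch below is an illustration only: busy_work, profile_to_file and load_top_stats are hypothetical names, and the profile is generated from a toy function in a temporary directory so that the example is self-contained.

```python
import cProfile
import os
import pstats
import tempfile


def busy_work(n):
    # Toy workload standing in for a benchmark script.
    return sum(i * i for i in range(n))


def profile_to_file(func, prof_path, *args):
    """Profile one call of `func` and write the raw stats to `prof_path`."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args)
    profiler.dump_stats(prof_path)


def load_top_stats(prof_path, n=10):
    """Print the `n` entries with the largest cumulative time and return the stats."""
    stats = pstats.Stats(prof_path)
    stats.strip_dirs().sort_stats("cumulative").print_stats(n)
    return stats


prof_path = os.path.join(tempfile.mkdtemp(), "demo.prof")
profile_to_file(busy_work, prof_path, 200_000)
stats = load_top_stats(prof_path)
```

Passing the path of a file written by python -m cProfile -o to load_top_stats should work the same way, since both produce the same stats format.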

Debugging numba type inference

To introspect the results of type inference steps in the numba sections called by a given benchmarking script:

numba --annotate-html bench_higgs_boson.html benchmarks/bench_higgs_boson.py

In particular, it is worth checking that the numerical variables in the hot loops highlighted by the snakeviz profiling report have the expected precision level (e.g. float32 for loss computation, uint8 for binned feature values).

Impact of thread-based parallelism

Some benchmarks can call numba functions that leverage the built-in thread-based parallelism with @njit(parallel=True) and prange loops. On a multicore machine you can evaluate how the thread-based parallelism scales by explicitly setting the NUMBA_NUM_THREADS environment variable. For instance try:

NUMBA_NUM_THREADS=1 python benchmarks/bench_binning.py

vs:

NUMBA_NUM_THREADS=4 python benchmarks/bench_binning.py
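The comparison above can also be scripted. The following sketch assumes nothing about pygbm itself: run_with_threads and sweep are hypothetical helpers that launch a command in a subprocess with NUMBA_NUM_THREADS set and record the wall-clock time. The demo command is a harmless stand-in that only echoes the variable; to time a real benchmark, replace it with [sys.executable, "benchmarks/bench_binning.py"].

```python
import os
import subprocess
import sys
import time


def run_with_threads(command, n_threads):
    """Run `command` with NUMBA_NUM_THREADS set; return the elapsed seconds."""
    env = dict(os.environ, NUMBA_NUM_THREADS=str(n_threads))
    start = time.perf_counter()
    subprocess.run(command, env=env, check=True)
    return time.perf_counter() - start


def sweep(command, thread_counts=(1, 2, 4)):
    """Time `command` once for each thread count."""
    return {n: run_with_threads(command, n) for n in thread_counts}


# Stand-in command: prints the value of the variable seen by the child process.
demo_command = [
    sys.executable,
    "-c",
    "import os; print('threads:', os.environ['NUMBA_NUM_THREADS'])",
]
timings = sweep(demo_command)
for n, seconds in timings.items():
    print(f"NUMBA_NUM_THREADS={n}: {seconds:.3f}s")
```

Note that each measurement includes interpreter startup and JIT compilation, so the differences between thread counts are only meaningful for benchmarks that run long enough to dominate that fixed cost.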

Release history

0.1.0

Dec 14, 2018

This version

0.1.0.dev0 pre-release

Dec 14, 2018

Download files

Source Distributions

No source distribution files available for this release.

Built Distribution

pygbm-0.1.0.dev0-py3-none-any.whl (32.5 kB)

Uploaded Dec 14, 2018 Python 3

File details

Details for the file pygbm-0.1.0.dev0-py3-none-any.whl.

File metadata

Download URL: pygbm-0.1.0.dev0-py3-none-any.whl
Upload date: Dec 14, 2018
Size: 32.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.21.0 setuptools/40.6.3 requests-toolbelt/0.8.0 tqdm/4.28.1 CPython/3.7.1

File hashes

Hashes for pygbm-0.1.0.dev0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`1821a7a37c1a0734652ada697a52e003fd261d0adc230c11c102808e4babcce8`
MD5	`c667eea6fa7ef78317153cd913fd612d`
BLAKE2b-256	`bdcf5324cc2fc954a95d232ec1eba7f20a2119337eb77d9a0fb01c465567caef`