Scikit-learn Wrapper for Regularized Greedy Forest
rgf_python
A Python wrapper of the machine learning algorithm *Regularized Greedy Forest (RGF)*.
Features
Scikit-learn interface and support for multiclass classification.
The original RGF implementation handles only regression and binary classification, but rgf_python also supports multiclass classification via the "One-vs-Rest" method.
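The One-vs-Rest idea can be illustrated without RGF itself: one binary model is fit per class, and prediction picks the class whose model scores highest. Below is a minimal, self-contained sketch using a toy "nearest-mean" binary scorer (hypothetical, for illustration only; rgf_python uses RGF binary models internally):

```python
def fit_binary(X, y_binary):
    """Fit a trivial scorer: the mean feature vector of the positive class."""
    pos = [x for x, y in zip(X, y_binary) if y == 1]
    dim = len(X[0])
    return [sum(x[d] for x in pos) / len(pos) for d in range(dim)]

def score(model, x):
    """Negative squared distance to the positive-class mean."""
    return -sum((a - b) ** 2 for a, b in zip(model, x))

def ovr_fit(X, y, classes):
    # One binary problem per class: "is this sample of class c?"
    return {c: fit_binary(X, [1 if label == c else 0 for label in y])
            for c in classes}

def ovr_predict(models, x):
    # The class whose binary model scores highest wins.
    return max(models, key=lambda c: score(models[c], x))

X = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 4.9], [9.0, 0.0], [9.1, 0.2]]
y = [0, 0, 1, 1, 2, 2]
models = ovr_fit(X, y, classes=[0, 1, 2])
print([ovr_predict(models, x) for x in X])  # → [0, 0, 1, 1, 2, 2]
```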
Example:

```python
from sklearn import datasets
from sklearn.utils.validation import check_random_state
from sklearn.model_selection import StratifiedKFold, cross_val_score
from rgf.sklearn import RGFClassifier

iris = datasets.load_iris()
rng = check_random_state(0)
perm = rng.permutation(iris.target.size)
iris.data = iris.data[perm]
iris.target = iris.target[perm]

rgf = RGFClassifier(max_leaf=400,
                    algorithm="RGF_Sib",
                    test_interval=100,
                    verbose=True)

n_folds = 3
rgf_scores = cross_val_score(rgf,
                             iris.data,
                             iris.target,
                             cv=StratifiedKFold(n_folds))
rgf_score = sum(rgf_scores) / n_folds
print('RGF Classifier score: {0:.5f}'.format(rgf_score))
```
More examples can be found here.
Software Requirements
Python (2.7 or >= 3.4)
scikit-learn (>= 0.18)
RGF C++ (link)
If you can’t access the above URL, you can alternatively download RGF C++ from this page. Please see the README in the zip file for instructions on building the RGF executable.
Installation
```
git clone https://github.com/fukatani/rgf_python.git
cd rgf_python
python setup.py install
```
or using pip:
```
pip install git+git://github.com/fukatani/rgf_python@master
```
You have to place the RGF executable in a directory included in the environment variable ‘PATH’. Alternatively, you can specify the path directly by manually editing rgf/sklearn.py:
```python
## Edit this ##################################################
# Location of the RGF executable
loc_exec = 'C:\\Program Files\\RGF\\bin\\rgf.exe'
# Location for RGF temp files
loc_temp = 'temp/'
## End Edit ###################################################
```
Set the actual location of the RGF executable by editing ‘loc_exec’. The variable ‘loc_temp’ can be changed to specify the directory for temporary files.
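A misconfigured ‘loc_exec’ typically surfaces only at training time, so a quick sanity check after editing can save a debugging round-trip. The following sketch is not part of rgf_python; the paths shown are placeholders for whatever values you configured:

```python
import os

loc_exec = '/usr/local/bin/rgf'   # placeholder: your RGF executable location
loc_temp = 'temp/'                # placeholder: your RGF temp-file directory

def check_rgf_setup(exec_path, temp_dir):
    """Return a list of problems found with the configured paths."""
    problems = []
    if not (os.path.isfile(exec_path) and os.access(exec_path, os.X_OK)):
        problems.append('RGF executable not found or not executable: %s' % exec_path)
    if not os.path.isdir(temp_dir):
        problems.append('temp directory does not exist: %s' % temp_dir)
    return problems

for problem in check_rgf_setup(loc_exec, loc_temp):
    print(problem)
```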
Tuning Hyper-parameters
You can tune hyper-parameters as follows.
max_leaf: Appropriate values are data-dependent and usually range from 1000 to 10000.
test_interval: For efficiency, it should be a multiple or a divisor of 100 (the default value of the optimization interval).
algorithm: You can select “RGF”, “RGF_Opt” or “RGF_Sib”.
loss: You can select “LS”, “Log” or “Expo”.
reg_depth: Must be no smaller than 1. Meant to be used with algorithm = “RGF_Opt” or “RGF_Sib”.
l2: Either 1, 0.1, or 0.01 often produces good results though with exponential loss (loss = “Expo”) and logistic loss (loss = “Log”), some data requires smaller values such as 1e-10 or 1e-20.
sl2: Default value is equal to l2. On some data, l2/100 works well.
normalize: If turned on, training targets are normalized so that the average becomes zero.
min_samples_leaf: Smaller values may slow down training. Too large values may degrade model accuracy.
n_iter: Number of iterations of coordinate descent to optimize weights.
n_tree_search: Number of trees to be searched for the nodes to split. The most recently grown trees are searched first.
opt_interval: Weight optimization interval in terms of the number of leaf nodes.
learning_rate: Step size of Newton updates used in coordinate descent to optimize weights.
Detailed instructions on tuning hyper-parameters are here.
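Since several of the parameters above have a small set of recommended values, a common approach is to enumerate candidate combinations and evaluate each one (e.g. with cross-validation). A minimal sketch of the enumeration step, using illustrative grid values taken from the ranges suggested above:

```python
from itertools import product

# Illustrative grid built from the suggested ranges above
param_grid = {
    'max_leaf': [1000, 5000, 10000],
    'algorithm': ['RGF', 'RGF_Opt', 'RGF_Sib'],
    'l2': [1, 0.1, 0.01],
}

keys = sorted(param_grid)
combinations = [dict(zip(keys, values))
                for values in product(*(param_grid[k] for k in keys))]

print(len(combinations))  # → 27 candidate settings (3 * 3 * 3)
```

Each resulting dict can be passed as keyword arguments to RGFClassifier; the same grid also works directly with scikit-learn's GridSearchCV.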
Using at Kaggle Kernel
Kaggle Kernels now supports rgf_python. Please see this page.
Other
Much of the implementation is shamelessly based on the following code. Thanks!