Code-generation for various ML models into native code.

m2cgen

m2cgen (Model 2 Code Generator) is a lightweight library that provides an easy way to transpile trained statistical models into native code (Python, C, Java, Go, JavaScript, Visual Basic, C#, PowerShell, R, PHP).

Installation

Supported Python version is >= 3.5.

pip install m2cgen

Supported Languages

  • C
  • C#
  • Go
  • Java
  • JavaScript
  • PHP
  • PowerShell
  • Python
  • R
  • Visual Basic

Supported Models

Linear (classification)
  • LogisticRegression
  • LogisticRegressionCV
  • PassiveAggressiveClassifier
  • Perceptron
  • RidgeClassifier
  • RidgeClassifierCV
  • SGDClassifier
Linear (regression)
  • ARDRegression
  • BayesianRidge
  • ElasticNet
  • ElasticNetCV
  • HuberRegressor
  • Lars
  • LarsCV
  • Lasso
  • LassoCV
  • LassoLars
  • LassoLarsCV
  • LassoLarsIC
  • LinearRegression
  • OrthogonalMatchingPursuit
  • OrthogonalMatchingPursuitCV
  • PassiveAggressiveRegressor
  • Ridge
  • RidgeCV
  • SGDRegressor
  • TheilSenRegressor
SVM (classification)
  • LinearSVC
  • NuSVC
  • SVC
SVM (regression)
  • LinearSVR
  • NuSVR
  • SVR
Tree (classification)
  • DecisionTreeClassifier
  • ExtraTreeClassifier
Tree (regression)
  • DecisionTreeRegressor
  • ExtraTreeRegressor
Random Forest (classification)
  • ExtraTreesClassifier
  • LGBMClassifier (rf booster only)
  • RandomForestClassifier
  • XGBRFClassifier (binary only, multiclass is not supported yet)
Random Forest (regression)
  • ExtraTreesRegressor
  • LGBMRegressor (rf booster only)
  • RandomForestRegressor
  • XGBRFRegressor
Boosting (classification)
  • LGBMClassifier (gbdt/dart/goss booster only)
  • XGBClassifier (gbtree/gblinear booster only)
Boosting (regression)
  • LGBMRegressor (gbdt/dart/goss booster only)
  • XGBRegressor (gbtree/gblinear booster only)

    Classification Output

    Linear/Linear SVM
      • Binary: scalar value; signed distance of the sample to the hyperplane for the second class.
      • Multiclass: vector value; signed distance of the sample to the hyperplane for each class.
      • Comment: the output is consistent with the output of LinearClassifierMixin.decision_function.

    SVM
      • Binary: scalar value; signed distance of the sample to the hyperplane for the second class.
      • Multiclass: vector value; one-vs-one scores, one per pair of classes, shape (n_samples, n_classes * (n_classes-1) / 2).
      • Comment: the output is consistent with the output of BaseSVC.decision_function when decision_function_shape is set to ovo.

    Tree/Random Forest/XGBoost/LightGBM
      • Binary: vector value; class probabilities.
      • Multiclass: vector value; class probabilities.
      • Comment: the output is consistent with the output of the predict_proba method of DecisionTreeClassifier/ForestClassifier/XGBClassifier/LGBMClassifier.
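The output conventions listed above can be turned into a predicted class index with a few lines of Python. This is a minimal sketch and not part of m2cgen: the helper names `class_from_binary_score`, `class_from_vector`, and `n_ovo_scores` are hypothetical, and the binary rule assumes the scikit-learn convention that a positive score selects the second class.

```python
def class_from_binary_score(score):
    """Linear / SVM binary output: a positive score selects the second class (index 1)."""
    return 1 if score > 0 else 0


def class_from_vector(values):
    """Per-class scores or class probabilities: pick the index of the largest entry."""
    return max(range(len(values)), key=lambda i: values[i])


def n_ovo_scores(n_classes):
    """Length of a one-vs-one score vector: one score per unordered pair of classes."""
    return n_classes * (n_classes - 1) // 2


print(class_from_binary_score(-0.73))      # negative score: first class, prints 0
print(class_from_vector([0.1, 0.7, 0.2]))  # largest entry is at index 1, prints 1
print(n_ovo_scores(4))                     # 4 classes give 6 pairs, prints 6
```

Note that `class_from_vector` applies to linear multiclass scores and to probability vectors only. A one-vs-one SVM vector holds one score per pair of classes, so it needs a voting step rather than a plain argmax.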

    Usage

    Here's a simple example of how a linear model trained in a Python environment can be represented in Java code:

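    # Note: load_boston was removed in scikit-learn 1.2, so this example needs an older release.
    from sklearn.datasets import load_boston
    from sklearn import linear_model
    import m2cgen as m2c

    # Load the Boston housing data (13 features per sample).
    boston = load_boston()
    X, y = boston.data, boston.target

    # Fit an ordinary least-squares linear model.
    estimator = linear_model.LinearRegression()
    estimator.fit(X, y)

    # Transpile the fitted model; the result is a string of Java source code.
    code = m2c.export_to_java(estimator)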
    

    Generated Java code:

    public class Model {
    
        public static double score(double[] input) {
            return (((((((((((((36.45948838508965) + ((input[0]) * (-0.10801135783679647))) + ((input[1]) * (0.04642045836688297))) + ((input[2]) * (0.020558626367073608))) + ((input[3]) * (2.6867338193449406))) + ((input[4]) * (-17.76661122830004))) + ((input[5]) * (3.8098652068092163))) + ((input[6]) * (0.0006922246403454562))) + ((input[7]) * (-1.475566845600257))) + ((input[8]) * (0.30604947898516943))) + ((input[9]) * (-0.012334593916574394))) + ((input[10]) * (-0.9527472317072884))) + ((input[11]) * (0.009311683273794044))) + ((input[12]) * (-0.5247583778554867));
        }
    }
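The generated method is simply the intercept plus a weighted sum of the inputs. As a rough illustration, here is a hand-written Python equivalent that reuses the coefficients from the Java output above. The names `INTERCEPT`, `COEFFICIENTS`, and `score` are chosen for this sketch; it is not code produced by m2cgen.

```python
# Values copied from the generated Java expression.
INTERCEPT = 36.45948838508965
COEFFICIENTS = [
    -0.10801135783679647, 0.04642045836688297, 0.020558626367073608,
    2.6867338193449406, -17.76661122830004, 3.8098652068092163,
    0.0006922246403454562, -1.475566845600257, 0.30604947898516943,
    -0.012334593916574394, -0.9527472317072884, 0.009311683273794044,
    -0.5247583778554867,
]


def score(features):
    """Return intercept + sum(feature * coefficient), accumulated left to right."""
    if len(features) != len(COEFFICIENTS):
        raise ValueError("expected %d features" % len(COEFFICIENTS))
    total = INTERCEPT
    for value, weight in zip(features, COEFFICIENTS):
        total += value * weight
    return total


# An all-zero input leaves only the intercept.
print(score([0.0] * 13) == INTERCEPT)
```

Because the model is a single arithmetic expression, the transpiled code needs no runtime library in the target language.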
    

    You can find more examples of generated code for different models/languages here.

    CLI

    m2cgen can be used as a CLI tool to generate code using serialized model objects (pickle protocol):

    $ m2cgen <pickle_file> --language <language> [--indent <indent>] [--class_name <class_name>]
             [--module_name <module_name>] [--package_name <package_name>] [--namespace <namespace>]
             [--recursion-limit <recursion_limit>]
    

    Keep in mind that unpickling a serialized model object requires its class to be defined at the top level of an importable module in the unpickling environment.
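The requirement follows from how pickle works: the payload stores a reference to the class (module name plus class name), not the class definition. The sketch below shows this with the standard library only, using `fractions.Fraction` as a stand-in for a model class. The helper `referenced_globals` is hypothetical, and it assumes a payload written with protocol 3 or lower, where class references appear as GLOBAL opcodes.

```python
import pickle
import pickletools
from fractions import Fraction


def referenced_globals(payload):
    """List the "module name" pairs named by GLOBAL opcodes in a pickle payload.

    Assumption of this sketch: the payload uses protocol 3 or lower
    (protocol 4 and later encode class references differently).
    """
    return [arg for opcode, arg, _pos in pickletools.genops(payload)
            if opcode.name == "GLOBAL"]


# Protocol 2 is requested explicitly so that the class reference is a GLOBAL opcode.
payload = pickle.dumps(Fraction(3, 4), protocol=2)
print(referenced_globals(payload))
```

The payload names the `fractions` module and the `Fraction` class, so loading it only works where that module can be imported. The same holds for a pickled estimator and the package that defines it.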

    Piping is also supported:

    $ cat <pickle_file> | m2cgen --language <language>
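When the CLI is driven from a script, it can help to assemble the argument list in one place. The following is a small sketch that uses only flags from the usage line above. The function name `build_m2cgen_command`, the file name `model.pickle`, and the language value `java` are assumptions made for illustration.

```python
def build_m2cgen_command(pickle_file, language, indent=None, class_name=None):
    """Assemble the argument list for the m2cgen CLI; optional flags are added only when set."""
    command = ["m2cgen", pickle_file, "--language", language]
    if indent is not None:
        command += ["--indent", str(indent)]
    if class_name is not None:
        command += ["--class_name", class_name]
    return command


# Hypothetical file name and option values, for illustration only.
print(build_m2cgen_command("model.pickle", "java", indent=4, class_name="Model"))
```

The resulting list can be passed to `subprocess.run(command, check=True)`. Passing a list rather than a single string avoids shell quoting problems with unusual file names.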
    

    FAQ

    Q: Generation fails with RuntimeError: maximum recursion depth exceeded error.

    A: If this error occurs while generating code for an ensemble model, try reducing the number of trained estimators within that model. Alternatively, you can increase the maximum recursion depth with sys.setrecursionlimit(<new_depth>).
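Raising the limit globally affects the whole process, so one option is to raise it only around the code generation call. This is a minimal sketch, assuming a hypothetical helper named `recursion_limit`; the value 10000 is an arbitrary example, not a recommendation from m2cgen.

```python
import sys
from contextlib import contextmanager


@contextmanager
def recursion_limit(new_depth):
    """Temporarily raise the interpreter's recursion limit, restoring it afterwards."""
    old_depth = sys.getrecursionlimit()
    # Never lower the limit: keep the old value if it is already higher.
    sys.setrecursionlimit(max(old_depth, new_depth))
    try:
        yield
    finally:
        sys.setrecursionlimit(old_depth)


with recursion_limit(10000):
    # Code generation for a large ensemble would go here.
    print(sys.getrecursionlimit() >= 10000)  # prints True
```

A very high limit can still crash the interpreter if the underlying C stack runs out, so reducing the number of estimators remains the safer fix.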

    Q: Generation fails with ImportError: No module named <module_name_here> error while transpiling a model from a serialized model object.

    A: This error indicates that pickle cannot deserialize the model object. To unpickle a serialized model object, its class must be defined at the top level of an importable module in the unpickling environment. Installing the package that provides the model's class definition should solve the problem.
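Before unpickling, it is possible to check whether the packages a model depends on are importable in the current environment. The sketch below uses only the standard library. The helper name `missing_modules` is hypothetical, and the list of names to check has to come from your own knowledge of how the model was trained.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be imported in this environment."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


# "json" ships with Python; the second name is a deliberately nonexistent package.
print(missing_modules(["json", "a_package_that_is_not_installed"]))
# prints ['a_package_that_is_not_installed']
```

Any name the helper reports is a candidate for `pip install` in the environment where m2cgen runs.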
