
Elastic-net regularized generalized linear models.

A Python implementation of elastic-net regularized generalized linear models


[Documentation (stable version)] [Documentation (development version)]

Generalized linear models are well-established tools for regression and classification and are widely applied across the sciences, economics, business, and finance. They are uniquely identifiable due to their convex loss and easy to interpret due to their point-wise non-linearities and well-defined noise models.

In the era of exploratory data analyses with a large number of predictor variables, it is important to regularize. Regularization prevents overfitting by penalizing the negative log likelihood and can be used to articulate prior knowledge about the parameters in a structured form.
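As a rough illustration of what "penalizing the negative log likelihood" means, the sketch below evaluates an elastic-net penalized Poisson objective in plain NumPy. The function names and the `reg_lambda` / `alpha` parameterization are assumptions made for this example; this is not pyglmnet's internal code.

```python
import numpy as np


def elastic_net_penalty(beta, reg_lambda, alpha):
    """reg_lambda * [(1 - alpha) / 2 * ||beta||_2^2 + alpha * ||beta||_1]."""
    l2_term = 0.5 * (1.0 - alpha) * np.sum(beta ** 2)
    l1_term = alpha * np.sum(np.abs(beta))
    return reg_lambda * (l2_term + l1_term)


def poisson_neg_log_likelihood(beta0, beta, X, y):
    """Mean Poisson negative log likelihood with a log link.

    The log(y!) term is dropped because it does not depend on the parameters.
    """
    eta = beta0 + X @ beta
    return np.mean(np.exp(eta) - y * eta)


def penalized_loss(beta0, beta, X, y, reg_lambda=0.1, alpha=0.5):
    """Negative log likelihood plus the elastic-net penalty (intercept unpenalized)."""
    return (poisson_neg_log_likelihood(beta0, beta, X, y)
            + elastic_net_penalty(beta, reg_lambda, alpha))
```

With `alpha=1` the penalty reduces to the lasso and with `alpha=0` to ridge; intermediate values mix the two.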

Despite the attractiveness of regularized GLMs, the available tools in the Python data science ecosystem are highly fragmented. Specifically:

  • statsmodels provides a wide range of link functions but no regularization.

  • scikit-learn provides elastic net regularization but only for linear models.

  • lightning provides elastic net and group lasso regularization, but only for linear and logistic regression.

Pyglmnet is a response to this fragmentation. It runs on Python 3.5+. Some highlights:

  • Pyglmnet provides a wide range of noise models (and paired canonical link functions): 'gaussian', 'binomial', 'probit', 'gamma', 'poisson', and 'softplus'.

  • It supports a wide range of regularizers: ridge, lasso, elastic net, group lasso, and Tikhonov regularization.

  • Pyglmnet’s API is designed to be compatible with scikit-learn, so you can deploy Pipeline tools such as GridSearchCV() and cross_val_score().

  • We follow the same approach and notation as Friedman, J., Hastie, T., & Tibshirani, R. (2010) and the accompanying, widely used R package.

  • We have implemented a cyclical coordinate descent optimizer with Newton updates, active sets, update caching, and warm restarts. This optimization approach is identical to the one used in the R package.

  • A number of Python wrappers exist for the R glmnet package, but in contrast to these, Pyglmnet is a pure Python implementation. It is therefore easy to modify and to extend with additional noise models and regularizers in the future.
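To give a feel for cyclical coordinate descent, here is a minimal NumPy sketch for the Gaussian case only. It is a simplified illustration, not pyglmnet's optimizer: it omits Newton updates, active sets, and warm restarts, and the names `soft_threshold` and `elastic_net_cd` are hypothetical. Keeping the residual up to date after each coordinate change loosely mirrors the idea of update caching. The sketch assumes no column of `X` is identically zero.

```python
import numpy as np


def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; values within [-gamma, gamma] become 0."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)


def elastic_net_cd(X, y, reg_lambda=0.1, alpha=0.5, n_iter=100, tol=1e-8):
    """Minimize 1/(2n) * ||y - beta0 - X beta||^2 + elastic-net penalty."""
    n_samples, n_features = X.shape
    beta0 = 0.0
    beta = np.zeros(n_features)
    col_sq = (X ** 2).sum(axis=0) / n_samples  # (1/n) * x_j^T x_j
    residual = y.astype(float)  # y - beta0 - X @ beta at the starting point

    for _ in range(n_iter):
        # Unpenalized intercept update: absorb the mean of the residual.
        delta0 = residual.mean()
        beta0 += delta0
        residual -= delta0
        max_change = abs(delta0)

        # Cycle through the coordinates one at a time.
        for j in range(n_features):
            old = beta[j]
            # Correlation of feature j with the partial residual (excluding j).
            rho = X[:, j] @ residual / n_samples + col_sq[j] * old
            new = (soft_threshold(rho, reg_lambda * alpha)
                   / (col_sq[j] + reg_lambda * (1.0 - alpha)))
            if new != old:
                residual -= X[:, j] * (new - old)  # keep the residual current
                beta[j] = new
                max_change = max(max_change, abs(new - old))

        if max_change < tol:
            break
    return beta0, beta
```

The L1 part of the penalty enters through the soft-threshold and the L2 part through the extra term in the denominator, which is why a single update covers ridge, lasso, and elastic net.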

Installation

Install the stable PyPI version with pip:

$ pip install pyglmnet

For the bleeding-edge development version, install directly from the GitHub repository with pip:

$ pip install https://api.github.com/repos/glm-tools/pyglmnet/zipball/master

Getting Started

Here is an example of how to use the GLM estimator.

import numpy as np
from pyglmnet import GLM, simulate_glm

n_samples, n_features = 1000, 100
distr = 'poisson'

# sample a sparse model
beta0 = np.random.rand()
beta = np.random.random(n_features)
beta[beta < 0.9] = 0

# simulate training and test data from the same model
Xtrain = np.random.normal(0.0, 1.0, [n_samples, n_features])
ytrain = simulate_glm(distr, beta0, beta, Xtrain)
Xtest = np.random.normal(0.0, 1.0, [n_samples, n_features])
ytest = simulate_glm(distr, beta0, beta, Xtest)

# create an instance of the GLM class
glm = GLM(distr=distr, score_metric='deviance')

# fit the model on the training data
glm.fit(Xtrain, ytrain)

# predict using fitted model on the test data
yhat = glm.predict(Xtest)

# score the model on test data
deviance = glm.score(Xtest, ytest)
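
For intuition about the deviance score used above, the sketch below computes the textbook Poisson deviance in NumPy. The helper name `poisson_deviance` is hypothetical, and this is the standard definition under stated assumptions; pyglmnet's own `score` may scale or normalize the value differently.

```python
import numpy as np


def poisson_deviance(y, yhat):
    """Poisson deviance: 2 * sum(y * log(y / yhat) - (y - yhat)).

    Uses the convention 0 * log(0) = 0 for zero counts and assumes yhat > 0.
    """
    y = np.asarray(y, dtype=float)
    yhat = np.asarray(yhat, dtype=float)
    log_term = np.zeros_like(y)
    positive = y > 0
    log_term[positive] = y[positive] * np.log(y[positive] / yhat[positive])
    return 2.0 * np.sum(log_term - (y - yhat))
```

Calling `poisson_deviance(ytest, yhat)` on the arrays from the example gives a value that is zero for a perfect prediction and grows as predictions move away from the observed counts.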

More pyglmnet examples and use cases.

Tutorial

Here is an extensive tutorial on GLMs, optimization and pseudo-code.

Here are slides from a talk at PyData Chicago 2016, corresponding tutorial notebooks and a video.

How to contribute?

We welcome pull requests. Please see our developer documentation page for more details.

Acknowledgments

License

MIT License Copyright (c) 2016-2019 Pavan Ramkumar
