Statistical computations and models for use with SciPy

Project description

What it is

Statsmodels is a Python package that complements SciPy for statistical computations, including descriptive statistics and estimation and inference for statistical models.

Main Features

  • linear regression models: Generalized least squares (including weighted least squares and least squares with autoregressive errors), ordinary least squares.

  • glm: Generalized linear models with support for all of the one-parameter exponential family distributions.

  • discrete: regression with discrete dependent variables, including Logit, Probit, MNLogit, Poisson, based on maximum likelihood estimators

  • rlm: Robust linear models with support for several M-estimators.

  • tsa: models for time series analysis

    • univariate time series analysis: AR, ARIMA

    • vector autoregressive models: VAR and structural VAR

    • descriptive statistics and process models for time series analysis

  • nonparametric: (univariate) kernel density estimators

  • datasets: Datasets to be distributed and used for examples and in testing.

  • stats: a wide range of statistical tests

    • diagnostics and specification tests

    • goodness-of-fit and normality tests

    • functions for multiple testing

    • various additional statistical tests

  • iolib

    • tools for reading Stata .dta files into numpy arrays

    • printing table output to ascii, latex, and html

  • miscellaneous models

  • sandbox: statsmodels contains a sandbox folder with code in various stages of development and testing that is not considered “production ready”. This covers, among others, mixed (repeated measures) models, GARCH models, generalized method of moments (GMM) estimators, kernel regression, various extensions to scipy.stats.distributions, panel data models, generalized additive models and information theoretic measures.

Where to get it

The master branch on GitHub has the most up-to-date code:

https://www.github.com/statsmodels/statsmodels

Source downloads of release tags are available on GitHub:

https://github.com/statsmodels/statsmodels/tags

Binaries and source distributions are available from PyPI:

http://pypi.python.org/pypi/statsmodels/

Installation from sources

See INSTALL.txt for requirements, or see the documentation:

http://statsmodels.sf.net/devel/install.html

License

Modified BSD (3-clause)

Documentation

The official documentation is hosted on SourceForge

http://statsmodels.sf.net/

Windows Help

We are providing a Windows htmlhelp file (statsmodels.chm) that is now separately distributed, available at http://sourceforge.net/projects/statsmodels/files/statsmodels-0.4.3/statsmodelsdoc.zip/download

It can be copied or moved to the installation directory of statsmodels (site-packages\statsmodels in a typical installation), and can then be opened from the Python interpreter:

>>> import statsmodels.api as sm
>>> sm.open_help()

Discussion and Development

Discussions take place on our mailing list.

http://groups.google.com/group/pystatsmodels

We are very interested in feedback about usability and suggestions for improvements.

Bug Reports

Bug reports can be submitted to the issue tracker at

https://github.com/statsmodels/statsmodels/issues

Release History

0.4.3

The only change compared to 0.4.2 is for compatibility with Python 3.2.3 (changed behavior of 2to3).

0.4.2

This is a bug-fix release that affects mainly Big-Endian machines.

Bug Fixes

  • discrete_model.MNLogit: fix summary method

  • examples in documentation: correct file path

  • tsa.filters.hp_filter: don’t use umfpack on Big-Endian machines (scipy bug)

  • the remaining fixes are in the test suite, either precision problems on some machines or incorrect testing on Big-Endian machines.

0.4.1

This is a backwards compatible (according to our test suite) release with bug fixes and code cleanup.

Bug Fixes

  • build and distribution fixes

  • lowess: correct distance calculation

  • genmod: correct CDFlink derivative

  • adfuller, _autolag: correct calculation of optimal lag

  • het_arch, het_lm : fix autolag and store options

  • GLSAR: incorrect whitening for lag>1

Other Changes

  • add lowess and other functions to api and documentation

  • rename lowess module (old import path will be removed at next release)

  • new robust sandwich covariance estimators, moved out of sandbox

  • compatibility with pandas 0.8

  • new plots in statsmodels.graphics

    • ABLine plot

    • interaction plot

0.4.0

Main Changes and Additions

  • Added pandas dependency.

  • Cython source is built automatically if cython and compiler are present

  • Support use of dates in timeseries models

  • Improved plots

    • Violin plots

    • Bean plots

    • QQ plots

  • Added lowess function

  • Support for pandas Series and DataFrame objects. Results instances return pandas objects if the models are fit using pandas objects.

  • Full Python 3 compatibility

  • Fix bugs in genfromdta. Convert Stata .dta format to structured array preserving all types. Conversion is much faster now.

  • Improved documentation

  • Models and results are pickleable via save/load, optionally saving the model data.

  • Kernel Density Estimation now uses Cython and is considerably faster.

  • Diagnostics for outlier and influence statistics in OLS

  • Added El Nino Sea Surface Temperatures dataset

  • Numerous bug fixes

  • Internal code refactoring

  • Improved documentation including examples as part of HTML

Changes that break backwards compatibility

  • Deprecated scikits namespace. The recommended import is now:

    import statsmodels.api as sm

  • The signature of the model.predict methods is now (params, exog, …); previously they assumed that the model had been fit and omitted the params argument.

  • For consistency with other multi-equation models, the parameters of MNLogit are now transposed.

  • tools.tools.ECDF -> distributions.ECDF

  • tools.tools.monotone_fn_inverter -> distributions.monotone_fn_inverter

  • tools.tools.StepFunction -> distributions.StepFunction

0.3.1

  • Removed academic-only WFS dataset.

  • Fix easy_install issue on Windows.

0.3.0

Changes that break backwards compatibility

Added api.py for importing. The new convention for importing is:

import statsmodels.api as sm

Importing from modules directly now avoids unnecessary imports and increases the import speed if a library or user only needs specific functions.
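
As a rough illustration of importing from a module directly, the sketch below pulls in only the OLS class from regression/linear_model.py rather than the whole statsmodels.api namespace. The data are made up for the example, and any speed gain depends on the installed release.

```python
import numpy as np
# Import only the class that is needed instead of statsmodels.api.
from statsmodels.regression.linear_model import OLS

x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x  # exact linear relation through the origin

# No constant column is added, so this is a regression through the origin.
results = OLS(y, x).fit()
slope = results.params[0]
print(slope)
```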

  • sandbox/output.py -> iolib/table.py

  • lib/io.py -> iolib/foreign.py (Now contains Stata .dta format reader)

  • family -> families

  • families.links.inverse -> families.links.inverse_power

  • The datasets’ Load class is now a load function.

  • regression.py -> regression/linear_model.py

  • discretemod.py -> discrete/discrete_model.py

  • rlm.py -> robust/robust_linear_model.py

  • glm.py -> genmod/generalized_linear_model.py

  • model.py -> base/model.py

  • t() method -> tvalues attribute (t() still exists but raises a warning)

Main changes and additions

  • Numerous bugfixes.

  • Time Series Analysis model (tsa)

    • Vector Autoregression Models VAR (tsa.VAR)

    • Autoregressive Models AR (tsa.AR)

    • Autoregressive Moving Average Models ARMA (tsa.ARMA). Optionally uses Cython for Kalman filtering; use setup.py install with the option --with-cython.

    • Baxter-King band-pass filter (tsa.filters.bkfilter)

    • Hodrick-Prescott filter (tsa.filters.hpfilter)

    • Christiano-Fitzgerald filter (tsa.filters.cffilter)

  • Improved maximum likelihood framework uses all available scipy.optimize solvers

  • Refactor of the datasets sub-package.

  • Added more datasets for examples.

  • Removed RPy dependency for running the test suite.

  • Refactored the test suite.

  • Refactored codebase/directory structure.

  • Support for offset and exposure in GLM.

  • Removed data_weights argument to GLM.fit for Binomial models.

  • New statistical tests, especially diagnostic and specification tests

  • Multiple test correction

  • Generalized Method of Moments (GMM) framework in sandbox

  • Improved documentation

  • and other additions

0.2.0

Main changes

  • renames for more consistency

    • RLM.fitted_values -> RLM.fittedvalues

    • GLMResults.resid_dev -> GLMResults.resid_deviance

  • GLMResults, RegressionResults: lazy calculations, convert attributes to properties with _cache

  • fix tests to run without rpy

  • expanded examples in examples directory

  • add PyDTA to lib.io – functions for reading Stata .dta binary files and converting them to numpy arrays

  • made tools.categorical much more robust

  • add_constant now takes a prepend argument

  • fix GLS to work with only a one column design

New

  • add four new datasets

    • A dataset from the American National Election Studies (1996)

    • Grunfeld (1950) investment data

    • Spector and Mazzeo (1980) program effectiveness data

    • A US macroeconomic dataset

  • add four new Maximum Likelihood Estimators for models with a discrete dependent variable, with examples

    • Logit

    • Probit

    • MNLogit (multinomial logit)

    • Poisson

Sandbox

  • add qqplot in sandbox.graphics

  • add sandbox.tsa (time series analysis) and sandbox.regression (anova)

  • add principal component analysis in sandbox.tools

  • add Seemingly Unrelated Regression (SUR) and Two-Stage Least Squares for systems of equations in sandbox.sysreg.Sem2SLS

  • add restricted least squares (RLS)

0.1.0b1

  • initial release

Download files

Source Distributions

statsmodels-0.4.3.zip (4.4 MB view details)

Uploaded Source

statsmodels-0.4.3.tar.gz (4.2 MB view details)

Uploaded Source

Built Distributions

statsmodels-0.4.3.win-amd64-py3.2.exe (3.5 MB view details)

Uploaded Source

statsmodels-0.4.3.win-amd64-py2.7.exe (3.5 MB view details)

Uploaded Source

statsmodels-0.4.3.win-amd64-py2.6.exe (3.5 MB view details)

Uploaded Source

statsmodels-0.4.3.win32-py3.2.exe (3.5 MB view details)

Uploaded Source

statsmodels-0.4.3.win32-py2.7.exe (3.5 MB view details)

Uploaded Source

statsmodels-0.4.3.win32-py2.6.exe (3.5 MB view details)

Uploaded Source

File details

Details for the file statsmodels-0.4.3.zip.

File metadata

  • Download URL: statsmodels-0.4.3.zip
  • Upload date:
  • Size: 4.4 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for statsmodels-0.4.3.zip
Algorithm Hash digest
SHA256 c5de6e55d0341269f4b0f385e70d613822dba64d71fc9610c8ed7102bdc5afb8
MD5 97f7e4c1b9870d6f783359f6d1774437
BLAKE2b-256 cb3d2c95d055582795d178aa3b88259d99a3c1b2d5ef1c2b707c8445e1be5b21
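
A downloaded file can be checked against the SHA256 digest listed above with Python's standard hashlib module. This is a minimal sketch; the function names and the local file path in the usage note are hypothetical.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's digest with a published hex digest string."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, assuming the archive was saved in the current directory, matches_published_hash("statsmodels-0.4.3.zip", "c5de6e55d0341269f4b0f385e70d613822dba64d71fc9610c8ed7102bdc5afb8") should return True for an intact download.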

File details

Details for the file statsmodels-0.4.3.tar.gz.

File hashes

Hashes for statsmodels-0.4.3.tar.gz
Algorithm Hash digest
SHA256 504a4f6ccb657c1fab21c6cea5bc53a698bebf72f226dbf0f13374f7f371a7d4
MD5 eee727c2fa4e3d884f1baaae7ae3d58c
BLAKE2b-256 c6f2fdbcb500d078165757496e590f395ef610772c98869566d767554b1deb08

File details

Details for the file statsmodels-0.4.3.win-amd64-py3.2.exe.

File hashes

Hashes for statsmodels-0.4.3.win-amd64-py3.2.exe
Algorithm Hash digest
SHA256 304cf3e07c248bd8fc189a69c7ad885b2fdeb364e80c26be6a99b17fc5c1ac0e
MD5 a4498816016e714e76a5f3cf9d780ea7
BLAKE2b-256 509c75a7892ce1a96477a2170ebfe5e5f295633c54a1087ec266be44902bd634

File details

Details for the file statsmodels-0.4.3.win-amd64-py2.7.exe.

File hashes

Hashes for statsmodels-0.4.3.win-amd64-py2.7.exe
Algorithm Hash digest
SHA256 ae1df587344587edd6dde31cab1c4d2cfeed1a01706e712284e97c5c12e0f925
MD5 5b6acf2309868aabbc33712f97a2fa74
BLAKE2b-256 34ed228b443d9456c7085c4eee71afc304417b60d80b4b60735d9dd5d3de70ca

File details

Details for the file statsmodels-0.4.3.win-amd64-py2.6.exe.

File hashes

Hashes for statsmodels-0.4.3.win-amd64-py2.6.exe
Algorithm Hash digest
SHA256 bf7d68b9adebc437531e52541f169f843d9899504eb47b30af4d615745324123
MD5 1c88ce6c7446ce25176b5876d373178c
BLAKE2b-256 180eb34dbd571f908b2fa1041845e1f16bf61771a5b4a7f65e720b3e4052ecd4

File details

Details for the file statsmodels-0.4.3.win32-py3.2.exe.

File hashes

Hashes for statsmodels-0.4.3.win32-py3.2.exe
Algorithm Hash digest
SHA256 f86e06edd36b7039aa28358dc1982a1fe3699d844a2777389ce834e93cdf068e
MD5 6f9b2d02df506269cffdeb1b22d7a62f
BLAKE2b-256 6f51e48abad852d062fa76fdf594fe9e41a95fe168d0243e6f5b152eadc19063

File details

Details for the file statsmodels-0.4.3.win32-py2.7.exe.

File hashes

Hashes for statsmodels-0.4.3.win32-py2.7.exe
Algorithm Hash digest
SHA256 e3010eece53871a9a9ce32b276dc6f8a114b1cff2d22aa8f49d6aa2518181146
MD5 b4ba37e77ca6106b2b48286cbf4221e2
BLAKE2b-256 0fce9c23107623309bc7d13a8c45f1dd09074ae21111f03bf5190f8c6551a33f

File details

Details for the file statsmodels-0.4.3.win32-py2.6.exe.

File hashes

Hashes for statsmodels-0.4.3.win32-py2.6.exe
Algorithm Hash digest
SHA256 573a3ab5cd3e80d07521b6ba7dcc85ca92f0cc3506da8a923804f0d946abed22
MD5 de33b0d7fc2d3c1072680fe0672786f2
BLAKE2b-256 f13de2fbe80b5170941cb4399ec4a882397e7ef77cfcc5c58a82173a296eca34
