An implementation of Wilkinson formulas.

Formulaic

Formulaic is a high-performance implementation of Wilkinson formulas for Python.

It provides:

  • high-performance dataframe to model-matrix conversions.
  • support for reusing the encoding choices made during conversion of one dataset on other datasets.
  • extensible formula parsing.
  • extensible data input/output plugins, with implementations for:
    • input:
      • pandas.DataFrame
      • pyarrow.Table
    • output:
      • pandas.DataFrame
      • numpy.ndarray
      • scipy.sparse.csc_matrix
  • support for symbolic differentiation of formulas (and hence model matrices).
  • and much more.

Example code

import pandas
from formulaic import Formula

df = pandas.DataFrame({
    'y': [0,1,2],
    'x': ['A', 'B', 'C'],
    'z': [0.3, 0.1, 0.2],
})

y, X = Formula('y ~ x + z').get_model_matrix(df)

y =

   y
0  0
1  1
2  2

X =

   Intercept  x[T.B]  x[T.C]    z
0        1.0       0       0  0.3
1        1.0       1       0  0.1
2        1.0       0       1  0.2

Note that the above can be shortened to:

from formulaic import model_matrix
model_matrix('y ~ x + z', df)

Benchmarks

Formulaic typically outperforms R for both dense and sparse model matrices, and vastly outperforms patsy (the existing implementation for Python) for dense matrices (patsy does not support sparse model matrix output).

For more details, see here.

Related projects and prior art

  • Patsy: a prior implementation of Wilkinson formulas for Python, which is widely used (e.g. in statsmodels). It has fantastic documentation (which helped bootstrap this project), and a rich array of features.
  • StatsModels.jl @formula: The implementation of Wilkinson formulas for Julia.
  • R Formulas: The implementation of Wilkinson formulas for R, which is thoroughly introduced here. [R itself is an implementation of S, in which formulas were first made popular].
  • The work that started it all: Wilkinson, G. N., and C. E. Rogers. Symbolic description of factorial models for analysis of variance. Journal of the Royal Statistical Society, Series C (Applied Statistics) 22, pp. 392–399, 1973.

Download files

Source Distribution

formulaic-1.0.0.tar.gz (430.1 kB)

Built Distribution

formulaic-1.0.0-py3-none-any.whl (94.2 kB, Python 3)
