An implementation of Wilkinson formulas.

Formulaic

Formulaic is a high-performance implementation of Wilkinson formulas for Python.

Note: This project, while largely complete, is still a work in progress, and the API is subject to change between major versions (0.<major>.<minor>).

It provides:

  • high-performance dataframe to model-matrix conversions.
  • support for reusing the encoding choices made while converting one dataset when converting other datasets.
  • extensible formula parsing.
  • extensible data input/output plugins, with implementations for:
    • input:
      • pandas.DataFrame
      • pyarrow.Table
    • output:
      • pandas.DataFrame
      • numpy.ndarray
      • scipy.sparse.csc_matrix
  • support for symbolic differentiation of formulas (and hence model matrices).

Example code

import pandas
from formulaic import Formula

df = pandas.DataFrame({
    'y': [0,1,2],
    'x': ['A', 'B', 'C'],
    'z': [0.3, 0.1, 0.2],
})

# Build the response (y) and design (X) model matrices from the formula.
y, X = Formula('y ~ x + z').get_model_matrix(df)

y =

   y
0  0
1  1
2  2

X =

   Intercept  x[T.B]  x[T.C]    z
0        1.0       0       0  0.3
1        1.0       1       0  0.1
2        1.0       0       1  0.2
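
To make the column layout of X concrete, and to illustrate what reusing encoding choices on another dataset means, here is a minimal plain-Python sketch of treatment (dummy) coding. It is not Formulaic's implementation and does not use its API: the helper names `fit_encoding`, `column_names`, and `build_matrix` are hypothetical, and the sketch assumes that levels are sorted and the first one is the reference level, which is consistent with the output shown above.

```python
# Illustrative only: plain-Python treatment coding, not Formulaic's implementation.
# All helper names here are hypothetical.

def fit_encoding(rows, categorical, numeric):
    """Record the encoding choices: the sorted levels of the categorical column."""
    levels = sorted({row[categorical] for row in rows})
    return {"categorical": categorical, "numeric": numeric, "levels": levels}


def column_names(spec):
    """Column names in the style shown above; the first level is the reference."""
    cat = spec["categorical"]
    dummies = [f"{cat}[T.{level}]" for level in spec["levels"][1:]]
    return ["Intercept", *dummies, spec["numeric"]]


def build_matrix(rows, spec):
    """Apply previously recorded encoding choices to any dataset."""
    cat = spec["categorical"]
    matrix = []
    for row in rows:
        # One 0/1 indicator per non-reference level recorded in the spec.
        dummies = [1 if row[cat] == level else 0 for level in spec["levels"][1:]]
        matrix.append([1.0, *dummies, row[spec["numeric"]]])
    return matrix


train = [
    {"y": 0, "x": "A", "z": 0.3},
    {"y": 1, "x": "B", "z": 0.1},
    {"y": 2, "x": "C", "z": 0.2},
]
spec = fit_encoding(train, categorical="x", numeric="z")
names = column_names(spec)
X_train = build_matrix(train, spec)

# New data containing only level "B" still yields the same columns,
# because the levels come from the stored spec rather than from the new data.
new = [{"x": "B", "z": 0.5}]
X_new = build_matrix(new, spec)

print(names)
print(X_train)
print(X_new)
```

Keeping the levels in a separate spec object is what lets the second dataset produce a matrix with the same columns as the first, even when some levels are absent from it.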

Benchmarks

Formulaic typically outperforms R for both dense and sparse model matrices, and vastly outperforms patsy (the existing implementation for Python) for dense matrices (patsy does not support sparse model matrix output).

For more details, see here.

Related projects and prior art

  • Patsy: a prior implementation of Wilkinson formulas for Python, which is widely used (e.g. in statsmodels). It has fantastic documentation (which helped bootstrap this project), and a rich array of features.
  • StatsModels.jl @formula: The implementation of Wilkinson formulas for Julia.
  • R Formulas: The implementation of Wilkinson formulas for R, which is thoroughly introduced here. [R itself is an implementation of S, in which formulas were first made popular].
  • The work that started it all: Wilkinson, G. N., and C. E. Rogers. Symbolic description of factorial models for analysis of variance. Journal of the Royal Statistical Society, Series C (Applied Statistics) 22, pp. 392–399, 1973.
