dirty-cat

Machine learning with dirty categories.

These details have been verified by PyPI

Maintainers

GaelVaroquaux JovanStojanovic pcerda Phaide

These details have not been verified by PyPI

Project links

Homepage

Project description

dirty_cat is a Python module for machine learning on dirty categorical variables.

Website: https://dirty-cat.github.io/

dirty_cat’s SuperVectorizer automatically turns pandas data frames into numerical arrays suitable for learning.

For a detailed description of the problem of encoding dirty categorical data, see Similarity encoding for learning with dirty categorical variables [1] and Encoding high-cardinality string categorical variables [2].
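To give an intuition for similarity encoding, here is a minimal pure-Python sketch. It is not dirty_cat's implementation: the function names (ngrams, ngram_similarity, similarity_encode), the space padding, and the choice of Jaccard similarity over character 3-grams are assumptions made for illustration. Each value is encoded as a vector of its similarities to a set of reference categories, so a misspelled entry lands close to the category it resembles instead of becoming an unrelated new category.

```python
def ngrams(text, n=3):
    """Return the set of character n-grams of `text`, padded with one space on each side."""
    padded = f" {text.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_similarity(a, b, n=3):
    """Jaccard similarity between the n-gram sets of two strings (illustrative choice)."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    union = grams_a | grams_b
    if not union:
        return 0.0
    return len(grams_a & grams_b) / len(union)


def similarity_encode(values, categories, n=3):
    """Encode each value as its list of similarities to the reference categories."""
    return [[ngram_similarity(v, c, n) for c in categories] for v in values]


# Hypothetical dirty column: the second entry is a typo of the first category.
categories = ["police officer", "fire fighter"]
values = ["police officer", "police oficer", "teacher"]
encoded = similarity_encode(values, categories)

for value, row in zip(values, encoded):
    print(value, [round(x, 2) for x in row])
```

An exact match gets similarity 1.0 with its own category, the typo "police oficer" stays much closer to "police officer" than to "fire fighter", and an unrelated value such as "teacher" scores low against both.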

Installation

Dependencies

dirty_cat requires:

Python (>= 3.6)
NumPy (>= 1.16)
SciPy (>= 1.2)
scikit-learn (>= 0.21.0)
pandas (>= 1.1.5)

Optional dependency:

python-Levenshtein for faster edit distances (not used for the n-gram distance)
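For readers unfamiliar with edit distances, the following is a minimal pure-Python sketch of the Levenshtein distance, the quantity that python-Levenshtein computes in compiled code. The function name edit_distance is hypothetical and this is not the code dirty_cat runs; it only illustrates why a compiled implementation helps, since the cost grows with the product of the two string lengths.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character insertions,
    deletions, and substitutions needed to turn `a` into `b`."""
    # previous[j] holds the distance between the current prefix of `a` and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


# One missing letter separates the typo from the clean category.
distance = edit_distance("officer", "oficer")
print(distance)
```

Only two rows of the dynamic-programming table are kept at a time, so memory use is proportional to the length of the second string.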

User installation

If you already have a working installation of NumPy and SciPy, the easiest way to install dirty_cat is with pip:

pip install -U --user dirty_cat

Other implementations

Spark ML: https://github.com/rakutentech/spark-dirty-cat

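References

[1] Patricio Cerda, Gaël Varoquaux, Balázs Kégl. Similarity encoding for learning with dirty categorical variables. Machine Learning, 2018.
[2] Patricio Cerda, Gaël Varoquaux. Encoding high-cardinality string categorical variables. IEEE Transactions on Knowledge and Data Engineering, 2020.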

Release history

0.4.1: Apr 18, 2023
0.4.0: Feb 17, 2023
0.4.0b2 (pre-release): Feb 1, 2023
0.4.0b1 (pre-release): Jan 18, 2023
0.3.0: Sep 12, 2022
0.3.0b1 (pre-release): Sep 9, 2022
0.2.1 (this version): May 16, 2022
0.2.0: Oct 13, 2021
0.2.0a1 (pre-release): Jul 20, 2021
0.1.0: Feb 17, 2021
0.0.7: Aug 2, 2020
0.0.5: Nov 19, 2018
0.0.4: Nov 19, 2018
0.0.3: Nov 19, 2018
0.0.2: Nov 6, 2018
0.0.1: Jun 8, 2018
0.0.1b3 (pre-release): Mar 30, 2018
0.0.1b2 (pre-release): Mar 27, 2018
0.0.1b1 (pre-release): Mar 20, 2018
0.0.1a0 (pre-release): Mar 19, 2018

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dirty_cat-0.2.1.tar.gz (54.1 kB), uploaded May 16, 2022

Built Distribution

dirty_cat-0.2.1-py3-none-any.whl (63.6 kB), uploaded May 16, 2022, Python 3

Hashes for dirty_cat-0.2.1.tar.gz
Algorithm	Hash digest
SHA256	`5201fca4311c9acf17f73a80134c696795d7b86bf08e262e06b87a1692d32c5b`
MD5	`3357eb20838a50f5603a23e23cd46e50`
BLAKE2b-256	`67ce6adb6a2064a759785b0084bc4e04d063be9d522dcebfea67ebc6f90f3e9e`

Hashes for dirty_cat-0.2.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b8369ae3d0102d6012f8c56b0a0372fd994aca47baca49a9b6f52f42086b8199`
MD5	`4834fde796bd18bf89a5b15f1cf0fd48`
BLAKE2b-256	`453e2b9c299ef881da926b75cc5bc6d7469121d51fefdfbc53e9a55ac003d972`
