dirty-cat

Machine learning with dirty categories.

These details have been verified by PyPI

Maintainers

GaelVaroquaux JovanStojanovic pcerda Phaide

These details have not been verified by PyPI

Project links

Homepage

Development Status
- 3 - Alpha
Environment
- Console
Intended Audience
- Science/Research
License
- OSI Approved :: BSD License
Operating System
- OS Independent
Programming Language
Topic
- Scientific/Engineering
- Software Development :: Libraries

Project description

dirty_cat is a Python module for machine learning on dirty categorical variables.

Website: https://dirty-cat.github.io/

For a detailed description of the problem of encoding dirty categorical data, see "Similarity encoding for learning with dirty categorical variables" [1] and "Encoding high-cardinality string categorical variables" [2].
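To make the idea of similarity encoding concrete, here is a minimal pure-Python sketch. It is not dirty_cat's implementation: the function names (`ngrams`, `ngram_similarity`, `similarity_encode`) are hypothetical, and the similarity used is one common definition, the Jaccard index over character 3-grams. Each value is encoded as its vector of similarities to a set of reference categories, so a misspelled entry lands close to its clean counterpart instead of becoming an unrelated one-hot column.

```python
def ngrams(text, n=3):
    """Return the set of character n-grams of `text`, padded with spaces."""
    padded = f" {text} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_similarity(a, b, n=3):
    """Jaccard similarity between the n-gram sets of two strings."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0  # two strings too short to have n-grams count as identical
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def similarity_encode(values, prototypes, n=3):
    """Encode each value as its similarities to the prototype categories."""
    return [[ngram_similarity(v, p, n) for p in prototypes] for v in values]


if __name__ == "__main__":
    prototypes = ["police officer", "fire fighter"]
    dirty_values = ["police oficer", "Fire fighter", "police officer"]
    for value, row in zip(dirty_values, similarity_encode(dirty_values, prototypes)):
        print(value, [round(x, 2) for x in row])
```

The resulting rows can be passed to any estimator that expects numerical features. The n-gram size and the choice of prototypes are assumptions of this sketch.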

Installation

Dependencies

dirty_cat requires:

Python (>= 3.6)
NumPy (>= 1.16)
SciPy (>= 1.2)
scikit-learn (>= 0.20.0)

Optional dependency:

python-Levenshtein for faster edit distances (not used for the n-gram distance)
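Before installing, it can help to check whether the environment already meets the minimum versions listed above. The snippet below is a rough sketch, not part of dirty_cat: the helper names (`parse_version`, `check_requirements`) are hypothetical, the version parsing is deliberately naive, and `importlib.metadata` needs Python 3.8 or later, which is newer than the minimum Python version dirty_cat itself requires.

```python
import sys
from importlib import metadata

# Minimum versions taken from the dependency list above.
MINIMUM_VERSIONS = {
    "numpy": (1, 16),
    "scipy": (1, 2),
    "scikit-learn": (0, 20),
}


def parse_version(version):
    """Naively turn '1.16.0rc1' into (1, 16, 0); stops at the first non-numeric piece."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for char in piece:
            if not char.isdigit():
                break
            digits += char
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check_requirements(minimums=MINIMUM_VERSIONS):
    """Return a dict mapping each package to 'ok', 'too old', or 'missing'."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if parse_version(installed) >= minimum else "too old"
    return report


if __name__ == "__main__":
    print("Python", "ok" if sys.version_info >= (3, 6) else "too old")
    for package, status in check_requirements().items():
        print(package, status)
```

For anything beyond a quick check, a dedicated version parser is a safer choice than this tuple comparison.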

User installation

If you already have a working installation of NumPy and SciPy, the easiest way to install dirty_cat is with pip:

pip install -U --user dirty_cat

Other implementations

Spark ML: https://github.com/rakutentech/spark-dirty-cat

References

Release history

0.4.1 (Apr 18, 2023)
0.4.0 (Feb 17, 2023)
0.4.0b2 pre-release (Feb 1, 2023)
0.4.0b1 pre-release (Jan 18, 2023)
0.3.0 (Sep 12, 2022)
0.3.0b1 pre-release (Sep 9, 2022)
0.2.1 (May 16, 2022)
0.2.0 (Oct 13, 2021)
0.2.0a1 pre-release (Jul 20, 2021), this version
0.1.0 (Feb 17, 2021)
0.0.7 (Aug 2, 2020)
0.0.5 (Nov 19, 2018)
0.0.4 (Nov 19, 2018)
0.0.3 (Nov 19, 2018)
0.0.2 (Nov 6, 2018)
0.0.1 (Jun 8, 2018)
0.0.1b3 pre-release (Mar 30, 2018)
0.0.1b2 pre-release (Mar 27, 2018)
0.0.1b1 pre-release (Mar 20, 2018)
0.0.1a0 pre-release (Mar 19, 2018)

Download files

Source Distribution

dirty_cat-0.2.0a1.tar.gz (106.6 kB)

Uploaded Jul 20, 2021 Source

Built Distribution

dirty_cat-0.2.0a1-py3-none-any.whl (122.7 kB)

Uploaded Jul 20, 2021 Python 3

Hashes for dirty_cat-0.2.0a1.tar.gz
Algorithm	Hash digest
SHA256	`17ad5d9c9c539158490a3a05347d433e91c8e1e668f728bbbacf128672e856cd`
MD5	`f9f4136ac4a66703e528ba3cd4ad5e6e`
BLAKE2b-256	`54099e54069da207b9be9108a5976ca11451e77b0169a4f815f5dcd9a2371cd1`

Hashes for dirty_cat-0.2.0a1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a6a9835f1b3764ed67a0874b6cab4ec26de5ead38416729378ab4b8f5c66e0a9`
MD5	`e30588f0f0a50e03f51c44bacfc5da73`
BLAKE2b-256	`c120a9ac6e033e3e8ce3053d70213eb797b474cde9832fe326cd664d64f34678`
