
Machine learning with dirty categories.

Project description

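dirty_cat is a Python module for machine learning on dirty categorical variables, that is, categorical columns whose entries contain typos, spelling variants, or other non-standardized values.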

Website: https://dirty-cat.github.io/

For a detailed description of the problem of encoding dirty categorical data, see "Similarity encoding for learning with dirty categorical variables" [1].
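The core idea of similarity encoding can be illustrated without the library: instead of a one-hot column per category, each value is represented by its string similarity to a set of reference (prototype) categories, so a misspelled, never-seen value still lands near its correct form. The sketch below is a minimal pure-Python illustration, not dirty_cat's implementation. It assumes a Jaccard similarity between sets of character 3-grams, with each string padded by one space on each side; the function names char_ngrams, ngram_similarity and similarity_encode are hypothetical.

```python
from typing import List, Set


def char_ngrams(text: str, n: int = 3) -> Set[str]:
    """Set of character n-grams of `text`, padded with one space on each side.

    The padding is an assumption of this sketch, so that word boundaries
    contribute their own n-grams.
    """
    padded = " " + text + " "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_similarity(a: str, b: str, n: int = 3) -> float:
    """Jaccard similarity of the two n-gram sets: |A & B| / |A | B|."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    union = grams_a | grams_b
    if not union:
        # Both strings are too short to yield any n-gram.
        return float(a == b)
    return len(grams_a & grams_b) / len(union)


def similarity_encode(values: List[str], prototypes: List[str], n: int = 3) -> List[List[float]]:
    """Encode each value as its n-gram similarity to every prototype category."""
    return [[ngram_similarity(v, p, n) for p in prototypes] for v in values]


if __name__ == "__main__":
    prototypes = ["Police Officer", "Accountant"]
    # "Police Oficer" (typo) is not a prototype, yet its row is close to "Police Officer".
    for row in similarity_encode(["Police Officer", "Police Oficer", "Accountant"], prototypes):
        print([round(x, 2) for x in row])
```

With one-hot encoding, "Police Oficer" would be an unknown category with an all-zero row; here it gets a high similarity to "Police Officer" and zero to "Accountant".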

Installation

Dependencies

dirty_cat requires:

  • Python (>= 3.5)

  • NumPy (>= 1.8.2)

  • SciPy (>= 1.0.1)

  • scikit-learn (>= 0.20.0)
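Before installing, it can help to confirm that the installed packages meet these minimums. The helper below is a hypothetical sketch, not part of dirty_cat: it compares version strings numerically (so 1.16 counts as newer than 1.8, which a plain string comparison gets wrong) and ignores pre-release suffixes such as "rc1". In practice, packaging.version.Version from the packaging library handles those cases properly.

```python
import importlib
import re
import sys
from typing import Dict, Optional, Tuple

# Minimum versions taken from the list above; "sklearn" is scikit-learn's import name.
MINIMUM_VERSIONS = {"numpy": "1.8.2", "scipy": "1.0.1", "sklearn": "0.20.0"}


def parse_version(version: str) -> Tuple[int, ...]:
    """Keep only the leading numeric part: '1.0.1rc1' -> (1, 0, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % version)
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed: str, required: str) -> bool:
    """Compare numerically, padding with zeros so that '1.8' equals '1.8.0'."""
    a, b = parse_version(installed), parse_version(required)
    width = max(len(a), len(b))
    a = a + (0,) * (width - len(a))
    b = b + (0,) * (width - len(b))
    return a >= b


def check_dependencies() -> Dict[str, Optional[bool]]:
    """True/False per requirement, or None if a package cannot be imported."""
    report = {"python": sys.version_info[:2] >= (3, 5)}  # type: Dict[str, Optional[bool]]
    for name, required in MINIMUM_VERSIONS.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = None
            continue
        report[name] = meets_minimum(module.__version__, required)
    return report


if __name__ == "__main__":
    print(check_dependencies())
```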

Optional dependency:

  • python-Levenshtein for faster edit distances (not used for the n-gram distance)
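For reference, the edit (Levenshtein) distance is the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. The pure-Python sketch below shows the standard dynamic-programming computation that a C implementation such as python-Levenshtein speeds up; it is an illustration, not dirty_cat's code path. The normalization in levenshtein_similarity (dividing by the longer string's length) is an assumption chosen for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between `a` and `b`, computed with two rows of the DP table."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Map the distance to [0, 1]; this normalization is an assumption."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest
```

For example, levenshtein("kitten", "sitting") is 3. With python-Levenshtein installed, Levenshtein.distance(a, b) returns the same distance from compiled code; the n-gram similarity does not depend on it.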

User installation

If you already have a working installation of NumPy and SciPy, the easiest way to install dirty_cat is with pip:

pip install -U --user dirty_cat

Other implementations

References


Download files

Download the file for your platform.

Source Distribution

dirty_cat-0.0.5.tar.gz (80.0 kB)

Built Distribution

dirty_cat-0.0.5-py2.py3-none-any.whl (91.2 kB)

File details

Details for the file dirty_cat-0.0.5.tar.gz.

File metadata

  • Download URL: dirty_cat-0.0.5.tar.gz
  • Upload date:
  • Size: 80.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: Python-urllib/3.6

File hashes

Hashes for dirty_cat-0.0.5.tar.gz
Algorithm Hash digest
SHA256 30a7051e2485d4396a24c129660f292721b78c58e1942bfb42f721a5ac38930e
MD5 2813fb0761d8b9ec4c970f6e2d85b394
BLAKE2b-256 c6b46b32a7efa37aa6463f5b43629dfe3ffd59fdf2f49ba7f9bd352722deb06d
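The published digests let you check that a downloaded file is intact before installing it. The sketch below, using only the standard library's hashlib, computes a file's SHA256 in chunks and compares it with the SHA256 value listed above for the source archive; the function names are hypothetical. pip's hash-checking mode (a --hash=sha256:... option on a requirements-file line) performs the same kind of check during installation.

```python
import hashlib

# SHA256 digest listed above for dirty_cat-0.0.5.tar.gz.
EXPECTED_SHA256 = "30a7051e2485d4396a24c129660f292721b78c58e1942bfb42f721a5ac38930e"


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Hex SHA256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's SHA256 matches the expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected.strip().lower()
```

Assuming the archive was saved in the current directory, verify_download("dirty_cat-0.0.5.tar.gz") should return True; any other result means the file differs from the one published.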

File details

Details for the file dirty_cat-0.0.5-py2.py3-none-any.whl.

File metadata

File hashes

Hashes for dirty_cat-0.0.5-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 b1d78d173843364e8922f57fa39a1b7d813f083e894c87d4674e742ae8ef252e
MD5 1f2d589cc345715fd69b82d91c4e631e
BLAKE2b-256 8d847de88b45593b71fe8552c3038232502337eb3c0bd4b296361849a20fdabc
