
Machine learning with dirty categories.

Project description

dirty_cat is a Python module for machine learning on dirty categorical variables.

Website: https://dirty-cat.github.io/

For a detailed description of the problem of encoding dirty categorical data, see "Similarity encoding for learning with dirty categorical variables" [1] and "Encoding high-cardinality string categorical variables" [2].
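To make the problem concrete, the sketch below compares category strings by the overlap of their character 3-grams, in the spirit of the similarity encoding discussed in [1]. It is a minimal illustration, not dirty_cat's implementation: the function names (char_ngrams, ngram_similarity, similarity_encode) and the plain Jaccard formula are assumptions chosen for this example.

```python
def char_ngrams(text, n=3):
    """Return the set of character n-grams of ``text``, padded with one space on each side."""
    padded = " " + text.lower() + " "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_similarity(a, b, n=3):
    """Jaccard overlap between the n-gram sets of two strings, between 0.0 and 1.0."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a and not grams_b:
        # Two strings too short to yield any n-gram are treated as identical.
        return 1.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def similarity_encode(values, prototypes, n=3):
    """Encode each value as its vector of similarities to the prototype categories.

    Hypothetical helper: a one-hot encoding would give 0 for every misspelled
    value, whereas this encoding keeps near-duplicates close to each other.
    """
    return [[ngram_similarity(v, p, n) for p in prototypes] for v in values]


if __name__ == "__main__":
    prototypes = ["police officer", "firefighter"]
    dirty_values = ["Police Officer", "police oficer", "fire fighter"]
    for value, row in zip(dirty_values, similarity_encode(dirty_values, prototypes)):
        print(value, [round(s, 2) for s in row])
```

A misspelled entry such as "police oficer" scores high against "police officer" and low against "firefighter", so a downstream model can still treat it as nearly the same category.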

Installation

Dependencies

dirty_cat requires:

  • Python (>= 3.6)

  • NumPy (>= 1.8.2)

  • SciPy (>= 1.0.1)

  • scikit-learn (>= 0.20.0)

Optional dependency:

  • python-Levenshtein for faster edit distances (not used for the n-gram distance)

User installation

If you already have a working installation of NumPy and SciPy, the easiest way to install dirty_cat is with pip:

pip install -U --user dirty_cat

Other implementations

References

Download files


Source Distribution

dirty_cat-0.1.0.tar.gz (98.1 kB)

Uploaded Source

Built Distribution

dirty_cat-0.1.0-py3-none-any.whl (112.0 kB)

Uploaded Python 3

File details

Details for the file dirty_cat-0.1.0.tar.gz.

File metadata

  • Download URL: dirty_cat-0.1.0.tar.gz
  • Upload date:
  • Size: 98.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.24.0 setuptools/50.3.1.post20201107 requests-toolbelt/0.9.1 tqdm/4.50.2 CPython/3.8.5

File hashes

Hashes for dirty_cat-0.1.0.tar.gz:

Algorithm    Hash digest
SHA256       6333b55f6aae139704e355d41c9f1d9aae5086cc8c8ccc723fffa15947e475c1
MD5          4d0f8ab61b9c9c2e4ffb1041be5abcc5
BLAKE2b-256  292af35cfa82e12936ea9241b826410f22c35d640f2987b63317fad33de3bfb6


File details

Details for the file dirty_cat-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: dirty_cat-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 112.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.24.0 setuptools/50.3.1.post20201107 requests-toolbelt/0.9.1 tqdm/4.50.2 CPython/3.8.5

File hashes

Hashes for dirty_cat-0.1.0-py3-none-any.whl:

Algorithm    Hash digest
SHA256       a6d8f86de815a1ca6439ee9e320785ba955cdf97ba6e42b1e869c2966813a17c
MD5          4140777f716bb2e39297caa9a58cbe99
BLAKE2b-256  3026e90638d9f79825249bfd0f7cf1ac71c98eb99bc705a734c55c74e8cbeb6d

