Machine learning with dirty categories.

Project description

dirty_cat is a Python module for machine learning on dirty categorical variables.

Website: https://dirty-cat.github.io/

For a detailed description of the problem of encoding dirty categorical data, see Similarity encoding for learning with dirty categorical variables [1] and Encoding high-cardinality string categorical variables [2].
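To give a rough feel for the idea behind similarity encoding, here is a minimal pure-Python sketch. It is not dirty_cat's implementation: the function names (`ngrams`, `ngram_similarity`, `similarity_encode`), the space padding, and the use of a plain Jaccard ratio over character 3-gram sets are all assumptions made for illustration. Each value is encoded as a vector of its similarities to a list of reference categories, so a misspelled category ends up close to the category it resembles instead of being treated as unrelated.

```python
def ngrams(text, n=3):
    """Return the set of character n-grams of a string, padded with one space on each side."""
    padded = " " + text + " "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_similarity(a, b, n=3):
    """Jaccard similarity between the n-gram sets of two strings (illustrative choice)."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    union = grams_a | grams_b
    if not union:
        # Both strings are too short to produce any n-gram.
        return 1.0 if a == b else 0.0
    return len(grams_a & grams_b) / len(union)


def similarity_encode(values, prototypes, n=3):
    """Encode each value as its similarities to the reference categories."""
    return [[ngram_similarity(v, p, n) for p in prototypes] for v in values]


# Hypothetical dirty column: the second entry is a typo of the first.
prototypes = ["Police Officer", "Firefighter"]
encoded = similarity_encode(["Police Officer", "Police Oficer", "Firefighter"], prototypes)
for row in encoded:
    print([round(x, 2) for x in row])
```

With one-hot encoding, the misspelled entry would get its own unrelated column. Here it stays much closer to "Police Officer" than to "Firefighter".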

Installation

Dependencies

dirty_cat requires:

  • Python (>= 3.5)

  • NumPy (>= 1.8.2)

  • SciPy (>= 1.0.1)

  • scikit-learn (>= 0.20.0)

Optional dependency:

  • python-Levenshtein for faster edit distances (not used for the n-gram distance)
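The minimum versions listed above can be compared against installed versions with a small helper. The sketch below is only an illustration: `parse_version` and `meets_minimum` are hypothetical helpers, not part of dirty_cat, and they handle only simple dotted numeric versions (pre-release suffixes are ignored). Versions are compared as tuples of integers, since a plain string comparison would wrongly rank "0.3.0" above "0.20.0".

```python
import re

# Minimum versions taken from the dependency list above.
MINIMUM_VERSIONS = {"numpy": "1.8.2", "scipy": "1.0.1", "scikit-learn": "0.20.0"}


def parse_version(version):
    """Turn '1.8.2' into (1, 8, 2); stop at the first component with no leading digits."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum version."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '1.0' and '1.0.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


# Hypothetical installed versions, for illustration only.
print(meets_minimum("1.21.5", MINIMUM_VERSIONS["numpy"]))
print(meets_minimum("0.3.0", MINIMUM_VERSIONS["scikit-learn"]))
```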

User installation

If you already have a working installation of NumPy and SciPy, the easiest way to install dirty_cat is with pip:

pip install -U --user dirty_cat
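After installing, one way to confirm that the package is visible to the current interpreter is to look it up without importing it. This is a small sketch using only the standard library; `is_installed` is a hypothetical helper name, and `dirty_cat` is assumed to be the importable module name, as in the description above.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found by the import system."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("dirty_cat"):
    print("dirty_cat is available")
else:
    print("dirty_cat was not found; check that pip installed it for this interpreter")
```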

Other implementations

References

Source Distribution

dirty_cat-0.0.7.tar.gz (91.3 kB)

Uploaded Source

Built Distribution

dirty_cat-0.0.7-py3-none-any.whl (102.7 kB)

Uploaded Python 3

File details

Details for the file dirty_cat-0.0.7.tar.gz.

File metadata

  • Download URL: dirty_cat-0.0.7.tar.gz
  • Upload date:
  • Size: 91.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.4.2 requests/2.22.0 setuptools/45.2.0 requests-toolbelt/0.8.0 tqdm/4.30.0 CPython/3.8.2

File hashes

Hashes for dirty_cat-0.0.7.tar.gz
Algorithm Hash digest
SHA256 89b5424cbc3e0377c43ec474f41f9f4f517ff0f37ed0b7222d493025753060f2
MD5 5f33ab233115f4a2d33d1d0618e68e99
BLAKE2b-256 c0ddae3a7540cf39d8c3580c444538ca83147ceaf2c172e34c77378069e80fe5

File details

Details for the file dirty_cat-0.0.7-py3-none-any.whl.

File metadata

  • Download URL: dirty_cat-0.0.7-py3-none-any.whl
  • Upload date:
  • Size: 102.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.4.2 requests/2.22.0 setuptools/45.2.0 requests-toolbelt/0.8.0 tqdm/4.30.0 CPython/3.8.2

File hashes

Hashes for dirty_cat-0.0.7-py3-none-any.whl
Algorithm Hash digest
SHA256 0ffb73cadee491749dad4d90b1524a54f87ade154f1aa69aa8b419e7c07ca831
MD5 95b85ebb6127a57b8de097c06e2abf1c
BLAKE2b-256 8ff5a96f913b4e00bd65e069de06a786b6b73b4c5a4f46600f9548e853b1b47e
