Machine learning with dirty categories.
Project description
dirty_cat is a Python module for machine-learning on dirty categorical variables.
Website: https://dirty-cat.github.io/
For a detailed description of the problem of encoding dirty categorical data, see Similarity encoding for learning with dirty categorical variables [1] and Encoding high-cardinality string categorical variables [2].
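To give a feel for what similarity-based encoding of dirty categories means, the following is a minimal pure-Python sketch. It is not dirty_cat's implementation: the helper names (`char_ngrams`, `ngram_similarity`, `similarity_encode`), the choice of 3-grams, the space padding, and the Jaccard-style ratio are all assumptions made for illustration. Each string is encoded as a vector of similarities to a few reference categories, so misspelled variants land close to the category they resemble.

```python
def char_ngrams(text, n=3):
    """Return the set of character n-grams of `text` (lowercased, space-padded)."""
    padded = f" {text.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_similarity(a, b, n=3):
    """Shared n-grams divided by all distinct n-grams of the two strings."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    union = grams_a | grams_b
    if not union:
        # Both strings are too short to produce any n-gram.
        return 0.0
    return len(grams_a & grams_b) / len(union)


def similarity_encode(values, prototypes, n=3):
    """Encode each value as its similarities to the reference categories."""
    return [[ngram_similarity(v, p, n) for p in prototypes] for v in values]


if __name__ == "__main__":
    prototypes = ["police officer", "teacher"]      # hypothetical clean categories
    dirty = ["Police Oficer", "teacher ", "techer"]  # hypothetical dirty entries
    for value, row in zip(dirty, similarity_encode(dirty, prototypes)):
        print(repr(value), [round(x, 2) for x in row])
```

Unlike one-hot encoding, which would treat "Police Oficer" and "police officer" as unrelated categories, this kind of encoding keeps them close in feature space.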
Installation
Dependencies
dirty_cat requires:
Python (>= 3.6)
NumPy (>= 1.8.2)
SciPy (>= 1.0.1)
scikit-learn (>= 0.20.0)
Optional dependency:
python-Levenshtein for faster edit distances (not used for the n-gram distance)
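Before installing, it can be useful to check whether the current environment already satisfies the minimum versions listed above. The script below is a rough sketch, not part of dirty_cat: the helper names are hypothetical, and the version parsing is deliberately simplified (it compares only the first three numeric components and ignores pre-release suffixes).

```python
import importlib
import sys

# Minimum versions taken from the dependency list above.
# Keys are import names (scikit-learn is imported as "sklearn").
MINIMUM_VERSIONS = {"numpy": "1.8.2", "scipy": "1.0.1", "sklearn": "0.20.0"}


def version_tuple(version):
    """Turn a version string such as '1.19.2' into a tuple of three integers."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for char in piece:
            if not char.isdigit():
                break  # stop at suffixes such as 'rc1' or 'post1'
            digits += char
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed, required):
    """Compare versions numerically, so that '1.19.2' counts as newer than '1.8.2'."""
    return version_tuple(installed) >= version_tuple(required)


def check_dependencies(minimums=MINIMUM_VERSIONS):
    """Return a dict mapping each module name to 'ok', 'too old', or 'missing'."""
    report = {}
    for module_name, required in minimums.items():
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            report[module_name] = "missing"
            continue
        installed = getattr(module, "__version__", "0")
        report[module_name] = "ok" if meets_minimum(installed, required) else "too old"
    return report


if __name__ == "__main__":
    print("Python >= 3.6:", sys.version_info >= (3, 6))
    for name, status in check_dependencies().items():
        print(name, status)
```

Versions are compared as integer tuples rather than strings because a plain string comparison would rank "1.8.2" above "1.19.2".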
User installation
If you already have a working installation of NumPy and SciPy, the easiest way to install dirty_cat is to use pip:
pip install -U --user dirty_cat
Other implementations
References
Download files
File details
Details for the file dirty_cat-0.1.0.tar.gz.
File metadata
- Download URL: dirty_cat-0.1.0.tar.gz
- Upload date:
- Size: 98.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.24.0 setuptools/50.3.1.post20201107 requests-toolbelt/0.9.1 tqdm/4.50.2 CPython/3.8.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | 6333b55f6aae139704e355d41c9f1d9aae5086cc8c8ccc723fffa15947e475c1
MD5 | 4d0f8ab61b9c9c2e4ffb1041be5abcc5
BLAKE2b-256 | 292af35cfa82e12936ea9241b826410f22c35d640f2987b63317fad33de3bfb6
File details
Details for the file dirty_cat-0.1.0-py3-none-any.whl.
File metadata
- Download URL: dirty_cat-0.1.0-py3-none-any.whl
- Upload date:
- Size: 112.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.24.0 setuptools/50.3.1.post20201107 requests-toolbelt/0.9.1 tqdm/4.50.2 CPython/3.8.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | a6d8f86de815a1ca6439ee9e320785ba955cdf97ba6e42b1e869c2966813a17c
MD5 | 4140777f716bb2e39297caa9a58cbe99
BLAKE2b-256 | 3026e90638d9f79825249bfd0f7cf1ac71c98eb99bc705a734c55c74e8cbeb6d
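The SHA256 digests listed above can be used to check that a downloaded file is intact. The following is a minimal sketch using only Python's standard `hashlib` module; the function names and the local file path in the usage comment are assumptions for illustration.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Return True if the file's SHA256 digest equals the published one."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Usage, assuming the source distribution was saved in the current directory:
# matches_published_hash(
#     "dirty_cat-0.1.0.tar.gz",
#     "6333b55f6aae139704e355d41c9f1d9aae5086cc8c8ccc723fffa15947e475c1",
# )
```

pip can also enforce hashes itself when requirements are pinned with `--hash` entries, which may be preferable to a manual check in automated setups.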