Prepping tables for machine learning

skrub (formerly dirty_cat) is a Python library that facilitates prepping your tables for machine learning.

If you like the package, spread the word and ⭐ this repository! You can also join the Discord server.

Website: https://skrub-data.org/

What can skrub do?

The goal of skrub is to bridge the gap between tabular data sources and machine-learning models.

skrub provides high-level tools for joining dataframes (Joiner, AggJoiner, …), encoding columns (MinHashEncoder, ToCategorical, …), building a pipeline (TableVectorizer, tabular_learner, …), and more.

>>> from skrub.datasets import fetch_employee_salaries
>>> dataset = fetch_employee_salaries()
>>> df = dataset.X
>>> y = dataset.y
>>> df.iloc[0]
gender                                                                     F
department                                                               POL
department_name                                         Department of Police
division                   MSB Information Mgmt and Tech Division Records...
assignment_category                                         Fulltime-Regular
employee_position_title                          Office Services Coordinator
date_first_hired                                                  09/22/1986
year_first_hired                                                        1986
>>> from sklearn.model_selection import cross_val_score
>>> from skrub import tabular_learner
>>> cross_val_score(tabular_learner('regressor'), df, y)
array([0.89370447, 0.89279068, 0.92282557, 0.92319094, 0.92162666])

See our examples.

Installation

skrub can be installed with pip (pip install skrub) or with conda. For more installation information, see the installation instructions.
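
After installing, one quick way to confirm which version ended up in the current environment is to query the package metadata. The sketch below uses only the Python standard library; the helper name installed_version is made up for illustration and is not part of skrub.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent.

    `installed_version` is a hypothetical helper for this example,
    not a skrub function.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("skrub")
    print(f"skrub version: {found}" if found else "skrub is not installed")
```

Returning None instead of raising keeps the check easy to use in setup scripts that only need to decide whether an install step is still required.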

Contributing

The best way to support the development of skrub is to spread the word!

Also, if you are already a skrub user, we would love to hear about your use cases and challenges in the Discussions section.

To report a bug or suggest enhancements, please open an issue and/or submit a pull request.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

skrub-0.3.1.tar.gz (6.3 MB)

Built Distribution

skrub-0.3.1-py3-none-any.whl (304.2 kB)

File details

Details for the file skrub-0.3.1.tar.gz.

File metadata

  • Download URL: skrub-0.3.1.tar.gz
  • Upload date:
  • Size: 6.3 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.8

File hashes

Hashes for skrub-0.3.1.tar.gz

Algorithm    Hash digest
SHA256       b745cca583732f23c9d410e2ca220f4f3bddb71e6549925ab89aa6ee9d3d55a5
MD5          b2050a91106383605640b763c1fd5cdb
BLAKE2b-256  0efed9d6be2e27e939ed8b6f68f846b2da438653af74b232039ef3cf9d1291b8

See more details on using hashes here.

File details

Details for the file skrub-0.3.1-py3-none-any.whl.

File metadata

  • Download URL: skrub-0.3.1-py3-none-any.whl
  • Upload date:
  • Size: 304.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.8

File hashes

Hashes for skrub-0.3.1-py3-none-any.whl

Algorithm    Hash digest
SHA256       0495ced71f569894b6fbf5239bfae5a4bc839743c36eeeb96d45b370d2bdb4f6
MD5          e59cc3d1a10e3c9257874dd4de8e6548
BLAKE2b-256  f6da97bfd38b20cfc72ad2cf8e85681d5207b41cec3d6504e4d0f2cfe5b33612

See more details on using hashes here.
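
The published digests can be used to check that a downloaded file was not corrupted or altered. The following is a minimal sketch using the standard-library hashlib module: it recomputes the SHA256 of a local file and compares it with the value listed above for that file name. The helper names sha256_of and matches_published_hash are assumptions made for this example, and the file is assumed to have been downloaded already under its original name.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the "File hashes" sections of this page.
EXPECTED_SHA256 = {
    "skrub-0.3.1.tar.gz":
        "b745cca583732f23c9d410e2ca220f4f3bddb71e6549925ab89aa6ee9d3d55a5",
    "skrub-0.3.1-py3-none-any.whl":
        "0495ced71f569894b6fbf5239bfae5a4bc839743c36eeeb96d45b370d2bdb4f6",
}


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path):
    """Return True if the file's SHA256 equals the digest listed for its name.

    Files whose names are not in EXPECTED_SHA256 are reported as not matching.
    """
    expected = EXPECTED_SHA256.get(Path(path).name)
    return expected is not None and sha256_of(path) == expected
```

For example, matches_published_hash("skrub-0.3.1.tar.gz") should return True only when the local archive is byte-for-byte identical to the one described on this page.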
