
Haystack custom components for your favourite dataframe library.


Dataframes Haystack


📃 Description

dataframes-haystack is an extension for Haystack 2 that enables integration with dataframe libraries.

The library offers custom Converter components that convert data stored in dataframes into Haystack `Document` objects.
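Conceptually, the conversion is a row-by-row mapping: one column supplies each document's content, and the selected metadata columns become its `meta` dictionary. The sketch below illustrates that idea in plain Python. `SimpleDocument` and `rows_to_documents` are hypothetical stand-ins (not part of dataframes-haystack or Haystack), and rows are passed as a list of dicts rather than a real dataframe.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for haystack.Document (content and meta only)."""
    content: str
    meta: Dict[str, Any] = field(default_factory=dict)


def rows_to_documents(
    rows: List[Dict[str, Any]],
    content_column: str,
    meta_columns: Optional[List[str]] = None,
) -> Dict[str, List[SimpleDocument]]:
    """Map each row to one document (illustrative sketch, not the library code)."""
    meta_columns = meta_columns or []
    documents = [
        SimpleDocument(
            content=str(row[content_column]),
            # Keep only the requested metadata columns for this row.
            meta={col: row[col] for col in meta_columns},
        )
        for row in rows
    ]
    # Haystack components return a dict keyed by output name.
    return {"documents": documents}


rows = [
    {"text": "Hello world", "filename": "doc1.txt"},
    {"text": "Hello everyone", "filename": "doc2.txt"},
]
result = rows_to_documents(rows, content_column="text", meta_columns=["filename"])
for doc in result["documents"]:
    print(doc.content, doc.meta)
```

The real converters take a pandas or polars dataframe directly, as in the usage examples; this sketch only shows how `content_column` and `meta_columns` relate to the resulting documents.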

The dataframe libraries currently supported are:

  • pandas
  • polars

🛠️ Installation

# for pandas (pandas is already included in `haystack-ai`)
pip install dataframes-haystack

# for polars
pip install "dataframes-haystack[polars]"
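Because polars support is an optional extra, code meant to work with either backend may want to check what is installed first. The following is a minimal sketch using the standard library's `importlib.util.find_spec`. The `available_backends` helper is hypothetical: it only reports whether the `pandas` and `polars` modules can be found, not whether their versions are compatible.

```python
import importlib.util
from typing import Dict


def available_backends() -> Dict[str, bool]:
    """Report which dataframe libraries are importable (hypothetical helper)."""
    return {
        name: importlib.util.find_spec(name) is not None
        for name in ("pandas", "polars")
    }


if __name__ == "__main__":
    backends = available_backends()
    if not backends["polars"]:
        print('polars not found; install it with: pip install "dataframes-haystack[polars]"')
    print(backends)
```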

💻 Usage

Tip: See the Example Notebooks for complete examples.

Pandas

import pandas as pd

from dataframes_haystack.components.converters.pandas import PandasDataFrameConverter

df = pd.DataFrame({
    "text": ["Hello world", "Hello everyone"],
    "filename": ["doc1.txt", "doc2.txt"],
})

converter = PandasDataFrameConverter(content_column="text", meta_columns=["filename"])
documents = converter.run(df)

Result:

>>> documents
{'documents': [
    Document(id=0, content: 'Hello world', meta: {'filename': 'doc1.txt'}),
    Document(id=1, content: 'Hello everyone', meta: {'filename': 'doc2.txt'})
]}

Polars

import polars as pl

from dataframes_haystack.components.converters.polars import PolarsDataFrameConverter

df = pl.DataFrame({
    "text": ["Hello world", "Hello everyone"],
    "filename": ["doc1.txt", "doc2.txt"],
})

converter = PolarsDataFrameConverter(content_column="text", meta_columns=["filename"])
documents = converter.run(df)

Result:

>>> documents
{'documents': [
    Document(id=0, content: 'Hello world', meta: {'filename': 'doc1.txt'}),
    Document(id=1, content: 'Hello everyone', meta: {'filename': 'doc2.txt'})
]}

🤝 Contributing

Do you have an idea for a new feature? Did you find a bug that needs fixing?

Feel free to open an issue or submit a PR!

Set up the development environment

Requirements: hatch, pre-commit

  1. Clone the repository.
  2. Run `hatch shell` to create and activate a virtual environment.
  3. Run `pre-commit install` to install the pre-commit hooks, which enforce the linting and formatting checks on every commit.

Run tests

  • Linting and formatting checks: `hatch run lint:fmt`
  • Unit tests: `hatch run test-cov-all`

✍️ License

dataframes-haystack is distributed under the terms of the MIT license.



Download files


Source Distribution

dataframes_haystack-0.0.1.tar.gz (93.8 kB)

Built Distribution

dataframes_haystack-0.0.1-py3-none-any.whl (7.9 kB, Python 3)

File details

Details for the file dataframes_haystack-0.0.1.tar.gz.

File metadata

  • Download URL: dataframes_haystack-0.0.1.tar.gz
  • Upload date:
  • Size: 93.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.0.0 CPython/3.12.2

File hashes

Hashes for dataframes_haystack-0.0.1.tar.gz
  • SHA256: 4741c3d6337a76429b4415a316cb6e7f0e61deb309cb902c2a020e7b3c62b2db
  • MD5: daaed7fdf702f3d6b443ffe7093ea62e
  • BLAKE2b-256: 85cd8da539f21de731e6b625d48825ee2022fb29cf6c4123618ec4268f224ef4
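The published digests let you check a downloaded file before installing it. Below is a minimal sketch using Python's `hashlib`. `sha256_of` and `verify_download` are hypothetical helpers, and the expected value is the SHA256 digest listed above for the source distribution; it assumes the file was downloaded into the current directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Compute the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected_sha256: str) -> bool:
    """Return True if the file's SHA256 digest matches the published one."""
    return sha256_of(path) == expected_sha256.strip().lower()


if __name__ == "__main__":
    # Assumes the sdist sits in the current directory (hypothetical location).
    sdist = Path("dataframes_haystack-0.0.1.tar.gz")
    expected = "4741c3d6337a76429b4415a316cb6e7f0e61deb309cb902c2a020e7b3c62b2db"
    if sdist.exists():
        print("OK" if verify_download(sdist, expected) else "MISMATCH")
```

pip can also enforce this at install time through its hash-checking mode (`--hash=sha256:...` entries in a requirements file).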


File details

Details for the file dataframes_haystack-0.0.1-py3-none-any.whl.


File hashes

Hashes for dataframes_haystack-0.0.1-py3-none-any.whl
  • SHA256: e74a0ea7d0db92e625f2ce35859d3acf9c0d061c3a4fc75e785e8d017599ee44
  • MD5: 965b9bc2524f382c587a02ba2072cfa7
  • BLAKE2b-256: 410b971178054da529af451fc960407d324fe913646fc4f325e51393be0dae95

