Haystack custom components for your favourite dataframe library.

Dataframes Haystack


📃 Description

dataframes-haystack is an extension for Haystack 2 that enables integration with dataframe libraries.

The dataframe libraries currently supported are pandas and Polars.

The library offers custom converter components that read files into dataframes and transform dataframes into Haystack Document objects:

  • FileToPandasDataFrame and FileToPolarsDataFrame read files and convert them into dataframes.
  • PandasDataFrameConverter and PolarsDataFrameConverter convert data stored in dataframes into Haystack Document objects.
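
To make the row-to-document mapping concrete, here is a minimal, dependency-free sketch of the idea. It is not the library's implementation: `rows_to_documents` is a hypothetical helper, plain dicts stand in for Haystack Document objects, and using the row position as the id is an assumption made for illustration.

```python
from typing import Any


def rows_to_documents(
    rows: list[dict[str, Any]],
    content_column: str,
    meta_columns: list[str],
) -> list[dict[str, Any]]:
    """Hypothetical helper: turn table rows into document-like dicts."""
    documents = []
    for index, row in enumerate(rows):
        documents.append(
            {
                # Assumption: the row position serves as the document id.
                "id": index,
                # One column supplies the document text.
                "content": row[content_column],
                # The selected columns become the document metadata.
                "meta": {column: row[column] for column in meta_columns},
            }
        )
    return documents


rows = [
    {"text": "Hello world", "filename": "doc1.txt"},
    {"text": "Hello everyone", "filename": "doc2.txt"},
]
documents = rows_to_documents(rows, content_column="text", meta_columns=["filename"])
```

Each row yields one document: the content column supplies the text and the metadata columns are copied into `meta`.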

🛠️ Installation

# for pandas (pandas is already included in `haystack-ai`)
pip install dataframes-haystack

# for polars
pip install "dataframes-haystack[polars]"
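
After installing, you may want to confirm that the dataframe backends are importable in the current environment. The check below is a small sketch using only the standard library; `is_installed` is a hypothetical helper name, not part of dataframes-haystack.

```python
from importlib.util import find_spec


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found in this environment."""
    return find_spec(module_name) is not None


# polars is only present if the optional extra was installed.
for name in ("pandas", "polars"):
    status = "available" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```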

💻 Usage

Tip: See the Example Notebooks for complete examples.

Pandas

FileToPandasDataFrame

from dataframes_haystack.components.converters.pandas import FileToPandasDataFrame

converter = FileToPandasDataFrame(file_format="csv")

output_dataframe = converter.run(
    file_paths=["data/doc1.csv", "data/doc2.csv"]
)

Result:

>>> output_dataframe
{'dataframe': <pandas.DataFrame>}

PandasDataFrameConverter

import pandas as pd

from dataframes_haystack.components.converters.pandas import PandasDataFrameConverter

df = pd.DataFrame({
    "text": ["Hello world", "Hello everyone"],
    "filename": ["doc1.txt", "doc2.txt"],
})

converter = PandasDataFrameConverter(content_column="text", meta_columns=["filename"])
documents = converter.run(df)

Result:

>>> documents
{'documents': [
    Document(id=0, content: 'Hello world', meta: {'filename': 'doc1.txt'}),
    Document(id=1, content: 'Hello everyone', meta: {'filename': 'doc2.txt'})
]}

Polars

FileToPolarsDataFrame

from dataframes_haystack.components.converters.polars import FileToPolarsDataFrame

converter = FileToPolarsDataFrame(file_format="csv")

output_dataframe = converter.run(
    file_paths=["data/doc1.csv", "data/doc2.csv"]
)

Result:

>>> output_dataframe
{'dataframe': <polars.DataFrame>}

PolarsDataFrameConverter

import polars as pl

from dataframes_haystack.components.converters.polars import PolarsDataFrameConverter

df = pl.DataFrame({
    "text": ["Hello world", "Hello everyone"],
    "filename": ["doc1.txt", "doc2.txt"],
})

converter = PolarsDataFrameConverter(content_column="text", meta_columns=["filename"])
documents = converter.run(df)

Result:

>>> documents
{'documents': [
    Document(id=0, content: 'Hello world', meta: {'filename': 'doc1.txt'}),
    Document(id=1, content: 'Hello everyone', meta: {'filename': 'doc2.txt'})
]}

🤝 Contributing

Do you have an idea for a new feature? Did you find a bug that needs fixing?

Feel free to open an issue or submit a PR!

Setup development environment

Requirements: hatch, pre-commit

  1. Clone the repository
  2. Run hatch shell to create and activate a virtual environment
  3. Run pre-commit install to install the pre-commit hooks. The hooks run the linting and formatting checks on every commit.

Run tests

  • Linting and formatting checks: hatch run lint:fmt
  • Unit tests: hatch run test-cov-all

✍️ License

dataframes-haystack is distributed under the terms of the MIT license.
