Dataframes Haystack

Description

dataframes-haystack is a Python library that allows various dataframe libraries to integrate with Haystack 2.x.

The library provides custom converter components that turn the rows of a dataframe into Haystack Documents.

The dataframe libraries currently supported are pandas and Polars.

Installation

# for pandas (pandas is already included in `haystack-ai`)
pip install dataframes-haystack

# for polars
pip install "dataframes-haystack[polars]"

Usage

Pandas

import pandas as pd

from dataframes_haystack.components.converters.pandas import PandasDataFrameConverter

df = pd.DataFrame({
    "text": ["Hello world", "Hello everyone"],
    "filename": ["doc1.txt", "doc2.txt"],
})

converter = PandasDataFrameConverter(content_column="text", meta_columns=["filename"])
documents = converter.run(df)

Result:

>>> documents
{'documents': [Document(id=2eaefcdeb8d31f9f3d543c614233476ff70c0ed5aae609667172786d09588223, content: 'Hello world', meta: {'filename': 'doc1.txt'}), Document(id=bdc99cbfe819356159950dbaffa0521b47ec3ac2ff040604c93fe7798cc71efc, content: 'Hello everyone', meta: {'filename': 'doc2.txt'})]}

Polars

import polars as pl

from dataframes_haystack.components.converters.polars import PolarsDataFrameConverter

df = pl.DataFrame({
    "text": ["Hello world", "Hello everyone"],
    "filename": ["doc1.txt", "doc2.txt"],
})

converter = PolarsDataFrameConverter(content_column="text", meta_columns=["filename"])
documents = converter.run(df)

Result:

>>> documents
{'documents': [Document(id=2eaefcdeb8d31f9f3d543c614233476ff70c0ed5aae609667172786d09588223, content: 'Hello world', meta: {'filename': 'doc1.txt'}), Document(id=bdc99cbfe819356159950dbaffa0521b47ec3ac2ff040604c93fe7798cc71efc, content: 'Hello everyone', meta: {'filename': 'doc2.txt'})]}
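
Both converters follow the same pattern: one column supplies the document content, and the columns listed in meta_columns become the document metadata. The sketch below illustrates that mapping using only the standard library. It is not the library's implementation: SimpleDocument and rows_to_documents are hypothetical names introduced here for illustration, and the sketch does not reproduce the IDs that Haystack assigns to each Document.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a Haystack Document (no ID generation)."""

    content: str
    meta: dict = field(default_factory=dict)


def rows_to_documents(rows, content_column, meta_columns=()):
    """Map each row (a dict of column name -> value) to a SimpleDocument.

    Assumption: the output mirrors the {'documents': [...]} shape shown above.
    """
    documents = [
        SimpleDocument(
            content=row[content_column],
            # Keep only the columns explicitly requested as metadata.
            meta={col: row[col] for col in meta_columns},
        )
        for row in rows
    ]
    return {"documents": documents}


rows = [
    {"text": "Hello world", "filename": "doc1.txt"},
    {"text": "Hello everyone", "filename": "doc2.txt"},
]

result = rows_to_documents(rows, content_column="text", meta_columns=["filename"])
for doc in result["documents"]:
    print(doc.content, doc.meta)
```

As in the real converters' output, the documents sit under the "documents" key of the returned dictionary, so downstream code reads them with result["documents"].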

Contributing

Do you have an idea for a new feature? Did you find a bug that needs fixing?

Feel free to open an issue or submit a PR!

Setup development environment

Requirements: hatch, pre-commit

  1. Clone the repository
  2. Run hatch shell to create and activate a virtual environment
  3. Run pre-commit install to install the pre-commit hooks. The hooks then run the linting and formatting checks on every commit.

Run tests

  • Linting and formatting checks: hatch run lint:fmt
  • Unit tests: hatch run test-cov-all

License

dataframes-haystack is distributed under the terms of the MIT license.
