Dataframes Haystack

Description

dataframes-haystack is a Python library that allows various dataframe libraries to integrate with Haystack 2.x.

The library provides custom converter components that turn the rows of a dataframe into Haystack Documents.

The dataframe libraries currently supported are:

  • pandas
  • Polars

Installation

# for pandas (pandas is already included in `haystack-ai`)
pip install dataframes-haystack

# for polars
pip install "dataframes-haystack[polars]"
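
To confirm which of the two dataframe libraries is importable after installation, a check along these lines may help. It uses only the standard library; `is_installed` is a hypothetical helper written for this example and is not part of dataframes-haystack.

```python
from importlib.util import find_spec


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found in this environment."""
    # Hypothetical helper for illustration; not provided by dataframes-haystack.
    return find_spec(module_name) is not None


# Report the availability of each supported dataframe library.
for name in ("pandas", "polars"):
    status = "available" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If polars is reported as missing, installing the `polars` extra shown above should add it.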

Usage

Pandas

import pandas as pd

from dataframes_haystack.components.converters.pandas import PandasDataFrameConverter

df = pd.DataFrame({
    "text": ["Hello world", "Hello everyone"],
    "filename": ["doc1.txt", "doc2.txt"],
})

converter = PandasDataFrameConverter(content_column="text", meta_columns=["filename"])
documents = converter.run(df)

Result:

>>> documents
{'documents': [Document(id=2eaefcdeb8d31f9f3d543c614233476ff70c0ed5aae609667172786d09588223, content: 'Hello world', meta: {'filename': 'doc1.txt'}), Document(id=bdc99cbfe819356159950dbaffa0521b47ec3ac2ff040604c93fe7798cc71efc, content: 'Hello everyone', meta: {'filename': 'doc2.txt'})]}
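
As a rough mental model of what the converter does with each row, the sketch below uses plain Python rather than the library itself: the value in `content_column` becomes the document content, and the values in `meta_columns` become its metadata. `SimpleDocument` and `rows_to_documents` are hypothetical names used only for illustration. They are not part of dataframes-haystack or Haystack, and unlike the real `Document` shown above, the stand-in does not carry an `id`.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a Haystack Document (content and meta only)."""

    content: str
    meta: dict[str, Any] = field(default_factory=dict)


def rows_to_documents(rows, content_column, meta_columns=()):
    """Build one SimpleDocument per row dict (illustrative helper only)."""
    return [
        SimpleDocument(
            content=row[content_column],
            # Copy only the requested metadata columns.
            meta={col: row[col] for col in meta_columns},
        )
        for row in rows
    ]


# The same data as the dataframe above, expressed as a list of row dicts.
rows = [
    {"text": "Hello world", "filename": "doc1.txt"},
    {"text": "Hello everyone", "filename": "doc2.txt"},
]

docs = rows_to_documents(rows, content_column="text", meta_columns=["filename"])
for doc in docs:
    print(doc.content, doc.meta)
```

Columns that are listed in neither `content_column` nor `meta_columns` are simply left out of the sketch's output.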

Polars

import polars as pl

from dataframes_haystack.components.converters.polars import PolarsDataFrameConverter

df = pl.DataFrame({
    "text": ["Hello world", "Hello everyone"],
    "filename": ["doc1.txt", "doc2.txt"],
})

converter = PolarsDataFrameConverter(content_column="text", meta_columns=["filename"])
documents = converter.run(df)

Result:

>>> documents
{'documents': [Document(id=2eaefcdeb8d31f9f3d543c614233476ff70c0ed5aae609667172786d09588223, content: 'Hello world', meta: {'filename': 'doc1.txt'}), Document(id=bdc99cbfe819356159950dbaffa0521b47ec3ac2ff040604c93fe7798cc71efc, content: 'Hello everyone', meta: {'filename': 'doc2.txt'})]}

Contributing

Do you have an idea for a new feature? Did you find a bug that needs fixing?

Feel free to open an issue or submit a PR!

Setup development environment

Requirements: hatch, pre-commit

  1. Clone the repository
  2. Run hatch shell to create and activate a virtual environment
  3. Run pre-commit install to install the pre-commit hooks. The hooks run the linting and formatting checks on every commit.

Run tests

  • Linting and formatting checks: hatch run lint:fmt
  • Unit tests: hatch run test-cov-all

License

dataframes-haystack is distributed under the terms of the MIT license.
