Haystack custom components for your favourite dataframe library.
Dataframes Haystack
📃 Description
dataframes-haystack is an extension for Haystack 2 that enables integration with dataframe libraries.
The library offers custom converter components that convert data stored in dataframes into Haystack Document objects.
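To make the idea concrete, here is a minimal conceptual sketch of the row-to-document mapping, written with the standard library only. It is not the library's implementation: `SketchDocument` and `rows_to_documents` are hypothetical names invented for illustration, and using the row position as the document id is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class SketchDocument:
    """Hypothetical stand-in for a Haystack Document (not the real class)."""

    id: str
    content: str
    meta: Dict[str, Any] = field(default_factory=dict)


def rows_to_documents(
    rows: List[Dict[str, Any]],
    content_column: str,
    meta_columns: Optional[List[str]] = None,
) -> Dict[str, List[SketchDocument]]:
    """Turn each row into one document-like object.

    The content column becomes the document text and the selected
    meta columns become the document metadata.
    """
    meta_columns = meta_columns or []
    documents = []
    for index, row in enumerate(rows):
        documents.append(
            SketchDocument(
                id=str(index),  # assumption: row position used as the id
                content=str(row[content_column]),
                meta={column: row[column] for column in meta_columns},
            )
        )
    # Return a dict keyed by "documents", mirroring the shape of the
    # converter output shown in the usage section.
    return {"documents": documents}


# Rows represented as plain dicts instead of a real dataframe.
rows = [
    {"text": "Hello world", "filename": "doc1.txt"},
    {"text": "Hello everyone", "filename": "doc2.txt"},
]
result = rows_to_documents(rows, content_column="text", meta_columns=["filename"])
```

The real converters take a pandas or Polars dataframe rather than a list of dicts, so treat this only as an illustration of how the content and metadata columns are used.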
The dataframe libraries currently supported are pandas and Polars.
🛠️ Installation
# for pandas (pandas is already included in `haystack-ai`)
pip install dataframes-haystack
# for polars
pip install "dataframes-haystack[polars]"
💻 Usage
Tip: See the Example Notebooks for complete examples.
Pandas
import pandas as pd
from dataframes_haystack.components.converters.pandas import PandasDataFrameConverter
df = pd.DataFrame({
"text": ["Hello world", "Hello everyone"],
"filename": ["doc1.txt", "doc2.txt"],
})
converter = PandasDataFrameConverter(content_column="text", meta_columns=["filename"])
documents = converter.run(df)
Result:
>>> documents
{'documents': [
Document(id=0, content: 'Hello world', meta: {'filename': 'doc1.txt'}),
Document(id=1, content: 'Hello everyone', meta: {'filename': 'doc2.txt'})
]}
Polars
import polars as pl
from dataframes_haystack.components.converters.polars import PolarsDataFrameConverter
df = pl.DataFrame({
"text": ["Hello world", "Hello everyone"],
"filename": ["doc1.txt", "doc2.txt"],
})
converter = PolarsDataFrameConverter(content_column="text", meta_columns=["filename"])
documents = converter.run(df)
Result:
>>> documents
{'documents': [
Document(id=0, content: 'Hello world', meta: {'filename': 'doc1.txt'}),
Document(id=1, content: 'Hello everyone', meta: {'filename': 'doc2.txt'})
]}
🤝 Contributing
Do you have an idea for a new feature? Did you find a bug that needs fixing?
Feel free to open an issue or submit a PR!
Setup development environment
Requirements: hatch, pre-commit
- Clone the repository.
- Run `hatch shell` to create and activate a virtual environment.
- Run `pre-commit install` to install the pre-commit hooks. This enforces the linting and formatting checks on each commit.
Run tests
- Linting and formatting checks: `hatch run lint:fmt`
- Unit tests: `hatch run test-cov-all`
✍️ License
dataframes-haystack is distributed under the terms of the MIT license.