
Streaming operations with pandas.

Project description


pandas_streaming: streaming API over pandas


pandas_streaming processes large files with pandas when the data is too big to hold in memory but too small for parallelization to bring a significant gain. The module replicates a subset of the pandas API and adds functionality for machine learning.

from pandas_streaming.df import StreamingDataFrame
sdf = StreamingDataFrame.read_csv("filename", sep="\t", encoding="utf-8")

for df in sdf:
    # process this chunk of data
    # df is a dataframe
    print(df)
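
To make "process this chunk of data" concrete, here is a minimal sketch of a running aggregate computed chunk by chunk. It relies only on plain pandas (`pandas.read_csv` with `chunksize`) rather than on pandas_streaming itself, and the in-memory data, column names, and chunk size are assumptions chosen for illustration.

```python
import io

import pandas

# Small in-memory stand-in for a large tab-separated file (hypothetical data).
raw = io.StringIO("key\tvalue\na\t1\nb\t2\na\t3\nb\t4\na\t5\n")

total = 0.0
count = 0
chunk_sizes = []

# With chunksize set, pandas.read_csv returns an iterator of DataFrames,
# so only one chunk is held in memory at a time.
for chunk in pandas.read_csv(raw, sep="\t", chunksize=2):
    chunk_sizes.append(len(chunk))
    total += chunk["value"].sum()
    count += len(chunk)

# The mean is assembled from per-chunk sums and counts.
mean_value = total / count
print(chunk_sizes, mean_value)
```

Any statistic that can be accumulated from per-chunk partial results (sums, counts, minima, maxima) fits this pattern.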

The module can also stream an existing dataframe.

import pandas
from pandas_streaming.df import StreamingDataFrame

df = pandas.DataFrame([dict(cf=0, cint=0, cstr="0"),
                       dict(cf=1, cint=1, cstr="1"),
                       dict(cf=3, cint=3, cstr="3")])

# wrap the in-memory dataframe so it can be iterated over in chunks
sdf = StreamingDataFrame.read_df(df)

for chunk in sdf:
    # process this chunk of data
    # chunk is a dataframe
    print(chunk)
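
As a rough picture of what streaming an existing dataframe amounts to, the sketch below slices a dataframe into consecutive row blocks with plain pandas. The helper `iter_chunks` and its `chunksize` parameter are hypothetical names for this illustration and are not part of pandas_streaming, whose internals may differ.

```python
import pandas


def iter_chunks(df, chunksize):
    """Yield consecutive row slices of df, each with at most chunksize rows."""
    if chunksize <= 0:
        raise ValueError("chunksize must be positive")
    for start in range(0, len(df), chunksize):
        yield df.iloc[start:start + chunksize]


df = pandas.DataFrame([dict(cf=0, cint=0, cstr="0"),
                       dict(cf=1, cint=1, cstr="1"),
                       dict(cf=3, cint=3, cstr="3")])

# Materialized here only to inspect the result; a real consumer would
# iterate over the generator directly.
chunks = list(iter_chunks(df, chunksize=2))
for chunk in chunks:
    print(chunk)
```

Concatenating the chunks in order gives back the original dataframe, which is a quick sanity check for any chunking scheme.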

It also contains helpers that split datasets into train and test sets under unusual constraints.
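
The constraints handled by the module's own helpers are not detailed here, so the following is only a sketch of the unconstrained baseline: a random train/test split applied chunk by chunk, so the full dataset never has to be in memory. It uses plain pandas and NumPy; `split_stream`, `test_size`, and `seed` are hypothetical names and not the pandas_streaming API.

```python
import numpy
import pandas


def split_stream(chunks, test_size=0.25, seed=0):
    """Yield (train_part, test_part) for each chunk of a stream.

    Each row goes to the test part with probability test_size,
    independently of the other rows.
    """
    rng = numpy.random.default_rng(seed)
    for chunk in chunks:
        in_test = rng.random(len(chunk)) < test_size
        yield chunk[~in_test], chunk[in_test]


df = pandas.DataFrame({"x": range(10)})

# Simulate a stream of chunks of at most 4 rows each.
chunks = (df.iloc[i:i + 4] for i in range(0, len(df), 4))

train_parts, test_parts = zip(*split_stream(chunks, test_size=0.3, seed=0))
train = pandas.concat(train_parts)
test = pandas.concat(test_parts)
print(len(train), len(test))
```

Because rows are assigned independently, the test share only approximates `test_size`; a split that must keep related rows on the same side would need extra bookkeeping across chunks.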



Download files


Source Distribution

pandas_streaming-0.3.218.tar.gz (33.3 kB)

Uploaded Source

Built Distribution

pandas_streaming-0.3.218-py3-none-any.whl (35.6 kB)

Uploaded Python 3

File details

Details for the file pandas_streaming-0.3.218.tar.gz.

File metadata

  • Download URL: pandas_streaming-0.3.218.tar.gz
  • Upload date:
  • Size: 33.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.6.1 pkginfo/1.7.1 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.1 CPython/3.9.5

File hashes

Hashes for pandas_streaming-0.3.218.tar.gz
Algorithm Hash digest
SHA256 879768b3309088864f1de6970ae0ac5a7faab837a426828643b9d23b05f8e08a
MD5 1b31c78e5bcc771b0a33107075f6eea8
BLAKE2b-256 9b7a7e703ecba8b87a831e925ae841f2aa3b99747b8d674fea592074e60ebf79


File details

Details for the file pandas_streaming-0.3.218-py3-none-any.whl.

File metadata

  • Download URL: pandas_streaming-0.3.218-py3-none-any.whl
  • Upload date:
  • Size: 35.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.6.1 pkginfo/1.7.1 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.1 CPython/3.9.5

File hashes

Hashes for pandas_streaming-0.3.218-py3-none-any.whl
Algorithm Hash digest
SHA256 a4e7a8d923598f7d3fe5e999c8f543c061c4e77a5eb48b9c66b2a169c03ad19b
MD5 0e9e5e5fe2f947beab9d23d405a837b2
BLAKE2b-256 a9d7b58ebf4810b4324cea00b72a423a65bf6c2bf083b1ecd7ad3fc06d174373

