Load any file into a pandas DataFrame, with a minimum of configuration, and a focus on bioinformatics

dataframer

Examples

Typically, you’ll read a file from disk (open('my-file.txt', 'rb')), but a byte stream is simpler here.
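A file opened in binary mode and a BytesIO both expose the same read() interface, which is why the examples can substitute one for the other. The following is a minimal sketch using only the standard library; the temporary file is arbitrary and no dataframer call is made here.

```python
import os
import tempfile
from io import BytesIO

content = b'a,b,c,z\n1,2,3,foo\n4,5,6,bar'

# Write the sample bytes to a temporary file so there is something on disk to open.
with tempfile.NamedTemporaryFile(suffix='.csv', delete=False) as tmp:
    tmp.write(content)
    path = tmp.name

try:
    # A file opened with 'rb' is a binary file-like object...
    with open(path, 'rb') as disk_stream:
        from_disk = disk_stream.read()
finally:
    os.remove(path)

# ...and BytesIO is an in-memory object with the same read() method.
from_memory = BytesIO(content).read()

print(from_disk == from_memory)
```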

>>> from io import BytesIO
>>> from dataframer import dataframer
>>> from pandas import set_option

>>> set_option('display.max_columns', None)

>>> data = b'a,b,c,z\n1,2,3,foo\n4,5,6,bar'
>>> stream = BytesIO(data)

By default, the first column becomes the index and any non-numeric columns after it are dropped.

>>> df_info = dataframer.parse(stream)
>>> df_info.data_frame
   b  c
a      
1  2  3
4  5  6
>>> df_info.label_map is None
True
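For comparison, the default result can be approximated with plain pandas. This is a sketch of comparable behavior, not dataframer's actual implementation, and it assumes comma-separated input, whereas dataframer aims to load any file with a minimum of configuration.

```python
from io import BytesIO

import pandas as pd

data = b'a,b,c,z\n1,2,3,foo\n4,5,6,bar'

# Use the first column as the index, as in the default output above.
df = pd.read_csv(BytesIO(data), index_col=0)

# Keep only numeric columns; the string column 'z' is dropped.
numeric_df = df.select_dtypes(include='number')

print(numeric_df)
```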

Alternatively, they can be preserved in place...

>>> df_info = dataframer.parse(stream, keep_strings=True)
>>> df_info.data_frame
   b  c    z
a           
1  2  3  foo
4  5  6  bar
>>> df_info.label_map is None
True

... or they can be used to compose more meaningful row labels.

>>> df_info = dataframer.parse(stream, relabel=True)
>>> df_info.data_frame
   b  c
a      
1  2  3
4  5  6
>>> df_info.label_map
{1: 'foo / 1', 4: 'bar / 4'}
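One way such a label map could be composed is sketched below with the standard library only. The helper names, the ' / ' separator, and the ordering are assumptions inferred from the output above, not taken from dataframer's source; the sketch also assumes integer values in the first column.

```python
import csv
from io import StringIO

text = 'a,b,c,z\n1,2,3,foo\n4,5,6,bar'


def is_number(value):
    """Return True if the string can be read as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def build_label_map(csv_text):
    """Map each first-column value to 'strings / original label'.

    Hypothetical helper: the format is inferred from the example output,
    and first-column values are assumed to be integers.
    """
    rows = list(csv.reader(StringIO(csv_text)))
    label_map = {}
    for row in rows[1:]:  # skip the header row
        original = row[0]
        # Collect the non-numeric values that follow the first column.
        strings = [value for value in row[1:] if not is_number(value)]
        label_map[int(original)] = ' / '.join(strings + [original])
    return label_map


label_map = build_label_map(text)
print(label_map)
```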

The first column can also be treated as data rather than as the index.

>>> df_info = dataframer.parse(stream, col_zero_index=False)
>>> df_info.data_frame
   a  b  c
0  1  2  3
1  4  5  6
>>> df_info.label_map is None
True

If you only need column information rather than the whole file, parse just the first row:

>>> df_info = dataframer.parse(stream, first_row_only=True)
>>> df_info.data_frame
   b  c
a      
1  2  3
>>> df_info.label_map is None
True
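Reading a single row is also possible in plain pandas through the nrows parameter of read_csv. The sketch below assumes comma-separated input and is not a statement about how dataframer implements first_row_only.

```python
from io import BytesIO

import pandas as pd

data = b'a,b,c,z\n1,2,3,foo\n4,5,6,bar'

# nrows=1 stops after the first data row; the header line is still parsed,
# so the column names are available without reading the rest of the file.
first_row = pd.read_csv(BytesIO(data), index_col=0, nrows=1)

print(list(first_row.columns))
```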

Single-column lists are given an implicit header:

>>> data = b'banana\napple\npear'
>>> stream = BytesIO(data)
>>> df_info = dataframer.parse(stream)
>>> df_info.data_frame
     item
0  banana
1   apple
2    pear
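A comparable result in plain pandas requires stating explicitly that the file has no header row. This sketch reuses the column name 'item' from the output above; how dataframer decides that a header is missing is not shown here.

```python
from io import BytesIO

import pandas as pd

data = b'banana\napple\npear'

# header=None treats every line as data; names supplies the column label.
items = pd.read_csv(BytesIO(data), header=None, names=['item'])

print(items)
```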

Release process

In your branch, update VERSION.txt using semantic versioning. When the PR is merged, a successful Travis build will push the new version to PyPI.
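As an illustration of a semantic-versioning bump, here is a small sketch. The bump helper is hypothetical and not part of the project, and it assumes VERSION.txt holds a bare MAJOR.MINOR.PATCH string.

```python
def bump(version, part='patch'):
    """Return the next semantic version for a MAJOR.MINOR.PATCH string.

    Hypothetical helper; surrounding whitespace (such as a trailing
    newline read from a file) is ignored.
    """
    major, minor, patch = (int(piece) for piece in version.strip().split('.'))
    if part == 'major':
        return '{}.0.0'.format(major + 1)
    if part == 'minor':
        return '{}.{}.0'.format(major, minor + 1)
    if part == 'patch':
        return '{}.{}.{}'.format(major, minor, patch + 1)
    raise ValueError('part must be "major", "minor", or "patch"')


print(bump('0.0.3'))
```

A bug fix would typically bump the patch number, a backward-compatible feature the minor number, and a breaking change the major number.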
