Specialized & performant CSV readers, writers and enrichers for Python.

Casanova

If you often read CSV files with Python, you will quickly notice that csv.DictReader, while more comfortable to use, remains much slower than csv.reader:

# To read a 1.5G CSV file:
csv.reader: 24s
csv.DictReader: 84s

Casanova is therefore an attempt to match csv.reader performance while keeping a comfortable interface that can still take headers into account.
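
To check the gap between the two standard-library readers on your own machine, a minimal timing sketch along these lines can help. It uses only the standard library, not casanova; the in-memory sample data, the row count and the function names are assumptions made for illustration, and the timings it prints will differ from the figures quoted above.

```python
import csv
import io
import timeit

# Build a small in-memory CSV (hypothetical sample data, for illustration only).
ROWS = 20_000
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["title", "url", "lang"])
for i in range(ROWS):
    writer.writerow([f"Title {i}", f"https://example.com/{i}", "en"])
DATA = buffer.getvalue()


def read_with_reader():
    """Read the url column by position using csv.reader."""
    reader = csv.reader(io.StringIO(DATA))
    headers = next(reader)
    url_pos = headers.index("url")  # resolve the column position once
    return sum(1 for row in reader if row[url_pos])


def read_with_dictreader():
    """Read the url column by name using csv.DictReader."""
    reader = csv.DictReader(io.StringIO(DATA))
    return sum(1 for row in reader if row["url"])


# Time three full passes with each approach; results depend on the machine.
reader_time = timeit.timeit(read_with_reader, number=3)
dictreader_time = timeit.timeit(read_with_dictreader, number=3)
print(f"csv.reader: {reader_time:.3f}s")
print(f"csv.DictReader: {dictreader_time:.3f}s")
```

Both functions do the same work (counting rows with a non-empty url), so the difference mostly reflects the cost of building one dict per row.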

Casanova is thus a good fit for you if you need to:

  • Stream large CSV files without running out of memory
  • Enrich the same CSV files by outputting a similar file, all while adding, filtering and editing cells
  • Resume said enrichment if your process exited
  • Do so in a thread-safe fashion, and resume even if your output is not in the same order as the input
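
To make the idea of a resumable enrichment concrete, here is a minimal standard-library sketch. It is not casanova's API: the `enrich` and `count_done` helpers, the `skip` parameter and the sample data are all hypothetical, and the sketch only covers the simple case where the output keeps the same order as the input.

```python
import csv
import io


def count_done(output_text):
    """Count data rows already present in a previous output (header excluded)."""
    rows = list(csv.reader(io.StringIO(output_text)))
    return max(len(rows) - 1, 0)


def enrich(input_file, output_file, add_column, compute, skip=0):
    """Copy rows to output_file, appending one computed cell per row.

    `skip` is the number of data rows a previous run already wrote.
    """
    reader = csv.reader(input_file)
    writer = csv.writer(output_file)
    headers = next(reader)
    if skip == 0:
        # Fresh run: the output does not have a header row yet.
        writer.writerow(headers + [add_column])
    for index, row in enumerate(reader):
        if index < skip:
            continue  # already processed before the interruption
        writer.writerow(row + [compute(row)])


source = "url\nhttps://a.example\nhttps://b.example\nhttps://c.example\n"

# Simulate an interrupted first run that wrote the header and one row.
partial = "url,length\r\nhttps://a.example,17\r\n"

output = io.StringIO()
output.write(partial)  # stands in for reopening the output in append mode
enrich(
    io.StringIO(source),
    output,
    "length",
    lambda row: len(row[0]),
    skip=count_done(partial),
)
result = list(csv.reader(io.StringIO(output.getvalue())))
```

Counting rows is only valid when rows are written in input order; resuming from an unordered, multithreaded output needs a different strategy, such as tracking which rows were already written.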

Installation

You can install casanova with pip using the following command:

pip install casanova

Usage

reader

import casanova

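with open('./file.csv') as f:
  # Each loop below is an independent alternative: a file object can only be
  # consumed once, so reopen (or rewind) the file between them.
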
  # Interested in a single column?
  for url in casanova.reader(f, column='url'):
    print(url)

  # No headers?
  for url in casanova.reader(f, column=0, no_headers=True):
    print(url)

  # Interested in several columns
  for title, url in casanova.reader(f, columns=('title', 'url')):
    print(title, url)

  # Working on records
  for record in casanova.reader(f, columns=('title', 'url')):
    # record is a namedtuple based on your columns
    print(record[0], record.url)

  # Records slow you down? Need to go faster?
  # You can iterate directly on rows and use the reader's recorded positions
  reader = casanova.reader(f, columns=('title', 'url'))
  url_pos = reader.pos.url

  for row in reader.rows():
    print(row[url_pos])

Arguments

  • file file: file object to read.
  • column ?str|int: name or index of target column.
  • columns ?iterable<str|int>: iterable of names or indices of target columns.
  • no_headers ?bool: set to True if the file has no header row, as in the example above.

Attributes

  • pos int|namedtuple: index of target column or named tuple of target columns.
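
The idea behind recorded positions can be sketched with the standard library alone. This is not casanova's actual implementation: the `column_positions` helper and the sample data are hypothetical, and the sketch assumes column names are valid Python identifiers, since they become namedtuple fields.

```python
import csv
import io
from collections import namedtuple


def column_positions(headers, columns):
    """Resolve column names to row indices once, as a namedtuple."""
    Positions = namedtuple("Positions", columns)
    return Positions(*(headers.index(name) for name in columns))


sample = io.StringIO(
    "title,url\n"
    "Home,https://example.com\n"
    "Docs,https://example.com/docs\n"
)

reader = csv.reader(sample)
pos = column_positions(next(reader), ("title", "url"))

# Rows stay plain lists; only the index lookup uses the recorded position.
urls = [row[pos.url] for row in reader]
print(urls)
```

Resolving names to indices once, before the loop, is what lets each row remain a plain list instead of being wrapped in a dict or record.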

Methods

__iter__

Lets you iterate on a single value or on a namedtuple record.

rows

Lets you iterate over the original csv.reader.
