Specialized & performant CSV readers, writers and enrichers for python.
Casanova
If you often read CSV files with Python, you will quickly notice that csv.DictReader, while more comfortable to use, remains much slower than csv.reader:
# To read a 1.5G CSV file:
csv.reader: 24s
csv.DictReader: 84s
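The gap can be reproduced on a small scale with the standard library alone. The sketch below is not the measurement quoted above: it assumes a synthetic in-memory CSV (the column names and row count are made up) and times both readers with `time.perf_counter`. Absolute numbers will vary with the machine and the Python version.

```python
import csv
import io
import time


def make_csv(n_rows=50_000):
    # Synthetic data: column names and values are made up for illustration.
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(['title', 'url', 'lang'])
    for i in range(n_rows):
        writer.writerow(['title %d' % i, 'http://example.com/%d' % i, 'en'])
    return buf.getvalue()


def time_reader(data):
    # Plain rows are lists; the header row is skipped manually.
    start = time.perf_counter()
    reader = csv.reader(io.StringIO(data))
    next(reader)
    count = sum(1 for row in reader)
    return count, time.perf_counter() - start


def time_dict_reader(data):
    # Each row is turned into a dict keyed by the header names.
    start = time.perf_counter()
    count = sum(1 for row in csv.DictReader(io.StringIO(data)))
    return count, time.perf_counter() - start


data = make_csv()
reader_count, reader_time = time_reader(data)
dict_count, dict_time = time_dict_reader(data)

print('csv.reader:     %.3fs' % reader_time)
print('csv.DictReader: %.3fs' % dict_time)
```

Both functions visit the same rows, so any difference in timing comes from how each row is represented.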
Casanova is therefore an attempt to stick to csv.reader performance while keeping a comfortable interface that can still take headers into account.
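The general idea of combining plain-row speed with header awareness can be sketched with the standard library. This is only an illustration of the technique, not casanova's actual implementation; the sample data is hypothetical.

```python
import csv
import io

# Hypothetical sample data standing in for a real file.
data = io.StringIO('title,url\nHome,http://a.example\nAbout,http://b.example\n')

reader = csv.reader(data)
headers = next(reader)

# Resolve the column name to an index once, before the loop...
url_pos = headers.index('url')

# ...then index plain lists, with no per-row dict construction.
urls = [row[url_pos] for row in reader]
print(urls)
```

The header lookup happens a single time, so the per-row cost stays that of a list index.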
Casanova is thus a good fit for you if you need to:
- Stream large CSV files without running out of memory
- Enrich the same CSV files by outputting a similar file, all while adding, filtering and editing cells.
- Have the possibility to resume said enrichment if your process exited
- Do so in a threadsafe fashion, and be able to resume even if your output does not have the same order as the input
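To make the streaming and enrichment points concrete, here is a minimal standard-library sketch of adding a column while reading row by row. It is not casanova's enricher API, which this section does not document; the `enrich` helper, its parameters and the sample data are all hypothetical.

```python
import csv
import io


def enrich(input_file, output_file, column, new_column, fn):
    # Hypothetical helper: rows are streamed one at a time, so memory use
    # does not grow with the size of the file.
    reader = csv.reader(input_file)
    writer = csv.writer(output_file)

    headers = next(reader)
    pos = headers.index(column)
    writer.writerow(headers + [new_column])

    for row in reader:
        # Append one computed cell to each original row.
        writer.writerow(row + [fn(row[pos])])


source = io.StringIO('title,url\nHome,http://a.example\n')
target = io.StringIO()
enrich(source, target, 'url', 'url_length', len)
print(target.getvalue())
```

Resuming and thread safety are deliberately left out of this sketch.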
Installation
You can install casanova with pip using the following command:
pip install casanova
Usage
reader
import casanova

# Each loop below is an independent example: the file is consumed by
# iteration, so reopen it (or call f.seek(0)) before trying another one.
with open('./file.csv') as f:

    # Interested in a single column?
    for url in casanova.reader(f, column='url'):
        print(url)

    # No headers?
    for url in casanova.reader(f, column=0, no_headers=True):
        print(url)

    # Interested in several columns?
    for title, url in casanova.reader(f, columns=('title', 'url')):
        print(title, url)

    # Working on records
    for record in casanova.reader(f, columns=('title', 'url')):

        # record is a namedtuple based on your columns
        print(record[0], record.url)

    # Records slow you down? Need to go faster?
    # You can iterate directly on rows and use the reader's recorded positions
    reader = casanova.reader(f, columns=('title', 'url'))
    url_pos = reader.pos.url

    for row in reader.rows():
        print(row[url_pos])
Arguments
- file file: file object to read.
- column ?str|int: name or index of target column.
- columns ?iterable<str|int>: iterable of names or indices of target columns.
- no_headers ?bool: set to True when the file has no header row; columns are then targeted by index, as in the usage example above.
Attributes
- pos int|namedtuple: index of target column or named tuple of target columns.
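As an illustration of what a named tuple of positions looks like, the sketch below builds one with `collections.namedtuple` from a header row. It is an assumption about the general shape of such an object, not casanova's own code; `Positions` and the sample data are hypothetical.

```python
import csv
import io
from collections import namedtuple

# Hypothetical sample data standing in for a real file.
data = io.StringIO('id,title,url\n1,Home,http://a.example\n')
reader = csv.reader(data)
headers = next(reader)

columns = ('title', 'url')

# Hypothetical stand-in for a `pos`-like object: one index per target column.
Positions = namedtuple('Positions', columns)
pos = Positions(*(headers.index(name) for name in columns))

print(pos)
for row in reader:
    # Positions can be used by attribute name to index raw rows.
    print(row[pos.title], row[pos.url])
```

With a single target column, a bare integer index plays the same role.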
Methods
iter
Lets you iterate on a single value or on a namedtuple record.
rows
Lets you iterate over the original csv.reader.