Integrated CSV to RDF converter, using CSVW and nanopublications
CoW: Integrated CSV to RDF Converter
CoW (CSV on the Web) is an integrated CSV to RDF converter that uses the W3C standard CSVW for rich semantic table specifications, and nanopublications as an output RDF model.
What is CoW
CoW is a command-line utility to convert any CSV file into an RDF dataset. Its distinctive features are:
- Expressive CSVW-compatible schemas based on the Jinja template engine
- Highly efficient implementation leveraging multithreaded and multicore architectures
- Available as a pythonic CLI tool, library, and web service
- Supports Python 3
Documentation and support
For user documentation, see the basic introduction video (https://t.co/SDWC3NhWZf) and the wiki. Technical details are provided below. If you encounter an issue, please report it. Also feel free to create pull requests!
Install (requires Python to be installed)
pip3 is the recommended method of installing CoW on your system:
pip3 install cow-csvw
You can upgrade your currently installed version with:
pip3 install cow-csvw --upgrade
Possible issues:
- Permission issues. You can get around them by installing CoW in user space:
pip3 install cow-csvw --user
Make sure your binary user directory (typically something like /Users/user/Library/Python/3.7/bin on macOS or /home/user/.local/bin on Linux) is in your PATH (on macOS, see /etc/paths).
- For Windows/macOS, we recommend installing Python via the official distribution page. You can also use virtualenv to avoid conflicts with your system libraries.
- If your issue is not listed here, please report it.
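If cow_tool is not found after a --user install, the PATH check described above can be automated. The following stdlib-only Python sketch prints where pip places user-installed scripts and whether that directory is on your PATH. It assumes Python 3.10+ for sysconfig.get_preferred_scheme and otherwise falls back to the generic posix_user/nt_user schemes, which may not match every macOS framework build, so treat the output as a hint. The helper names user_scripts_dir and is_on_path are hypothetical.

```python
import os
import shutil
import sysconfig


def user_scripts_dir():
    """Return the directory where `pip install --user` puts console scripts."""
    try:
        # Python 3.10+: ask for the platform's preferred user install scheme
        scheme = sysconfig.get_preferred_scheme("user")
    except AttributeError:
        # Older Pythons: approximate with the generic user schemes
        scheme = "nt_user" if os.name == "nt" else "posix_user"
    return sysconfig.get_path("scripts", scheme)


def is_on_path(directory, path_env=None):
    """Check whether `directory` is one of the entries of PATH."""
    if path_env is None:
        path_env = os.environ.get("PATH", "")
    target = os.path.normcase(os.path.normpath(directory))
    entries = [os.path.normcase(os.path.normpath(p))
               for p in path_env.split(os.pathsep) if p]
    return target in entries


if __name__ == "__main__":
    scripts = user_scripts_dir()
    print("User scripts directory:", scripts)
    print("On PATH:", is_on_path(scripts))
    # shutil.which returns None when cow_tool cannot be found on PATH
    print("cow_tool found at:", shutil.which("cow_tool"))
```

If "On PATH" is False, add the printed directory to your PATH and open a new terminal.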
If you can't/don't want to deal with installing CoW, you can use the cattle web service version (deprecated).
Usage
CLI
The CLI (command line interface) is the recommended way of using CoW for most users. The straightforward CSV to RDF conversion is done in two steps. First:
cow_tool build myfile.csv
This will create a file named myfile.csv-metadata.json (from now on: JSON schema file, or just JSF). You don't need to worry about this file if you only want a syntactic conversion. Then:
cow_tool convert myfile.csv
This will output a myfile.csv.nq RDF file (N-Quads by default; you can control the output RDF serialization with e.g. --format turtle). That's it!
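The two CLI steps can also be scripted. The Python sketch below uses only the build and convert subcommands and the --format option listed under Options; the helper names cow_commands and run_pipeline are hypothetical. It assumes cow_tool is on your PATH.

```python
import subprocess


def cow_commands(csv_path, rdf_format=None):
    """Return the argv lists for the two-step CoW conversion of one CSV file."""
    build = ["cow_tool", "build", csv_path]
    convert = ["cow_tool", "convert", csv_path]
    if rdf_format is not None:
        # Optional serialization, e.g. "turtle"; N-Quads is the default
        convert += ["--format", rdf_format]
    return [build, convert]


def run_pipeline(csv_path, rdf_format=None, rebuild_schema=True):
    """Run `build` (optional) and then `convert`, stopping on the first error."""
    build, convert = cow_commands(csv_path, rdf_format)
    if rebuild_schema:
        # Pass rebuild_schema=False once you have hand-edited the JSF
        subprocess.run(build, check=True)
    subprocess.run(convert, check=True)
```

For example, run_pipeline("myfile.csv", rdf_format="turtle") mirrors the two commands shown above with Turtle output.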
If you want to control the base URI namespace, the URIs used in predicates, virtual columns, and the many other features of CoW, you'll need to edit the myfile.csv-metadata.json JSF and/or use CoW arguments. Have a look at the CLI options below, the examples in the wiki, and the technical documentation.
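Editing the JSF can be scripted as well. The sketch below assumes the JSF follows the CSVW metadata layout, with a tableSchema object holding a columns list whose entries have a name and may carry propertyUrl and virtual (all standard CSVW properties); check these keys against the file cow_tool build generated for you, since CoW's JSF may use templates you want to keep. The function names and the example namespace http://example.org/def/ are hypothetical.

```python
import json
from urllib.parse import quote


def set_property_urls(metadata, namespace):
    """Point every non-virtual column's propertyUrl at namespace + column name.

    Assumes CSVW layout: metadata["tableSchema"]["columns"] is a list of dicts
    with a "name" key. Names are percent-encoded so spaces yield valid IRIs.
    """
    for column in metadata["tableSchema"]["columns"]:
        if column.get("virtual"):
            continue  # virtual columns define their own URLs; leave them alone
        column["propertyUrl"] = namespace + quote(column["name"], safe="")
    return metadata


def edit_jsf(jsf_path, namespace):
    """Load a JSF from disk, rewrite its propertyUrls, and save it back."""
    with open(jsf_path, encoding="utf-8") as f:
        metadata = json.load(f)
    set_property_urls(metadata, namespace)
    with open(jsf_path, "w", encoding="utf-8") as f:
        json.dump(metadata, f, indent=2, ensure_ascii=False)
```

After edit_jsf("myfile.csv-metadata.json", "http://example.org/def/"), running cow_tool convert myfile.csv would pick up the new predicates.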
Options
Check the --help output for a complete list of options:
usage: cow_tool [-h] [--dataset DATASET] [--delimiter DELIMITER]
                [--quotechar QUOTECHAR] [--encoding ENCODING] [--processes PROCESSES]
                [--chunksize CHUNKSIZE] [--base BASE]
                [--format [{xml,n3,turtle,nt,pretty-xml,trix,trig,nquads}]]
                [--gzip] [--version]
                {convert,build} file [file ...]
Not nearly CSVW compliant schema builder and RDF converter
positional arguments:
  {convert,build}       Use the schema of the `file` specified to convert it
                        to RDF, or build a schema from scratch.
  file                  Path(s) of the file(s) that should be used for
                        building or converting. Must be a CSV file.
optional arguments:
  -h, --help            show this help message and exit
  --dataset DATASET     A short name (slug) for the name of the dataset (will
                        use input file name if not specified)
  --delimiter DELIMITER
                        The delimiter used in the CSV file(s)
  --quotechar QUOTECHAR
                        The character used as quotation character in the CSV
                        file(s)
  --encoding ENCODING   The character encoding used in the CSV file(s)
  --processes PROCESSES
                        The number of processes the converter should use
  --chunksize CHUNKSIZE
                        The number of rows processed at each time
  --base BASE           The base for URIs generated with the schema (only
                        relevant when `build`ing a schema)
  --gzip                Compress the output file using gzip
  --format [{xml,n3,turtle,nt,pretty-xml,trix,trig,nquads}], -f [{xml,n3,turtle,nt,pretty-xml,trix,trig,nquads}]
                        RDF serialization format
  --version             show program's version number and exit
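The --chunksize and --processes options control how the input is split up: conceptually, rows are read in groups of chunksize and each group is handed to one of processes workers. The stdlib sketch below illustrates that partitioning only; it is not CoW's internal code, and convert_chunk is a hypothetical stand-in for the real row-to-RDF step.

```python
import csv
import io
from itertools import islice


def read_chunks(lines, chunksize, delimiter=",", quotechar='"'):
    """Yield lists of at most `chunksize` rows, each row as a header->value dict."""
    reader = csv.reader(lines, delimiter=delimiter, quotechar=quotechar)
    header = next(reader, None)
    if header is None:
        return  # empty input: nothing to convert
    while True:
        chunk = list(islice(reader, chunksize))
        if not chunk:
            return
        # Pair each row with the header so every chunk can be processed on its own
        yield [dict(zip(header, row)) for row in chunk]


def convert_chunk(rows):
    """Hypothetical stand-in for the row-to-RDF step: returns the row count."""
    return len(rows)


def convert(lines, chunksize, map_fn=map):
    """Run convert_chunk over all chunks with map_fn (sequential by default).

    Passing multiprocessing.Pool(processes).imap as map_fn would spread the
    chunks over `processes` worker processes instead.
    """
    return list(map_fn(convert_chunk, read_chunks(lines, chunksize)))


if __name__ == "__main__":
    sample = io.StringIO("year,count\n1900,3\n1901,5\n1902,8\n")
    print(convert(sample, chunksize=2))  # prints [2, 1]
```

Larger chunks mean fewer hand-offs between workers but more memory per worker; smaller chunks balance load more evenly.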
Web service
There is a web service and interface running CoW, called cattle. Two public instances are running at:
- http://cattle.datalegend.net/ - runs CoW in Python3
- http://legacy.cattle.datalegend.net/ - runs CoW in Python2 for legacy reasons
Beware of the web service limitations:
- There's a limit to the size of the CSVs you can upload
- It's a public instance, so your conversion could take longer
- Cattle is no longer being maintained and these public instances will eventually be taken offline
Library
Once installed, CoW can be used as a library as follows:
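import os
from cow_csvw.csvw_tool import COW
# Hypothetical location of your CSV file; adjust to your own data
path, filename = 'data', 'myfile.csv'
# Step 1: build the JSF (here data/myfile.csv-metadata.json)
COW(mode='build', files=[os.path.join(path, filename)], dataset='My dataset', delimiter=';', quotechar='"')
# Step 2: convert the CSV to RDF using the (possibly edited) JSF
COW(mode='convert', files=[os.path.join(path, filename)], dataset='My dataset', delimiter=';', quotechar='"', processes=4, chunksize=100, base='http://example.org/my-dataset', format='turtle', gzipped=False)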
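Because files takes a list, several CSVs can be handled in one call. The sketch below collects every *.csv in a directory and passes them to COW with the keyword arguments you supply, reusing only the names from the library example. The helper name convert_directory and its cow parameter (which lets you substitute a stand-in, e.g. in tests) are hypothetical.

```python
from pathlib import Path


def convert_directory(directory, cow=None, build_kwargs=None, convert_kwargs=None):
    """Build schemas for, then convert, all CSV files in `directory`.

    `cow` defaults to cow_csvw's COW class; any callable accepting the same
    keyword arguments can be passed instead.
    """
    if cow is None:
        from cow_csvw.csvw_tool import COW as cow  # imported lazily
    # "*.csv" does not match the generated *.csv-metadata.json files
    files = sorted(str(p) for p in Path(directory).glob("*.csv"))
    if not files:
        return []
    cow(mode="build", files=files, **(build_kwargs or {}))
    cow(mode="convert", files=files, **(convert_kwargs or {}))
    return files
```

For example: convert_directory('data', build_kwargs={'delimiter': ';'}, convert_kwargs={'delimiter': ';', 'format': 'turtle'}).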
Technical documentation
Technical documentation for CoW is maintained in this GitHub repository (under ), and published through Read the Docs at http://csvw-converter.readthedocs.io/en/latest/.
To build the documentation from source, change into the docs directory and run make html. This should produce an HTML version of the documentation in the _build/html directory.
Examples
The wiki provides more hands-on examples of transposing CSVs into Linked Data
License
MIT License (see license.txt)
Acknowledgements
Authors: Albert Meroño-Peñuela, Roderick van der Weerdt, Rinke Hoekstra, Kathrin Dentler, Auke Rijpma, Richard Zijdeman, Melvin Roest, Xander Wilcke
Copyright: Vrije Universiteit Amsterdam, Utrecht University, International Institute of Social History
CoW is developed and maintained by the CLARIAH project and funded by NWO.
Source Distribution
File details
Details for the file cow_csvw-1.21.tar.gz.
File metadata
- Download URL: cow_csvw-1.21.tar.gz
- Upload date:
- Size: 20.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.7.1 importlib_metadata/4.8.3 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.10.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | 3db050be226aabe7a234b7d070152bf384184c8db4631184361d74bcfba0bb48
MD5 | 611568a8fbceb4a2f9f54ce50147415d
BLAKE2b-256 | 8dc496a4c09c6fef23cf46ea16f38ce46ef72831685140950207e4006e43828b