Collect metadata for Internet Archive collections

Project description

iacoll

iacoll will collect all the item metadata for an Internet Archive collection and store it in a LevelDB database. The database is a key/value store where the key is the unique Internet Archive item identifier, and the value is the JSON for the item metadata.

For example, you can download the metadata for items in the University of Maryland's collection:

% iacoll university_maryland_cp

By default iacoll will create the LevelDB database in a directory named after the collection's identifier. To store it somewhere else, pass the path explicitly with --db:

% iacoll university_maryland_cp --db /path/to/my/leveldb/database
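Once a database exists, its entries follow the key/value layout described above: the item identifier as the key and the item's metadata JSON as the value. The following is a minimal sketch of decoding such entries in Python. It assumes keys and values are stored as UTF-8 bytes, which the description does not state, and `decode_items` is a hypothetical helper, not part of iacoll. The sample entries and their "title" field are made up for illustration.

```python
import json
from typing import Dict, Iterable, Iterator, Tuple


def decode_items(pairs: Iterable[Tuple[bytes, bytes]]) -> Iterator[Tuple[str, Dict]]:
    """Yield (identifier, metadata) for each raw key/value pair.

    Assumption: keys and values are UTF-8 bytes and each value is a JSON object.
    """
    for key, value in pairs:
        yield key.decode("utf-8"), json.loads(value.decode("utf-8"))


# Stand-in for a real database: two made-up entries in the assumed layout.
sample = [
    (b"example_item_1", b'{"title": "First example"}'),
    (b"example_item_2", b'{"title": "Second example"}'),
]

for identifier, metadata in decode_items(sample):
    print(identifier, metadata["title"])
```

The helper accepts any iterable of (key, value) byte pairs. A LevelDB binding such as plyvel yields pairs of that shape when you iterate over an open database, so an opened database could probably be passed in place of `sample`. Which binding iacoll itself uses is not stated here.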

When you run iacoll repeatedly, it will look at the database and only fetch newer records. If an update ever fails, you may want to force a full scan:

% iacoll university_maryland_cp --fullscan

If you would like to dump the metadata as line-oriented JSON, you can use --dump:

% iacoll university_maryland_cp --dump > university_maryland_cp.jsonl
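A line-oriented JSON dump can be processed one record at a time without loading the whole file. The sketch below shows one way to do that with only the standard library. `read_jsonl` is a hypothetical helper, and the sample records with their "identifier" field are illustrative, not taken from real iacoll output.

```python
import io
import json
from typing import IO, Dict, Iterator


def read_jsonl(stream: IO[str]) -> Iterator[Dict]:
    """Yield one parsed JSON object per non-empty line of the stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# Stand-in for the dump file; these records are made up for illustration.
sample = io.StringIO(
    '{"identifier": "example_item_1"}\n'
    '{"identifier": "example_item_2"}\n'
)

records = list(read_jsonl(sample))
print(len(records))
```

To read an actual dump, pass an open file instead of `sample`, for example `open("university_maryland_cp.jsonl", encoding="utf-8")`.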

Install

To install iacoll, you'll first need to install Python and then run:

pip install iacoll

Project details

Release history

0.0.3 (this version): Jan 2, 2019

0.0.2: Jan 1, 2019

Download files


Source Distribution

iacoll-0.0.3.tar.gz (3.0 kB)

Uploaded Jan 2, 2019 Source

File details

Details for the file iacoll-0.0.3.tar.gz.

File metadata

Upload date: Jan 2, 2019
Size: 3.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.21.0 setuptools/40.6.3 requests-toolbelt/0.8.0 tqdm/4.28.1 CPython/3.6.6

File hashes

Hashes for iacoll-0.0.3.tar.gz
Algorithm	Hash digest
SHA256	`498d1c0835004b964ad810c5b4ddf20ad12e13f9277edaf62bbd08cc3efc0a6c`
MD5	`18e38e89ab15eb02c8d4a0c7d965ef83`
BLAKE2b-256	`163027cbad2d8e338bf9930fd8cce0783ac263b73c7ded2ae820af4b86af1820`