Python library to work with ARC and WARC files

Project description

WARC (Web ARChive) is a file format for storing web crawls.

http://bibnum.bnf.fr/WARC/

The warc library makes it easy to work with WARC files:

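from __future__ import print_function  # keeps print() consistent on Python 2
import warc

# Open the archive and iterate over its records.
f = warc.open("test.warc")
for record in f:
    # Look up each record's WARC headers by name.
    print(record['WARC-Target-URI'], record['Content-Length'])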
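
As a further illustration of the header lookups above, the following sketch sums the Content-Length of records per target URI. The helper name total_bytes_by_uri is hypothetical and not part of the warc library. It relies only on the record['Header-Name'] lookup shown above, and it assumes that a record lacking a header raises KeyError, as a plain dict would.

```python
def total_bytes_by_uri(records):
    """Sum Content-Length per WARC-Target-URI over an iterable of records.

    `records` can be any iterable whose items support record['Header-Name'],
    for example the object returned by warc.open(...) or a list of dicts.
    """
    totals = {}
    for record in records:
        try:
            uri = record['WARC-Target-URI']
            length = int(record['Content-Length'])
        except KeyError:
            # Assumption: a missing header raises KeyError (e.g. a record
            # with no target URI); such records are skipped.
            continue
        totals[uri] = totals.get(uri, 0) + length
    return totals
```

With the library installed, this could be called as total_bytes_by_uri(warc.open("test.warc")); passing a list of plain dicts works the same way, which makes the helper easy to try without an archive on hand.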

Documentation

The documentation of the warc library is available at http://warc.readthedocs.org/.

License

This software is licensed under GPL v2. See LICENSE file for details.

Download files

Source Distribution

warc-0.2.1.tar.gz (18.4 kB)

Uploaded Source

File details

Details for the file warc-0.2.1.tar.gz.

File metadata

  • Download URL: warc-0.2.1.tar.gz
  • Size: 18.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for warc-0.2.1.tar.gz
Algorithm Hash digest
SHA256 65ec3336287ae7a17c969736935ba188678df10f2ec813d8e3474cc51bb71d39
MD5 3235a8b68e28c77d45227b2850654776
BLAKE2b-256 9ab430d87239ec30cd0c504bd7dec9cd22b51ef0cbb00d6fbbc138b1ddcfc108
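
A downloaded copy can be checked against the SHA256 digest listed above. The sketch below uses only Python's standard hashlib module. The function names and the local file path are illustrative assumptions, not part of the warc package.

```python
import hashlib

# SHA256 digest listed above for warc-0.2.1.tar.gz.
EXPECTED_SHA256 = "65ec3336287ae7a17c969736935ba188678df10f2ec813d8e3474cc51bb71d39"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Compare the file's digest with `expected`, ignoring letter case."""
    return sha256_of_file(path) == expected.lower()
```

For example, matches_expected("warc-0.2.1.tar.gz") would return True only if the file in the current directory (an assumed location) has the listed digest.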
