A collection of scripts and utilities to support the stream-processing of MediaWiki data.

Project description

A set of utilities for stream-processing MediaWiki data.

Usage

mwstream (-h | --help)

mwstream <utility> [-h|--help]

Data processing utilities

diffs2persistence

Generates token persistence statistics using revision JSON blobs with diff information.

dump2json

Converts an XML dump to a stream of revision JSON blobs.

json2diffs

Computes and adds a “diff” field to a stream of revision JSON blobs.

persistence2stats

Aggregates token persistence statistics into revision statistics.

wikihadoop2json

Converts a Wikihadoop-processed stream of XML pages to JSON blobs.
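
To make the first two steps above more concrete, here is a minimal sketch of the general idea behind dump2json and json2diffs: stream revisions out of an XML dump as JSON blobs, then attach a "diff" field to each one. This is not the package's implementation. The sample XML is heavily simplified, the helper names (revisions_from_xml, add_diffs) and the layout of the "diff" field are hypothetical, and the diff is computed with the standard library's difflib rather than whatever algorithm mwstreaming uses.

```python
import difflib
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical, simplified dump. Real MediaWiki exports use an XML namespace
# and carry many more fields per revision.
SAMPLE_DUMP = """<mediawiki>
  <page>
    <title>Example</title>
    <id>1</id>
    <revision><id>10</id><text>Hello world</text></revision>
    <revision><id>11</id><text>Hello big world</text></revision>
  </page>
</mediawiki>"""


def revisions_from_xml(stream):
    """Yield one dict per <revision> without loading the whole dump."""
    title = None
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "title":
            title = elem.text
        elif elem.tag == "revision":
            yield {
                "id": int(elem.findtext("id")),
                "page": {"title": title},
                "text": elem.findtext("text") or "",
            }
            elem.clear()  # release the revision's children once emitted


def add_diffs(revisions):
    """Attach a 'diff' field comparing each revision to the previous
    revision of the same page (assumed field layout, whitespace tokens)."""
    last_tokens = {}
    for rev in revisions:
        title = rev["page"]["title"]
        tokens = rev["text"].split()
        previous = last_tokens.get(title, [])
        matcher = difflib.SequenceMatcher(a=previous, b=tokens, autojunk=False)
        rev["diff"] = {"ops": [list(op) for op in matcher.get_opcodes()]}
        last_tokens[title] = tokens
        yield rev


# Emit one JSON blob per line, the usual shape for a stream of blobs.
dump = io.BytesIO(SAMPLE_DUMP.encode("utf-8"))
for rev in add_diffs(revisions_from_xml(dump)):
    print(json.dumps(rev))
```

Both functions are generators, so each revision flows through the whole chain before the next one is read. That is the property that makes this style of processing workable on dumps too large to hold in memory.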

General utilities

json2tsv

Converts a stream of JSON blobs to tab-separated values based on a set of field names.

normalize

Normalizes old versions of RevisionDocument JSON schemas to correspond to the most recent schema version.

validate

Validates JSON against a provided schema.
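
As an illustration of what json2tsv describes (JSON blobs in, tab-separated rows out, with columns chosen by field name), the following sketch shows one way such a conversion could work. It is not the package's code. The function names, the dotted-path syntax for nested fields, and the "NULL" placeholder for missing values are all assumptions made for this example.

```python
import json


def lookup(blob, dotted_name):
    """Follow a dotted path such as 'page.title' into nested dicts.
    Returns None if any step is missing (assumed convention)."""
    value = blob
    for key in dotted_name.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def encode(value):
    """Render one cell, escaping characters that would break a TSV row."""
    if value is None:
        return "NULL"  # assumed placeholder for missing fields
    if isinstance(value, str):
        return (value.replace("\\", "\\\\")
                     .replace("\t", "\\t")
                     .replace("\n", "\\n"))
    return json.dumps(value)


def json_to_tsv(lines, fieldnames):
    """Yield one tab-separated row per JSON blob, columns in fieldnames order."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the stream
        blob = json.loads(line)
        yield "\t".join(encode(lookup(blob, name)) for name in fieldnames)


blobs = [
    '{"id": 10, "page": {"title": "Example"}}',
    '{"id": 11}',
]
for row in json_to_tsv(blobs, ["id", "page.title"]):
    print(row)
```

Escaping tabs and newlines inside string values keeps every blob on exactly one output line, so downstream tools can split rows on newlines and columns on tabs.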

Installation

pip install mwstreaming
