Convert Marc21 Classification records in MARC/XML to SKOS/RDF

Python script for converting MARC 21 Classification records (serialized as MARCXML) to SKOS concepts.

Developed to support the project “Felles terminologi for klassifikasjon med Dewey”, it has only been tested with Dewey Decimal Classification (DDC) records. Issues and suggestions for generalizations and improvements are welcome!

Installation

Releases can be installed from the command line with pip:

$ pip install --upgrade mc2skos             # with virtualenv or as root
$ pip install --upgrade --user mc2skos      # install to ~/.local
  • Works with both Python 2.x and 3.x. See Travis for details on tested Python versions.

  • If lxml fails to install on Windows, try the Windows installer from PyPI.

  • If lxml fails to install on Unix, install the system packages python-dev and libxml2-dev.

  • Make sure the Python scripts folder has been added to your PATH.

To install a version directly from the source code repository:

$ git clone https://github.com/scriptotek/mc2skos.git
$ cd mc2skos
$ pip install -e .

Usage

mc2skos infile.xml outfile.ttl

Run mc2skos --help or mc2skos -h for options.

URIs

For records with 084 $a == "ddc", URIs take the form http://dewey.info/{collection}/{object}/e{edition}/, where {collection} is “class”, “table” or “scheme”, and {edition} is taken from 084 $c (with the language code stripped).

<http://dewey.info/class/6--982/e21/> a skos:Concept ;
    skos:inScheme <http://dewey.info/scheme/edition/e21/>,
        <http://dewey.info/table/6/e21/> ;
    skos:notation "T6--982" ;
    skos:prefLabel "Chibchan and Paezan languages"@en .

To override this, use --uri to set a URI template for class and table records, --scheme to set a URI to be used with skos:inScheme for all records, and --table_scheme to set a URI template to be used with skos:inScheme for table records. Note that if --uri is specified without --scheme, no skos:inScheme is added. The same goes for --table_scheme.
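The default URI pattern described above can be sketched in Python as follows. This is only an illustration, not the code mc2skos runs: the function name `dewey_uri` is hypothetical, and the assumption that 084 $c starts with the edition number followed by an optional language code (for example "23/nor") is a guess about the input format.

```python
import re

# Default pattern quoted in the text above.
DEWEY_URI_TEMPLATE = "http://dewey.info/{collection}/{object}/e{edition}/"


def dewey_uri(collection, obj, edition_field):
    """Build a dewey.info URI from the parts named in the text.

    `edition_field` stands for the raw 084 $c value. This sketch ASSUMES
    it begins with the edition number, optionally followed by a language
    code (e.g. "23/nor"); everything after the leading digits is dropped.
    """
    if collection not in ("class", "table", "scheme"):
        raise ValueError("unknown collection: %r" % collection)
    match = re.match(r"\d+", edition_field)
    if match is None:
        raise ValueError("no edition number in %r" % edition_field)
    return DEWEY_URI_TEMPLATE.format(
        collection=collection, object=obj, edition=match.group(0)
    )


print(dewey_uri("class", "6--982", "21"))      # concept URI from the example
print(dewey_uri("table", "6", "21/eng"))       # table scheme URI from the example
```

The two calls reproduce the concept URI and the table URI shown in the Turtle example above.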

Mapping schema

Only a small part of the MARC21 Classification data model is converted, and the conversion follows a rather pragmatic approach, exemplified by the mapping of the 7XX fields to skos:altLabel.

==============================================  ==============================
MARC21XML                                       RDF
==============================================  ==============================
153 $a, $c, $z Classification number            skos:notation
153 $j Caption                                  skos:prefLabel
153 $e, $f, $z Classification number hierarchy  skos:broader
253 Complex See Reference                       skos:editorialNote
353 Complex See Also Reference                  skos:editorialNote
680 Scope Note                                  skos:scopeNote
683 Application Instruction Note                skos:editorialNote
685 History Note                                skos:historyNote
694 ??? Note                                    skos:editorialNote
700 Index Term-Personal Name                    skos:altLabel
710 Index Term-Corporate Name                   skos:altLabel
711 Index Term-Meeting Name                     skos:altLabel
730 Index Term-Uniform Title                    skos:altLabel
748 Index Term-Chronological                    skos:altLabel
750 Index Term-Topical                          skos:altLabel
751 Index Term-Geographic Name                  skos:altLabel
753 Index Term-Uncontrolled                     skos:altLabel
765 Synthesized Number Components               mads:componentList (see below)
==============================================  ==============================
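To make the mapping above concrete, here is a minimal standard-library sketch that reads one MARCXML record and emits (property, value) pairs. It is not the mc2skos implementation: the names `TAG_TO_PROPERTY` and `map_record` are hypothetical, the sample record is made up, notation is taken from 153 $a only (ignoring $c and $z), and 765 is left out because it needs list handling.

```python
import xml.etree.ElementTree as ET

MARC_NS = "{http://www.loc.gov/MARC21/slim}"

# Tag-to-property lookup copied from the mapping table (765 omitted).
TAG_TO_PROPERTY = {
    "253": "skos:editorialNote", "353": "skos:editorialNote",
    "680": "skos:scopeNote", "683": "skos:editorialNote",
    "685": "skos:historyNote", "694": "skos:editorialNote",
    "700": "skos:altLabel", "710": "skos:altLabel",
    "711": "skos:altLabel", "730": "skos:altLabel",
    "748": "skos:altLabel", "750": "skos:altLabel",
    "751": "skos:altLabel", "753": "skos:altLabel",
}


def subfield_text(field):
    """Join all subfield values of a datafield with single spaces."""
    return " ".join(
        (sf.text or "").strip() for sf in field.findall(MARC_NS + "subfield")
    )


def map_record(record):
    """Return (property, value) pairs for one MARCXML <record> element."""
    pairs = []
    for field in record.findall(MARC_NS + "datafield"):
        tag = field.get("tag")
        if tag == "153":
            # Simplification: only $a (number) and $j (caption) are read.
            for sf in field.findall(MARC_NS + "subfield"):
                if sf.get("code") == "a":
                    pairs.append(("skos:notation", sf.text))
                elif sf.get("code") == "j":
                    pairs.append(("skos:prefLabel", sf.text))
        elif tag in TAG_TO_PROPERTY:
            pairs.append((TAG_TO_PROPERTY[tag], subfield_text(field)))
    return pairs


# Made-up sample record, for illustration only.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="153" ind1=" " ind2=" ">
    <subfield code="a">001.3</subfield>
    <subfield code="j">Humaniora</subfield>
  </datafield>
  <datafield tag="750" ind1=" " ind2=" ">
    <subfield code="a">Example index term</subfield>
  </datafield>
</record>"""

for prop, value in map_record(ET.fromstring(SAMPLE)):
    print(prop, value)
```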

Synthesized number components

Components of synthesized numbers explicitly described in 765 fields are expressed using the mads:componentList property, and to preserve the order of the components, we use RDF lists. Example:

@prefix mads: <http://www.loc.gov/mads/rdf/v1#> .

<http://dewey.info/class/001.30973/e23/> a skos:Concept ;
    mads:componentList (
        <http://dewey.info/class/001.3/e23/>
        <http://dewey.info/class/1--09/e23/>
        <http://dewey.info/class/2--73/e23/>
    ) ;
    skos:notation "001.30973" .
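For readers less familiar with RDF lists, the sketch below spells out the rdf:first / rdf:rest triples that the Turtle parenthesis syntax stands for, and walks them in order. It uses plain tuples rather than an RDF library; the blank node labels (`_:b0` and so on) and the helper name `list_members` are made up for the illustration.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDF_FIRST, RDF_REST, RDF_NIL = RDF + "first", RDF + "rest", RDF + "nil"

# The component list from the example, written out as plain triples.
# The blank node labels are arbitrary names chosen for this sketch.
TRIPLES = [
    ("_:b0", RDF_FIRST, "http://dewey.info/class/001.3/e23/"),
    ("_:b0", RDF_REST, "_:b1"),
    ("_:b1", RDF_FIRST, "http://dewey.info/class/1--09/e23/"),
    ("_:b1", RDF_REST, "_:b2"),
    ("_:b2", RDF_FIRST, "http://dewey.info/class/2--73/e23/"),
    ("_:b2", RDF_REST, RDF_NIL),
]


def list_members(triples, head):
    """Follow rdf:rest from `head` and collect each rdf:first value in order."""
    index = {(s, p): o for s, p, o in triples}
    members = []
    node = head
    while node != RDF_NIL:
        members.append(index[(node, RDF_FIRST)])
        node = index[(node, RDF_REST)]
    return members


for uri in list_members(TRIPLES, "_:b0"):
    print(uri)
```

Each list cell points to one member and to the next cell, which is why the order survives in the graph even though the triples themselves are unordered.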

Retrieving list members in order is surprisingly hard with SPARQL. Retrieving ordered pairs is the best solution I’ve come up with so far:

PREFIX mads: <http://www.loc.gov/mads/rdf/v1#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT ?c1_notation ?c1_label ?c2_notation ?c2_label
WHERE { GRAPH <http://localhost/ddc23no> {

    <http://dewey.info/class/001.30973/e23/> mads:componentList ?l .

    # ?sl is any cell of the list; ?sln is the cell that follows it,
    # so ?e1 and ?e2 are always adjacent list members.
    ?l rdf:rest* ?sl .
    ?sl rdf:first ?e1 .
    ?sl rdf:rest ?sln .
    ?sln rdf:first ?e2 .

    ?e1 skos:notation ?c1_notation .
    ?e2 skos:notation ?c2_notation .

    OPTIONAL {
        ?e1 skos:prefLabel ?c1_label .
    }
    OPTIONAL {
        ?e2 skos:prefLabel ?c2_label .
    }
}}

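===========  ================================================  ===========  ================================================
c1_notation  c1_label                                          c2_notation  c2_label
===========  ================================================  ===========  ================================================
"001.3"      "Humaniora"@nb                                    "T1--09"     "Historie, geografisk behandling, biografier"@nb
"T1--09"     "Historie, geografisk behandling, biografier"@nb  "T2--73"     "USA"@nb
===========  ================================================  ===========  ================================================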
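Once the ordered pairs are back from the query, the full sequence can be rebuilt on the client side. The following is a small sketch of one way to do that; the function name `chain_pairs` is hypothetical, and it assumes each notation occurs only once in the list, so a component that repeats would not be handled correctly.

```python
def chain_pairs(pairs):
    """Rebuild an ordered list from adjacent (first, second) pairs.

    Assumes every element occurs at most once in the list.
    """
    if not pairs:
        return []
    successor = dict(pairs)
    seconds = {second for _, second in pairs}
    # The head of the list is the only element that never appears second.
    heads = [first for first, _ in pairs if first not in seconds]
    if len(heads) != 1:
        raise ValueError("expected exactly one starting element")
    ordered = [heads[0]]
    # The length guard stops the walk if the input breaks the assumption.
    while ordered[-1] in successor and len(ordered) <= len(pairs):
        ordered.append(successor[ordered[-1]])
    return ordered


# Notation pairs from the result rows above, in arbitrary row order.
ROWS = [("T1--09", "T2--73"), ("001.3", "T1--09")]
print(chain_pairs(ROWS))
```

Because SPARQL result rows carry no guaranteed order, the sketch does not rely on the order in which the pairs arrive.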

Additional processing for data from WebDewey

The script is supposed to work with any MARC21 classification data, but also supports the non-standard ess codes supplied in WebDewey data to differentiate between different types of notes.

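=======================================  =====================================
MARC21XML                                RDF
=======================================  =====================================
680 having $9 ess=ndf Definition note    skos:definition
680 having $9 ess=nvn Variant name note  wd:variantName for each subfield $t
680 having $9 ess=nch Class here note    wd:classHere for each subfield $t
680 having $9 ess=nin Including note     wd:including for each subfield $t
680 having $9 ess=nph Former heading     wd:formerHeading for each subfield $t
685 having $9 ess=ndn Deprecation note   owl:deprecated true
694 having $9 ess=nml ???                skos:editorialNote
=======================================  =====================================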
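The table above can be read as a lookup keyed on the field tag and the ess code. The sketch below shows that reading in Python. It is an illustration rather than the actual mc2skos logic: the names `ESS_TO_PROPERTY`, `DEFAULT_PROPERTY` and `note_property` are hypothetical, and the fallback to the general mapping for unrecognised codes is an assumption.

```python
# (tag, ess code) -> property, copied from the WebDewey table.
ESS_TO_PROPERTY = {
    ("680", "ndf"): "skos:definition",
    ("680", "nvn"): "wd:variantName",
    ("680", "nch"): "wd:classHere",
    ("680", "nin"): "wd:including",
    ("680", "nph"): "wd:formerHeading",
    ("685", "ndn"): "owl:deprecated",
    ("694", "nml"): "skos:editorialNote",
}

# General note mapping from the "Mapping schema" table.
DEFAULT_PROPERTY = {
    "253": "skos:editorialNote",
    "353": "skos:editorialNote",
    "680": "skos:scopeNote",
    "683": "skos:editorialNote",
    "685": "skos:historyNote",
    "694": "skos:editorialNote",
}


def note_property(tag, subfield_9=None):
    """Pick a property for a note field, given its tag and raw $9 value.

    ASSUMPTION: codes without a special mapping fall back to the general
    per-tag mapping; unknown tags yield None.
    """
    ess = None
    if subfield_9 and subfield_9.startswith("ess="):
        ess = subfield_9[len("ess="):]
    return ESS_TO_PROPERTY.get((tag, ess), DEFAULT_PROPERTY.get(tag))


print(note_property("680", "ess=ndf"))   # special case from the table
print(note_property("683", "ess=nbu"))   # no special handling, general mapping
```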

Notes that are currently not treated in any special way:

  • 253 having $9 ess=nsx Do-not-use

  • 253 having $9 ess=nce Class-elsewhere

  • 253 having $9 ess=ncw Class-elsewhere-manual

  • 253 having $9 ess=nse See

  • 253 having $9 ess=nsw See-manual

  • 353 having $9 ess=nsa See-also

  • 683 having $9 ess=nbu Preference note

  • 683 having $9 ess=nop Options note

  • 683 having $9 ess=non Options note

  • 684 having $9 ess=nsm Manual note

  • 685 having $9 ess=ndp Discontinued partial

  • 685 having $9 ess=nrp Relocation

  • 689 having $9 ess=nru Last used in…
