Convert MARC 21 Classification records in MARC/XML to SKOS/RDF

Python script for converting MARC 21 Classification records (serialized as MARCXML) to SKOS concepts.

Developed to support the project “Felles terminologi for klassifikasjon med Dewey”, for converting Dewey Decimal Classification (DDC) records. Issues and suggestions for generalizations and improvements are welcome!

Installation

Releases can be installed from the command line with pip:

$ pip install --upgrade mc2skos             # with virtualenv or as root
$ pip install --upgrade --user mc2skos      # install to ~/.local

  • Works with both Python 2.x and 3.x. See Travis for details on tested Python versions.

  • If lxml fails to install on Windows, try the Windows installer from PyPI.

  • If lxml fails to install on Unix, install the system packages python-dev and libxml2-dev.

  • Make sure the Python scripts folder has been added to your PATH.

To install a version directly from the source code repository:

$ git clone https://github.com/scriptotek/mc2skos.git
$ cd mc2skos
$ pip install -e .

Usage

mc2skos infile.xml outfile.ttl      # from file to file
mc2skos infile.xml > outfile.ttl    # from file to standard output

Run mc2skos --help or mc2skos -h for options.

URIs

Concept URIs are generated from a URI template specified with the --uri option. The following template parameters are recognized:

  • {collection} is “class”, “table” or “scheme”

  • {object} is a member of the classification scheme and part of a {collection}, such as a specific class or table.

  • {edition} is taken from 084 $c (with language code stripped)

The following default URI templates are used for known concept scheme identifiers in 084 $a:

  • ddc: http://dewey.info/{collection}/{object}/e{edition}/ (DDC)

  • bkl: http://uri.gbv.de/terminology/bk/{object} (Basisklassifikation)
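
How a template expands can be sketched in a few lines of Python. This is only an illustration, not the mc2skos implementation: the helper name expand_uri_template is hypothetical, it assumes the placeholders behave like str.format fields, and the Basisklassifikation notation 54.65 is an arbitrary sample value.

```python
def expand_uri_template(template, **params):
    """Fill {collection}, {object} and {edition} placeholders in a URI template.

    Hypothetical helper for illustration; mc2skos may expand templates differently.
    Parameters that the template does not mention are simply ignored.
    """
    return template.format(**params)


# Default templates as listed above.
DDC_TEMPLATE = "http://dewey.info/{collection}/{object}/e{edition}/"
BKL_TEMPLATE = "http://uri.gbv.de/terminology/bk/{object}"

ddc_uri = expand_uri_template(
    DDC_TEMPLATE, collection="class", object="6--982", edition="21"
)
bkl_uri = expand_uri_template(
    BKL_TEMPLATE, collection="class", object="54.65", edition=""
)

print(ddc_uri)
print(bkl_uri)
```

With these inputs the DDC template yields http://dewey.info/class/6--982/e21/, and the bkl template ignores the collection and edition parameters because it only contains {object}.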

To add skos:inScheme statements to all records, a URI template must either be specified with the --scheme option or be derivable from a known default template.

To add an additional skos:inScheme statement to table records, a URI template must either be specified with the --table_scheme option or be derivable from a known default template.

The following example is generated from a DDC table record:

<http://dewey.info/class/6--982/e21/> a skos:Concept ;
    skos:inScheme <http://dewey.info/scheme/edition/e21/>,
                  <http://dewey.info/table/6/e21/> ;
    skos:notation "T6--982" ;
    skos:prefLabel "Chibchan and Paezan languages"@en .

Mapping schema

Only a small part of the MARC21 Classification data model is converted, and the conversion follows a rather pragmatic approach, exemplified by the mapping of the 7XX fields to skos:altLabel.

MARC21XML                                       RDF
----------------------------------------------  ------------------------------
153 $a, $c, $z Classification number            skos:notation
153 $j Caption                                  skos:prefLabel
153 $e, $f, $z Classification number hierarchy  skos:broader
253 Complex See Reference                       skos:editorialNote
353 Complex See Also Reference                  skos:editorialNote
680 Scope Note                                  skos:scopeNote
683 Application Instruction Note                skos:editorialNote
685 History Note                                skos:historyNote
694 ??? Note                                    skos:editorialNote
700 Index Term-Personal Name                    skos:altLabel
710 Index Term-Corporate Name                   skos:altLabel
711 Index Term-Meeting Name                     skos:altLabel
730 Index Term-Uniform Title                    skos:altLabel
748 Index Term-Chronological                    skos:altLabel
750 Index Term-Topical                          skos:altLabel
751 Index Term-Geographic Name                  skos:altLabel
753 Index Term-Uncontrolled                     skos:altLabel
765 Synthesized Number Components               mads:componentList (see below)
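
To make the mapping concrete, here is a minimal sketch that reads one MARCXML record with the Python standard library and emits (property, value) pairs for a small subset of the table above. It is not the mc2skos code: the sample record is made up, the names FIELD_MAP, NOTE_MAP and map_record are hypothetical, and joining all subfields of a note into one string is a simplifying assumption.

```python
import xml.etree.ElementTree as ET

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Made-up sample record, trimmed to the fields used below.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="153" ind1=" " ind2=" ">
    <subfield code="a">001.3</subfield>
    <subfield code="j">Humanities</subfield>
  </datafield>
  <datafield tag="680" ind1=" " ind2=" ">
    <subfield code="i">Example scope note</subfield>
  </datafield>
</record>"""

# Subset of the mapping table: (tag, subfield code) -> SKOS property.
FIELD_MAP = {
    ("153", "a"): "skos:notation",
    ("153", "j"): "skos:prefLabel",
}
# Note fields are mapped as a whole: tag -> SKOS property.
NOTE_MAP = {"680": "skos:scopeNote", "685": "skos:historyNote"}


def map_record(record):
    """Return (property, value) pairs for one MARCXML classification record."""
    pairs = []
    for field in record.findall("marc:datafield", MARC_NS):
        tag = field.get("tag")
        subfields = field.findall("marc:subfield", MARC_NS)
        if tag in NOTE_MAP:
            # Simplification: join all subfield texts into a single literal.
            text = " ".join(sf.text or "" for sf in subfields)
            pairs.append((NOTE_MAP[tag], text))
            continue
        for sf in subfields:
            prop = FIELD_MAP.get((tag, sf.get("code")))
            if prop:
                pairs.append((prop, sf.text))
    return pairs


result = map_record(ET.fromstring(SAMPLE))
for prop, value in result:
    print(prop, value)
```

Fields that are not listed in either dictionary are skipped, which mirrors the fact that only part of the data model is converted.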

Synthesized number components

Components of synthesized numbers explicitly described in 765 fields are expressed using the mads:componentList property, and to preserve the order of the components, we use RDF lists. Example:

@prefix mads: <http://www.loc.gov/mads/rdf/v1#> .

<http://dewey.info/class/001.30973/e23/> a skos:Concept ;
    mads:componentList (
        <http://dewey.info/class/001.3/e23/>
        <http://dewey.info/class/1--09/e23/>
        <http://dewey.info/class/2--73/e23/>
    ) ;
    skos:notation "001.30973" .

Retrieving list members in order is surprisingly hard with SPARQL. Retrieving ordered pairs is the best solution I’ve come up with so far:

PREFIX mads: <http://www.loc.gov/mads/rdf/v1#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT ?c1_notation ?c1_label ?c2_notation ?c2_label
WHERE { GRAPH <http://localhost/ddc23no> {

    # ?sl is any node of the RDF list, ?sln is the node directly after it
    <http://dewey.info/class/001.30973/e23/> mads:componentList ?l .
    ?l rdf:rest* ?sl .
    ?sl rdf:first ?e1 .
    ?sl rdf:rest ?sln .
    ?sln rdf:first ?e2 .

    # Each result row is one pair of adjacent components
    ?e1 skos:notation ?c1_notation .
    ?e2 skos:notation ?c2_notation .

    OPTIONAL {
        ?e1 skos:prefLabel ?c1_label .
    }
    OPTIONAL {
        ?e2 skos:prefLabel ?c2_label .
    }
}}

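c1_notation  c1_label                                          c2_notation  c2_label
-----------  ------------------------------------------------  -----------  ------------------------------------------------
"001.3"      "Humaniora"@nb                                    "T1--09"     "Historie, geografisk behandling, biografier"@nb
"T1--09"     "Historie, geografisk behandling, biografier"@nb  "T2--73"     "USA"@nb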
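
On the client side, the ordered pairs can be chained back into a single ordered list. The sketch below is one possible way to do that, not something mc2skos provides: the function name chain_pairs is hypothetical, and it assumes that the pairs describe a single list in which no component occurs twice. A list with only one member produces no pairs in the query above, so that case has to be handled separately.

```python
def chain_pairs(pairs):
    """Rebuild an ordered list from (element, next_element) pairs.

    Assumes the pairs describe a single chain in which no element repeats.
    """
    if not pairs:
        return []
    successor = dict(pairs)
    followers = set(successor.values())
    # The head is the only element that never appears as a follower.
    heads = [first for first in successor if first not in followers]
    if len(heads) != 1:
        raise ValueError("pairs do not form a single chain")
    ordered = [heads[0]]
    # At most one step per pair, so the loop ends even on malformed input.
    for _ in range(len(successor)):
        nxt = successor.get(ordered[-1])
        if nxt is None:
            break
        ordered.append(nxt)
    return ordered


# Notation pairs from the result table; SPARQL does not guarantee row order.
rows = [("T1--09", "T2--73"), ("001.3", "T1--09")]
components = chain_pairs(rows)
print(components)
```

For the two rows above this gives the components in list order: 001.3, then T1--09, then T2--73.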

Additional processing for data from WebDewey

The script is supposed to work with any MARC21 classification data, but also supports the non-standard ess codes supplied in WebDewey data to differentiate between different types of notes.

MARC21XML                                RDF
---------------------------------------  -------------------------------------
680 having $9 ess=ndf Definition note    skos:definition
680 having $9 ess=nvn Variant name note  wd:variantName for each subfield $t
680 having $9 ess=nch Class here note    wd:classHere for each subfield $t
680 having $9 ess=nin Including note     wd:including for each subfield $t
680 having $9 ess=nph Former heading     wd:formerHeading for each subfield $t
685 having $9 ess=ndn Deprecation note   owl:deprecated true
694 having $9 ess=nml ???                skos:editorialNote
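
The lookup implied by this table can be sketched as a dictionary keyed on the MARC tag and the ess code. This is a hedged illustration, not the actual mc2skos logic: the names ESS_MAP, DEFAULT_MAP and note_property are hypothetical, and falling back to the general mapping when no ess code is recognized is an assumption.

```python
# (MARC tag, ess code) -> RDF property, copied from the table above.
ESS_MAP = {
    ("680", "ndf"): "skos:definition",
    ("680", "nvn"): "wd:variantName",
    ("680", "nch"): "wd:classHere",
    ("680", "nin"): "wd:including",
    ("680", "nph"): "wd:formerHeading",
    ("685", "ndn"): "owl:deprecated",
    ("694", "nml"): "skos:editorialNote",
}

# Assumed fallback: the general mapping for note fields.
DEFAULT_MAP = {
    "680": "skos:scopeNote",
    "683": "skos:editorialNote",
    "685": "skos:historyNote",
    "694": "skos:editorialNote",
}


def note_property(tag, subfield_9_values):
    """Pick the RDF property for a note field from its $9 subfield values."""
    for value in subfield_9_values:
        if value.startswith("ess="):
            prop = ESS_MAP.get((tag, value[len("ess="):]))
            if prop:
                return prop
    # No recognized ess code: use the general mapping, or None if unmapped.
    return DEFAULT_MAP.get(tag)


definition_prop = note_property("680", ["ess=ndf"])
relocation_prop = note_property("685", ["ess=nrp"])
plain_prop = note_property("680", [])
print(definition_prop, relocation_prop, plain_prop)
```

Under these assumptions, a 680 field with ess=ndf maps to skos:definition, while a 685 field with ess=nrp has no special entry and falls back to skos:historyNote.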

Notes that are currently not treated in any special way:

  • 253 having $9 ess=nsx Do-not-use.

  • 253 having $9 ess=nce Class-elsewhere

  • 253 having $9 ess=ncw Class-elsewhere-manual

  • 253 having $9 ess=nse See.

  • 253 having $9 ess=nsw See-manual.

  • 353 having $9 ess=nsa See-also

  • 683 having $9 ess=nbu Preference note

  • 683 having $9 ess=nop Options note

  • 683 having $9 ess=non Options note

  • 684 having $9 ess=nsm Manual note

  • 685 having $9 ess=ndp Discontinued partial

  • 685 having $9 ess=nrp Relocation

  • 689 having $9 ess=nru Last used in…
