Convert MARC 21 Classification records in MARC/XML to SKOS/RDF
Python script for converting MARC 21 Classification records (serialized as MARCXML) to SKOS concepts.
Developed to support the project “Felles terminologi for klassifikasjon med Dewey”, for converting Dewey Decimal Classification (DDC) records. Issues and suggestions for generalizations and improvements are welcome!
Installation
Releases can be installed from the command line with pip:
$ pip install --upgrade mc2skos # with virtualenv or as root
$ pip install --upgrade --user mc2skos # install to ~/.local
Works with both Python 2.x and 3.x. See Travis for details on tested Python versions.
If lxml fails to install on Windows, try the Windows installer from PyPI.
If lxml fails to install on Unix, install the system packages python-dev and libxml2-dev.
Make sure the Python scripts folder has been added to your PATH.
To use a version directly from the source code repository:
$ git clone https://github.com/scriptotek/mc2skos.git
$ cd mc2skos
$ pip install -e .
Usage
mc2skos infile.xml outfile.ttl # from file to file
mc2skos infile.xml > outfile.ttl # from file to standard output
Run mc2skos --help or mc2skos -h for options.
URIs
Concept URIs are generated from a URI template specified with the --uri option. The following template parameters are recognized:
{collection} is “class”, “table” or “scheme”
{object} is a member of the classification scheme and part of a {collection}, such as a specific class or table.
{edition} is taken from 084 $c (with language code stripped)
The following default URI templates are used for known concept scheme identifiers in 084 $a:
ddc: http://dewey.info/{collection}/{object}/e{edition}/ (DDC)
bkl: http://uri.gbv.de/terminology/bk/{object} (Basisklassifikation)
To add skos:inScheme statements to all records, a URI template must either be specified with the --scheme option or be derivable from a known default template.
To add an additional skos:inScheme statement to table records, a URI template must either be specified with the --table_scheme option or be derivable from a known default template.
The following example is generated from a DDC table record:
<http://dewey.info/class/6--982/e21/> a skos:Concept ;
skos:inScheme <http://dewey.info/scheme/edition/e21/>,
<http://dewey.info/table/6/e21/> ;
skos:notation "T6--982" ;
skos:prefLabel "Chibchan and Paezan languages"@en .
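As a rough illustration of how the template parameters combine, the sketch below fills the default DDC template with Python's str.format. The helper name expand is hypothetical and this is not necessarily how mc2skos expands templates internally. The values passed for the table and scheme URIs are chosen only to reproduce the URIs in the example above.

```python
# Sketch only: mc2skos's real expansion logic may differ.
DDC_TEMPLATE = "http://dewey.info/{collection}/{object}/e{edition}/"


def expand(template, collection, obj, edition):
    """Fill the three template parameters (hypothetical helper)."""
    return template.format(collection=collection, object=obj, edition=edition)


# Values taken from the DDC table record shown above.
concept_uri = expand(DDC_TEMPLATE, "class", "6--982", "21")
table_uri = expand(DDC_TEMPLATE, "table", "6", "21")
# Assumption: treating "edition" as the {object} value reproduces the scheme URI.
scheme_uri = expand(DDC_TEMPLATE, "scheme", "edition", "21")

print(concept_uri)
print(table_uri)
print(scheme_uri)
```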
Mapping schema
Only a small part of the MARC21 Classification data model is converted, and the conversion follows a rather pragmatic approach, exemplified by the mapping of the 7XX fields to skos:altLabel.
MARC21XML | RDF |
---|---|
153 $a, $c, $z Classification number | skos:notation |
153 $j Caption | skos:prefLabel |
153 $e, $f, $z Classification number hierarchy | skos:broader |
253 Complex See Reference | skos:editorialNote |
353 Complex See Also Reference | skos:editorialNote |
680 Scope Note | skos:scopeNote |
683 Application Instruction Note | skos:editorialNote |
685 History Note | skos:historyNote |
694 ??? Note | skos:editorialNote |
700 Index Term-Personal Name | skos:altLabel |
710 Index Term-Corporate Name | skos:altLabel |
711 Index Term-Meeting Name | skos:altLabel |
730 Index Term-Uniform Title | skos:altLabel |
748 Index Term-Chronological | skos:altLabel |
750 Index Term-Topical | skos:altLabel |
751 Index Term-Geographic Name | skos:altLabel |
753 Index Term-Uncontrolled | skos:altLabel |
765 Synthesized Number Components | mads:componentList (see below) |
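To make the mapping concrete, here is a minimal sketch that reads a tiny MARCXML record with the standard library and applies three rows of the table. The sample record is hand-written for illustration, FIELD_MAP and extract are hypothetical names, and the sketch only looks at subfield $a of field 750. It is not the conversion code used by mc2skos.

```python
import xml.etree.ElementTree as ET

# Namespace used by MARCXML documents.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hand-written sample record (illustrative, not real WebDewey output).
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="153" ind1=" " ind2=" ">
    <subfield code="a">001.3</subfield>
    <subfield code="j">Humanities</subfield>
  </datafield>
  <datafield tag="750" ind1=" " ind2=" ">
    <subfield code="a">Liberal arts</subfield>
  </datafield>
</record>"""

# Subset of the mapping table: (tag, subfield code) -> SKOS property.
FIELD_MAP = {
    ("153", "a"): "skos:notation",
    ("153", "j"): "skos:prefLabel",
    ("750", "a"): "skos:altLabel",
}


def extract(record_xml):
    """Return (property, value) pairs for mapped subfields, in document order."""
    record = ET.fromstring(record_xml)
    pairs = []
    for field in record.findall("marc:datafield", MARC_NS):
        for sub in field.findall("marc:subfield", MARC_NS):
            prop = FIELD_MAP.get((field.get("tag"), sub.get("code")))
            if prop is not None:
                pairs.append((prop, sub.text))
    return pairs


pairs = extract(SAMPLE)
for prop, value in pairs:
    print(prop, value)
```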
Synthesized number components
Components of synthesized numbers explicitly described in 765 fields are expressed using the mads:componentList property, and to preserve the order of the components, we use RDF lists. Example:
@prefix mads: <http://www.loc.gov/mads/rdf/v1#> .
<http://dewey.info/class/001.30973/e23/> a skos:Concept ;
mads:componentList (
<http://dewey.info/class/001.3/e23/>
<http://dewey.info/class/1--09/e23/>
<http://dewey.info/class/2--73/e23/>
) ;
skos:notation "001.30973" .
Retrieving list members in order is surprisingly hard with SPARQL. Retrieving ordered pairs is the best solution I’ve come up with so far:
PREFIX mads: <http://www.loc.gov/mads/rdf/v1#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?c1_notation ?c1_label ?c2_notation ?c2_label
WHERE { GRAPH <http://localhost/ddc23no> {
<http://dewey.info/class/001.30973/e23/> mads:componentList ?l .
?l rdf:rest* ?sl .
?sl rdf:first ?e1 .
?sl rdf:rest ?sln .
?sln rdf:first ?e2 .
?e1 skos:notation ?c1_notation .
?e2 skos:notation ?c2_notation .
OPTIONAL {
?e1 skos:prefLabel ?c1_label .
}
OPTIONAL {
?e2 skos:prefLabel ?c2_label .
}
}}
c1_notation | c1_label | c2_notation | c2_label |
---|---|---|---|
"001.3" | "Humaniora"@nb | "T1--09" | "Historie, geografisk behandling, biografier"@nb |
"T1--09" | "Historie, geografisk behandling, biografier"@nb | "T2--73" | "USA"@nb |
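Since the query returns adjacent pairs rather than a single ordered list, the client has to stitch the pairs back together. The following is one possible sketch in Python. The function name chain_pairs is hypothetical, and it assumes that no component occurs more than once in the list, since a repeated component would make the chain ambiguous.

```python
def chain_pairs(pairs):
    """Rebuild the ordered list from adjacent (first, second) pairs.

    Assumes every component occurs at most once in the list.
    """
    successor = dict(pairs)
    seconds = set(successor.values())
    # The head is the only element that never appears as a second member.
    heads = [first for first in successor if first not in seconds]
    if len(heads) != 1:
        raise ValueError("pairs do not form a single chain")
    ordered = [heads[0]]
    while ordered[-1] in successor:
        ordered.append(successor[ordered[-1]])
    return ordered


# Notation pairs from the result rows above, deliberately out of order,
# because SPARQL does not guarantee row order without ORDER BY.
rows = [("T1--09", "T2--73"), ("001.3", "T1--09")]
components = chain_pairs(rows)
print(components)
```

Note that a list with a single member produces no pairs at all with this query, so that case would need separate handling.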
Additional processing for data from WebDewey
The script is intended to work with any MARC21 classification data, but it also supports the non-standard ess codes that WebDewey data supplies in subfield $9 to distinguish between different types of notes.
MARC21XML | RDF |
---|---|
680 having $9 ess=ndf Definition note | skos:definition |
680 having $9 ess=nvn Variant name note | wd:variantName for each subfield $t |
680 having $9 ess=nch Class here note | wd:classHere for each subfield $t |
680 having $9 ess=nin Including note | wd:including for each subfield $t |
680 having $9 ess=nph Former heading | wd:formerHeading for each subfield $t |
685 having $9 ess=ndn Deprecation note | owl:deprecated true |
694 having $9 ess=nml ??? | skos:editorialNote |
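One way to picture this dispatch is a lookup keyed on the field tag and the ess code, falling back to the general mapping when no special code applies. The sketch below is an assumption about how such a lookup could be organized, not the script's actual code. The names ESS_MAP, DEFAULT_MAP and property_for are hypothetical, and the fallback behaviour for unlisted codes is assumed.

```python
import re

# (tag, ess code) -> RDF property, copied from the table above.
ESS_MAP = {
    ("680", "ndf"): "skos:definition",
    ("680", "nvn"): "wd:variantName",
    ("680", "nch"): "wd:classHere",
    ("680", "nin"): "wd:including",
    ("680", "nph"): "wd:formerHeading",
    ("685", "ndn"): "owl:deprecated",
    ("694", "nml"): "skos:editorialNote",
}

# Assumed fallbacks, taken from the general mapping schema.
DEFAULT_MAP = {
    "680": "skos:scopeNote",
    "685": "skos:historyNote",
    "694": "skos:editorialNote",
}


def property_for(tag, subfield_9_values):
    """Pick the RDF property for a note field given its $9 values."""
    for value in subfield_9_values:
        match = re.match(r"ess=(\w+)$", value)
        if match and (tag, match.group(1)) in ESS_MAP:
            return ESS_MAP[(tag, match.group(1))]
    # No recognized ess code: fall back to the general mapping, if any.
    return DEFAULT_MAP.get(tag)


print(property_for("680", ["ess=ndf"]))
print(property_for("680", []))
```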
Notes that are currently not treated in any special way:
253 having $9 ess=nsx Do-not-use
253 having $9 ess=nce Class-elsewhere
253 having $9 ess=ncw Class-elsewhere-manual
253 having $9 ess=nse See
253 having $9 ess=nsw See-manual
353 having $9 ess=nsa See-also
683 having $9 ess=nbu Preference note
683 having $9 ess=nop Options note
683 having $9 ess=non Options note
684 having $9 ess=nsm Manual note
685 having $9 ess=ndp Discontinued partial
685 having $9 ess=nrp Relocation
689 having $9 ess=nru Last used in…