Ordered Turtle Serializer for rdflib

License: MIT

An extension to the rdflib Turtle serializer that adds order (at the price of speed). Useful when you need to generate diffs between Turtle files, or just to make it easier for human beings to inspect the files.

$ pip install otsrdflib

Usage:

from rdflib import Graph
from otsrdflib import OrderedTurtleSerializer

my_graph = Graph()

# Serialize the graph to out.ttl; the file is opened in binary mode
with open('out.ttl', 'wb') as out:
    serializer = OrderedTurtleSerializer(my_graph)
    serializer.serialize(out)

Class order is imposed by setting serializer.topClasses. The default list is suitable for thesauri and other controlled vocabularies:

serializer.topClasses = [SKOS.ConceptScheme,
                       FOAF.Organization,
                       SD.Service,
                       SD.Dataset,
                       SD.Graph,
                       SD.NamedGraph,
                       ISOTHES.ThesaurusArray,
                       SKOS.Concept]
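
To illustrate what a class order does to the output, here is a minimal standalone sketch. It is not the library's implementation: the class names are plain strings standing in for rdflib terms, the subjects are made up, and placing unlisted classes last is an assumption of this sketch.

```python
# Hypothetical stand-ins for class URIs; the real list holds rdflib terms.
top_classes = ['skos:ConceptScheme', 'foaf:Organization', 'skos:Concept']

# Hypothetical (subject, class) pairs as they might occur in a graph.
subjects = [
    ('http://example.org/c2', 'skos:Concept'),
    ('http://example.org/other', 'ex:Unlisted'),
    ('http://example.org/scheme', 'skos:ConceptScheme'),
    ('http://example.org/c1', 'skos:Concept'),
    ('http://example.org/org', 'foaf:Organization'),
]

def class_rank(class_name):
    """Position in top_classes; unlisted classes rank after all listed ones
    (an assumption of this sketch)."""
    if class_name in top_classes:
        return top_classes.index(class_name)
    return len(top_classes)

# Sort by class rank first, then alphabetically by subject URI within a class.
ordered = [s for s, _ in sorted(subjects,
                                key=lambda pair: (class_rank(pair[1]), pair[0]))]

for uri in ordered:
    print(uri)
```

The concept scheme comes first, then the organization, then the two concepts in alphabetical order, and finally the subject whose class is not in the list.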

Instance order (within a class) is imposed using the Python cmp method. URIs are normally sorted alphabetically as-is, but with serializer.sorters you can generate your own sort key based on URI patterns. The default sorter handles URIs that end with a number, using that number as a numerical sort key:

serializer.sorters = {
  '.*?([0-9]+)$': lambda x: int(x[0])
}

Here x refers to the match object groups. Note that index 0 refers to the first group, not the entire match!
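
As a rough illustration of this mechanism (not the library's actual code), the following standalone sketch applies the pattern and lambda from above to a few made-up URIs. The helper `sort_key` and the `example.org` URIs are hypothetical.

```python
import re

# Same pattern and key function as the sorter shown above.
sorters = {
    r'.*?([0-9]+)$': lambda x: int(x[0]),
}

def sort_key(uri):
    """Hypothetical helper: return the key from the first sorter that matches."""
    for pattern, key_func in sorters.items():
        match = re.match(pattern, uri)
        if match:
            # groups() holds only the captured groups, so index 0 is the
            # first group, not the entire match.
            return key_func(match.groups())
    raise ValueError('no sorter matches %s' % uri)

uris = [
    'http://example.org/concept/100',
    'http://example.org/concept/99',
    'http://example.org/concept/7',
]

ordered = sorted(uris, key=sort_key)   # numeric order: 7, 99, 100
plain = sorted(uris)                   # string order: 100, 7, 99

print(ordered)
print(plain)
```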

With this sorter, http://…/…/99 will be arranged before http://…/…/100, since the sort keys are the integers 99 and 100. Note that if you have URIs that end in numbers but have different bases, they will be mixed together. One simple way to group URIs with the same base is to add a “large” number that represents the base, for instance an 8-digit hash:

import hashlib

def xhash(s):
    return int(hashlib.sha1(s.encode('utf-8')).hexdigest(), 16) % 10**8

serializer.sorters = {
  '(.*?)([0-9]+)$': lambda x: xhash(x[0]) + int(x[1])
}
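
The arithmetic behind this sorter can be tried out in isolation. The sketch below uses two made-up bases on `example.org`, and the helper `key` is hypothetical. It assumes Python 3, where the string must be encoded to bytes before hashing.

```python
import hashlib

def xhash(s):
    """Map a string to an integer below 10**8 derived from its SHA-1 digest."""
    return int(hashlib.sha1(s.encode('utf-8')).hexdigest(), 16) % 10**8

def key(uri_base, number):
    # Same arithmetic as the sorter above: base hash plus trailing number.
    return xhash(uri_base) + int(number)

base_a = 'http://example.org/a/'
base_b = 'http://example.org/b/'

keys_a = [key(base_a, n) for n in ('7', '99', '100')]
keys_b = [key(base_b, n) for n in ('7', '99', '100')]

print(keys_a)
print(keys_b)
```

Within one base the keys keep their numeric order, because they all share the same offset. Two bases stay apart only as long as their hashes differ by more than the largest trailing number, so this is a heuristic rather than a guarantee.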

For a slightly more complicated example, let's look at Dewey URIs. For a typical URI like http://dewey.info/class/001.433/e23/, we would like to use the decimal number 1.433 as the sort key. We can achieve that by configuring a sorter like so:

serializer.sorters = {
  'http://dewey.info/class/([0-9.]+)': lambda x: float(x[0])
}

But then there are also table numbers like http://dewey.info/class/T1--0901/e23/. We want the tables T1, T2, … to follow the main schedules. Since the main schedules go from 0 to 999.99…, we can map the tables T1…T6 to some larger integers, like 1001…1006. Noting that table numbers like 0901 represent a fractional part, the sort key for T1--0901 becomes 1001.0901. Such keys can be generated by adding another sorter:

serializer.sorters = {
  r'http://dewey.info/class/([0-9.]+)': lambda x: float(x[0]),
  r'http://dewey.info/class/T([0-9])--([0-9]+)': lambda x: 1000. + int(x[0]) + float('.' + x[1])
}
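
To check the resulting keys without running the serializer, the two sorters can be exercised in a small standalone sketch. The helper `sort_key` is hypothetical, as are all example URIs other than the two quoted above. Trying the patterns in listed order is an assumption of this sketch, not a statement about how the library picks a sorter.

```python
import re

# The two patterns from above as raw strings, paired with their key functions.
sorters = [
    (r'http://dewey\.info/class/([0-9.]+)',
     lambda x: float(x[0])),
    (r'http://dewey\.info/class/T([0-9])--([0-9]+)',
     lambda x: 1000. + int(x[0]) + float('.' + x[1])),
]

def sort_key(uri):
    """Hypothetical helper: key from the first pattern that matches the URI."""
    for pattern, key_func in sorters:
        match = re.match(pattern, uri)
        if match:
            return key_func(match.groups())
    raise ValueError('no sorter matches %s' % uri)

uris = [
    'http://dewey.info/class/T1--0901/e23/',
    'http://dewey.info/class/001.433/e23/',
    'http://dewey.info/class/T2--4/e23/',
    'http://dewey.info/class/641.5/e23/',
]

# Main schedule numbers come first, then tables T1, T2, ...
ordered = sorted(uris, key=sort_key)

for uri in ordered:
    print(sort_key(uri), uri)
```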
