A library and reference implementation for merging OCDS releases
This Python package helps to create records that conform to the Open Contracting Data Standard. Specifically, it provides functions for merging OCDS releases with the same OCID into either a compiled release or a versioned release, as described in the OCDS documentation.
pip install ocdsmerge
Usage
The two main functions are merge and merge_versioned. They take OCDS releases as input, and return a compiled release or a versioned release as output, respectively. For example:
import ocdsmerge

# In a real-world example, the OCDS releases might be loaded from JSON files.
releases = [
    {
        "ocid": "ocds-213czf-A",
        "id": "1",
        "date": "2014-01-01",
        "tag": ["tender"],
        "initiationType": "tender",
        "tender": {
            "id": "A",
            "procurementMethod": "selective"
        }
    },
    {
        "ocid": "ocds-213czf-A",
        "id": "2",
        "date": "2014-01-02",
        "tag": ["tender"],
        "initiationType": "tender",
        "tender": {
            "id": "A",
            "procurementMethod": "open"
        }
    }
]

compiledRelease = ocdsmerge.merge(releases)
versionedRelease = ocdsmerge.merge_versioned(releases)
You can then create an OCDS record using the compiledRelease and versionedRelease.
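As a rough illustration of that last step, the sketch below assembles a record dictionary from the input releases and the two merged outputs. The build_record helper is hypothetical (it is not part of this package), and it assumes that the releases are embedded in full rather than linked by URL.

```python
def build_record(releases, compiled_release, versioned_release):
    """Assemble a record dictionary for a single contracting process.

    Hypothetical helper for illustration; not part of ocdsmerge.
    """
    ocids = {release["ocid"] for release in releases}
    if len(ocids) != 1:
        raise ValueError(f"expected exactly one OCID, found {sorted(ocids)}")
    return {
        "ocid": ocids.pop(),
        # Releases are embedded in full here; linking to them is an alternative.
        "releases": list(releases),
        "compiledRelease": compiled_release,
        "versionedRelease": versioned_release,
    }
```

With the example above, this could be called as build_record(releases, compiledRelease, versionedRelease).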
Important caveats
You must ensure that the OCDS releases that you provide as input have the same OCID.
If you are using an older version of the OCDS release schema, you must specify the older schema as a URL, file path, or Python dictionary (see below).
If you are using OCDS extensions, you should patch the OCDS release schema (for instance, using json-merge-patch) and specify the patched schema as a URL, file path, or Python dictionary.
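To satisfy the first caveat when a dataset mixes several contracting processes, one option is to group the releases by OCID before merging. The group_by_ocid helper below is a hypothetical sketch, not a function of this package.

```python
from collections import defaultdict


def group_by_ocid(releases):
    """Return a dictionary that maps each OCID to its list of releases."""
    groups = defaultdict(list)
    for release in releases:
        groups[release["ocid"]].append(release)
    return dict(groups)


mixed_releases = [
    {"ocid": "ocds-213czf-A", "id": "1"},
    {"ocid": "ocds-213czf-B", "id": "1"},
    {"ocid": "ocds-213czf-A", "id": "2"},
]
groups = group_by_ocid(mixed_releases)
```

Each value in groups can then be passed to the merge functions on its own.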
Using a different release schema
By default, the merge and merge_versioned functions use the latest version of the OCDS release schema, which they download once. However, you may want to use an older version, an extended schema, or a local schema to avoid remote requests. To do so, use the optional schema argument, which can be:
A URL to a release schema, as a string starting with http
A file path to a release schema, as a string
A release schema, as a Python dictionary
# URL
ocdsmerge.merge(releases, schema='https://standard.open-contracting.org/schema/1__0__3/release-schema.json')
# Relative file path
ocdsmerge.merge(releases, schema='release-schema.json')
# Absolute file path
ocdsmerge.merge(releases, schema='/absolute/path/to/release-schema.json')
# A Python dictionary, stored in a `release_schema` variable
ocdsmerge.merge(releases, schema=release_schema)
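For the extended-schema case, the dictionary could be produced by applying an extension's schema as a JSON Merge Patch (RFC 7386). The sketch below implements the patch algorithm with the standard library only, as a stand-in for a dedicated package. The schema fragments are toy examples, and exampleField is a hypothetical extension field.

```python
import copy


def json_merge_patch(target, patch):
    """Apply an RFC 7386 JSON Merge Patch and return a new value."""
    if not isinstance(patch, dict):
        # A non-object patch replaces the target entirely.
        return copy.deepcopy(patch)
    result = copy.deepcopy(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            # A null value in the patch removes the key.
            result.pop(key, None)
        else:
            result[key] = json_merge_patch(result.get(key), value)
    return result


# Toy fragments, for illustration only.
base_schema = {"properties": {"tender": {"properties": {"id": {"type": "string"}}}}}
extension_patch = {
    "properties": {"tender": {"properties": {"exampleField": {"type": "string"}}}}
}
patched_schema = json_merge_patch(base_schema, extension_patch)
```

The resulting patched_schema dictionary is the kind of value that can be given as the schema argument.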
Using cached merge rules
The merge and merge_versioned functions extract merge rules from the release schema. If the release schema is provided as a string (i.e. as a URL or file path), then these merge rules are automatically cached between function calls. However, if it is provided as a Python dictionary, then they won’t be cached. To manually cache merge rules, use the get_merge_rules function:
merge_rules = ocdsmerge.get_merge_rules('release-schema.json')
ocdsmerge.merge(releases, merge_rules=merge_rules)
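To see why a string and a dictionary might be treated differently by a cache, consider the generic sketch below. It is an assumption about the kind of mechanism involved, not a description of this package's internals: a cache keyed on function arguments can store results for a hashable string, but not for an unhashable dictionary. The rules_for function is hypothetical.

```python
from functools import lru_cache

calls = []


@lru_cache(maxsize=None)
def rules_for(schema_location):
    """Hypothetical stand-in for an expensive rule extraction."""
    calls.append(schema_location)
    return ("rules for", schema_location)


rules_for("release-schema.json")
rules_for("release-schema.json")  # Served from the cache; the body does not run again.

try:
    rules_for({"properties": {}})
    dict_is_cacheable = True
except TypeError:
    # Dictionaries are unhashable, so they cannot be used as cache keys.
    dict_is_cacheable = False
```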
Working with degenerate data
The merge routine merges multiple individual releases into either a compiled release or a versioned release. Across the individual releases, it merges objects in arrays based on their id values, as described in the OCDS documentation. This allows a publisher to, for example, disclose an upcoming milestone in one release, and set the date on which it was met in another release.
However, if objects that correspond to different things re-use id values, then only the last object is retained in the merged release, by default. (To be clear, such data has structural errors.) For example, if a publisher creates a release for each award notice in a procurement procedure, and restarts the numbering of award objects in each release from ‘1’, then the later releases will overwrite the award objects of the earlier releases.
Similarly, if, in a single release, objects in the same array share an id value, then only the last object is retained.
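The behavior described above can be pictured with a simplified sketch of id-based array merging. This is not the package's actual algorithm: the merge_array_by_id helper is hypothetical, and it only updates top-level fields of each object.

```python
def merge_array_by_id(earlier, later):
    """Merge two arrays of objects, matching objects on their id values."""
    merged = {}
    for obj in list(earlier) + list(later):
        # Objects with the same id are folded together; later fields win.
        merged.setdefault(obj["id"], {}).update(obj)
    return list(merged.values())


# Intended use: one milestone, disclosed first and updated later.
milestones = merge_array_by_id(
    [{"id": "1", "title": "Delivery", "dueDate": "2014-03-01"}],
    [{"id": "1", "dateMet": "2014-02-27"}],
)

# Degenerate data: two different awards that both use the id '1'.
awards = merge_array_by_id(
    [{"id": "1", "title": "Lot A"}],
    [{"id": "1", "title": "Lot B"}],
)
```

In the first case, the single milestone carries both dates. In the second, the earlier award's title is overwritten, and only one award remains.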
If, in a single release, objects in the same array share an id value, the merge and merge_versioned functions issue a DuplicateIdValueWarning. You can turn the warning into an exception or ignore the warning using a warning filter. For example:
import warnings

import ocdsmerge
from ocdsmerge.merge import DuplicateIdValueWarning

# Raise an error, instead.
with warnings.catch_warnings():
    warnings.filterwarnings('error', category=DuplicateIdValueWarning)
    ocdsmerge.merge(releases)

# Ignore the warning, instead.
with warnings.catch_warnings():
    warnings.filterwarnings('ignore', category=DuplicateIdValueWarning)
    ocdsmerge.merge(releases)
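A third option that the standard warnings module offers is to record warnings, so that they can be logged or reported afterwards. The sketch below shows the mechanism with a stand-in warning class and a stand-in function, so that it runs without this package. StandInDuplicateWarning and stand_in_merge are hypothetical names; in practice, the real warning category and merge call would take their place.

```python
import warnings


class StandInDuplicateWarning(UserWarning):
    """Hypothetical stand-in for a library's warning category."""


def stand_in_merge(array):
    """Hypothetical stand-in that warns when an id value repeats."""
    seen = set()
    for obj in array:
        if obj["id"] in seen:
            warnings.warn(f"duplicate id {obj['id']!r}", StandInDuplicateWarning)
        seen.add(obj["id"])
    return array


# Collect the warnings, instead of printing or raising them.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", category=StandInDuplicateWarning)
    stand_in_merge([{"id": "1"}, {"id": "1"}, {"id": "2"}])

messages = [str(warning.message) for warning in caught]
```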
If you know in advance that the individual releases have structural errors as described above, you can change the behavior of the merge routine by setting the rule_overrides argument on a per-field basis:
ocdsmerge.MERGE_BY_POSITION: merge objects in the given array based on their array index, instead of their id value. This is appropriate if the publisher always re-publishes all prior objects in a given array, and puts them in a consistent order.
ocdsmerge.APPEND: retain all objects in the given array, instead of merging any. This is appropriate if the publisher never updates or re-publishes a prior object in a given array.
The field paths are specified as tuples. For example:
ocdsmerge.merge(releases, rule_overrides={
    ('awards',): ocdsmerge.APPEND,
    ('contracts', 'implementation', 'milestones'): ocdsmerge.MERGE_BY_POSITION,
})
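The difference between the two overrides can be pictured with the simplified sketch below. The merge_by_position and append functions are hypothetical illustrations of the two strategies applied to a pair of arrays, not the package's implementation, and they only handle top-level fields.

```python
def merge_by_position(earlier, later):
    """Merge objects that sit at the same array index; later fields win."""
    merged = [dict(obj) for obj in earlier]
    for index, obj in enumerate(later):
        if index < len(merged):
            merged[index].update(obj)
        else:
            merged.append(dict(obj))
    return merged


def append(earlier, later):
    """Retain every object from both arrays, without merging any."""
    return [dict(obj) for obj in earlier] + [dict(obj) for obj in later]


# Award numbering restarts from '1' in each release.
first = [{"id": "1", "title": "Lot A"}]
second = [{"id": "1", "title": "Lot B"}]

appended = append(first, second)
by_position = merge_by_position(first, second + [{"id": "2", "title": "Lot C"}])
```

Here, appending keeps both awards despite the shared id, whereas merging by position treats the first object of each array as the same award.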
Reference implementation
This package serves as a reference implementation of OCDS merging. You can read its commented code in merge.py.
Test cases
We provide test cases for other implementations of OCDS merging under the tests/fixtures directory. The 1.0 and 1.1 directories contain files like simple.json, which contain a list of OCDS releases as JSON; the suffixed simple-compiled.json and simple-versioned.json files contain the expected compiled release and versioned release respectively. To test your implementation, provide as input a file like simple.json as well as the appropriate version of the OCDS release schema, and compare your output to files like simple-compiled.json and simple-versioned.json.
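A small harness along these lines could automate that comparison. This is a minimal sketch, assuming the file naming described above. The check_fixtures function is hypothetical, and merge_function stands for your own implementation, taking a list of releases and returning the merged result.

```python
import json
from pathlib import Path


def check_fixtures(directory, merge_function, suffix="-compiled"):
    """Return the names of the input files whose merged output is unexpected."""
    failures = []
    for path in sorted(Path(directory).glob("*.json")):
        # Skip the expected-output files themselves.
        if path.stem.endswith(("-compiled", "-versioned")):
            continue
        expected_path = path.with_name(f"{path.stem}{suffix}.json")
        if not expected_path.exists():
            continue
        releases = json.loads(path.read_text(encoding="utf-8"))
        expected = json.loads(expected_path.read_text(encoding="utf-8"))
        if merge_function(releases) != expected:
            failures.append(path.name)
    return failures
```

Passing suffix="-versioned" would compare against the versioned release files instead.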
To prepare your implementation for future versions and third-party extensions, you can test your implementation using the files under the schema directory and using the schema in the schema.json file.
In future, we can consider providing a more formal test suite, like those for CSV on the Web. Please contact data@open-contracting.org if interested.
Copyright (c) 2015 Open Contracting Partnership, released under the BSD license