GA4GH Data Object Service Schemas

Schemas for the Data Object Service (DOS) API

View the schemas in Swagger UI

The Global Alliance for Genomics and Health is an international coalition formed to enable the sharing of genomic and clinical data. Collaboration within this consortium takes place primarily via GitHub and public meetings. Join the issues today to help us make a cloud-agnostic Data Object Service!

Cloud Workstream

The Data Working Group concentrates on data representation, storage, and analysis, including working with platform development partners and industry leaders to develop standards that will facilitate interoperability. The Cloud Workstream is an informal, multi-vendor working group focused on standards for exchanging Docker-based tools and CWL/WDL workflows, execution of Docker-based tools and workflows on clouds, and abstract access to cloud object stores.

What is DOS?

Currently, this is the home of the Data Object Service (DOS) API proposal. This repo has a CWL-based build process ready to go and a place for us to collectively work on USECASES.md.

This proposal for a DOS release is based on the schema work of Brian W. and others from OHSU along with work by UCSC. It also is informed by existing object storage systems such as:

The goal of DOS is to create a generic API on top of these and other projects, so workflow systems can access data in the same way regardless of project. One section of the API focuses on how to read and write data objects to cloud environments and how to join them together as data bundles (Data object management). Another focuses on the ability to find data objects across cloud environments and implementations of DOS (Data object queries). The latter is likely to be worked on in conjunction with the GA4GH Discovery Workstream.

Key features of the current API proposal:

Data object management

This section of the API focuses on how to read and write data objects to cloud environments and how to join them together as data bundles. A data bundle is simply a flat collection of one or more files. This section of the API lets users:

  • create/update/delete a file

  • create/update/delete a data bundle

  • register UUIDs with these entities (and optionally track versions of each)

  • generate signed URLs and/or cloud specific object storage paths and temporary credentials
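
The operations listed above can be sketched with a small in-memory model. This is only an illustration of the ideas (UUID registration, version tracking, flat bundles), not the DOS schema or the ga4gh.dos.server implementation; the class names, field names, and the InMemoryRegistry helper are all assumptions made for this example.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DataObject:
    """One file plus the places it can be fetched from (illustrative fields)."""
    id: str
    name: str
    urls: List[str]
    version: int = 1


@dataclass
class DataBundle:
    """A flat collection of one or more data objects, referenced by id."""
    id: str
    data_object_ids: List[str]
    version: int = 1


class InMemoryRegistry:
    """Hypothetical stand-in for the storage layer behind a DOS server."""

    def __init__(self) -> None:
        self.objects: Dict[str, DataObject] = {}
        self.bundles: Dict[str, DataBundle] = {}

    def create_object(self, name: str, urls: List[str]) -> DataObject:
        # Register a fresh UUID for the new data object.
        obj = DataObject(id=str(uuid.uuid4()), name=name, urls=list(urls))
        self.objects[obj.id] = obj
        return obj

    def update_object(self, object_id: str, urls: List[str]) -> DataObject:
        obj = self.objects[object_id]
        obj.urls = list(urls)
        obj.version += 1  # same UUID, new version
        return obj

    def delete_object(self, object_id: str) -> None:
        del self.objects[object_id]

    def create_bundle(self, object_ids: List[str]) -> DataBundle:
        if not object_ids:
            raise ValueError("a data bundle needs at least one data object")
        missing = [i for i in object_ids if i not in self.objects]
        if missing:
            raise KeyError("unknown data object ids: {}".format(missing))
        bundle = DataBundle(id=str(uuid.uuid4()), data_object_ids=list(object_ids))
        self.bundles[bundle.id] = bundle
        return bundle
```

A bundle here stores only the ids of its members, so the same data object can appear in several bundles without being copied.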

Data object queries

A key feature of this API, beyond creating, modifying, and deleting files, is the ability to find data objects across cloud environments and implementations of DOS. This section of the API allows users to query by data bundle or file UUID; the response describes where those data objects are available. It will typically be used to find copies of the same file or data bundle across multiple cloud environments.
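
As a rough sketch of what such a query could return, the following groups the known URLs for one UUID by storage scheme. The LOCATIONS index, the UUID, the bucket names, and the find_locations function are hypothetical and exist only for this example; a real DOS implementation would answer the query over HTTP.

```python
from typing import Dict, List
from urllib.parse import urlparse

# Hypothetical index: data object UUID -> every URL where a copy is known to live.
LOCATIONS: Dict[str, List[str]] = {
    "0f8fad5b-d9cb-469f-a165-70867728950e": [
        "s3://example-bucket/sample.bam",
        "gs://example-bucket/sample.bam",
    ],
}


def find_locations(index: Dict[str, List[str]], object_id: str) -> Dict[str, List[str]]:
    """Group the known URLs for one data object by URL scheme (s3, gs, ...)."""
    by_scheme: Dict[str, List[str]] = {}
    for url in index.get(object_id, []):
        by_scheme.setdefault(urlparse(url).scheme, []).append(url)
    return by_scheme
```

An unknown UUID yields an empty mapping, which lets a caller distinguish "no copies known" from an error.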

Implementations

There are currently a few experimental implementations that use some version of these schemas.

  • DOS Connect observes cloud and local storage systems and broadcasts their changes to a service that presents DOS endpoints.

  • DOS Downloader is a mechanism for downloading Data Objects from DOS URLs.

  • dos-gdc-lambda presents data from the GDC public REST API using the Data Object Service.

  • dos-signpost-lambda presents data from a signpost instance using the Data Object Service.

Building the client and server

You can use pip to install a Python client and server that implement these schemas.

virtualenv env
source env/bin/activate
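# GitHub no longer serves the unauthenticated git:// protocol, so install over HTTPS.
# The --process-dependency-links flag was removed in pip 19.0; it requires an older pip.
pip install git+https://github.com/ga4gh/data-object-service-schemas@master --process-dependency-links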

This adds the Python modules ga4gh.dos.server and ga4gh.dos.client, which you can use in your projects.

There is also a CLI hook.

ga4gh_dos_server
# In another terminal
ga4gh_dos_demo
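
Once the server is running, it can also be queried over plain HTTP. The sketch below uses only the standard library and does not use ga4gh.dos.client. The base URL (host, port, and the /ga4gh/dos/v1 prefix) and the /dataobjects/{id} path are assumptions; check the Swagger description for the routes your server version actually exposes.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumed base URL of a locally running server; adjust to your deployment.
BASE_URL = "http://localhost:8080/ga4gh/dos/v1"


def data_object_url(base_url, data_object_id):
    """Build the (assumed) URL for fetching one data object by id."""
    return "{}/dataobjects/{}".format(base_url.rstrip("/"), quote(data_object_id, safe=""))


def get_data_object(base_url, data_object_id):
    """Fetch one data object and decode the JSON response body."""
    with urlopen(data_object_url(base_url, data_object_id)) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling get_data_object(BASE_URL, some_id) needs a running server; data_object_url can be checked on its own.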

Building Documents

The schemas are editable as OpenAPI 2 YAML files. To generate OpenAPI 3 descriptions, install swagger2openapi and run the following:

swagger2openapi -y openapi/data_object_service.swagger.yaml > openapi/data_object_service.openapi.yaml

How to contribute changes

Take cues for now from the ga4gh/schemas document.

License

See the LICENSE
