L U N G - S A R G

The Open Data Platform for Sustainable, Accessible Lung Radiogenomics

Lung-SARG is a fully open-source, local-first platform that improves how communities collaborate on open data to diagnose lung cancer and perform epidemiology on local populations in low- and middle-income countries.

[!TIP] Datasets generated by this project are ready to explore and consume on Hugging Face.

Check them out!

💡 Principles

  • Open: Code, standards, infrastructure, and data are public and open source.
  • Modular and Interoperable: Each component can be replaced, extended, or removed. Works well in many environments (your laptop, in a cluster, or from the browser), can be deployed to many places (S3 + GH Pages, IPFS, ...) and integrates with multiple tools (thanks to the Arrow and Zarr ecosystems). Use open tools, standards, infrastructure, and share data in accessible formats.
  • Data as Code: Declarative stateless transformations tracked in git. Improves data access and empowers data scientists to conduct research and helps to guide community-driven analysis and decisions. Version your data as code! Publish and share your reusable models for others to build on top. Datasets should be both reproducible and accessible!
  • Glue: Be a bridge between tools and approaches. E.g., use good software engineering practices like types, tests, materialized views, and more.
  • FAIR: Findable, Accessible, Interoperable, and Reusable.
  • KISS: Minimal and flexible. Rely on tools that do one thing and do it well.
  • No vendor lock-in
  • Distributed: Permissionless ecosystem and collaboration. Open source code and make it ready to be improved.
  • Community: One that incentivizes contributors.
  • Immutability: Embrace idempotency. Rely on content-addressable storage and append-only logs.
  • Stateless and serverless: as much as possible. E.g. use GitHub Pages, host datasets on S3, interface with HTML, JavaScript, and WASM. No servers to maintain, no databases to manage, no infrastructure to worry about. Keep infrastructure management lean.
  • Offline-first: Rely on static files and offline-first tools.
  • Above all, have fun and enjoy the process 🎉
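The Immutability principle above (idempotency, content-addressable storage, append-only logs) can be illustrated with a minimal sketch that uses only the Python standard library. This is not how Lung-SARG itself stores data; the function names, the `blobs` directory, and the `log.jsonl` file are hypothetical and chosen only for illustration.

```python
import hashlib
import json
from pathlib import Path


def put_blob(store: Path, payload: bytes) -> str:
    """Write payload under its SHA-256 digest and return the digest."""
    digest = hashlib.sha256(payload).hexdigest()
    target = store / digest
    if not target.exists():  # idempotent: same bytes map to the same path
        store.mkdir(parents=True, exist_ok=True)
        target.write_bytes(payload)
    return digest


def append_event(log: Path, event: dict) -> None:
    """Append one JSON line to the log; earlier lines are never modified."""
    log.parent.mkdir(parents=True, exist_ok=True)
    with log.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(event, sort_keys=True) + "\n")


def publish(root: Path, name: str, payload: bytes) -> str:
    """Store payload by content hash and record the publication in the log."""
    # "blobs" and "log.jsonl" are hypothetical names, not part of Lung-SARG.
    digest = put_blob(root / "blobs", payload)
    append_event(root / "log.jsonl", {"name": name, "sha256": digest})
    return digest
```

Publishing the same bytes twice yields the same digest and a single stored file, while the log keeps one line per publication, so history is preserved without rewriting anything.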

Overview

Lung SARG dataflow.

⚙️ Setup and execution

🐍 Pixi

You can install all the dependencies inside a reproducible software environment via pixi. To do that, install pixi, clone the repository, and run the following command from the root folder.

pixi install -a

To see all tasks available:

pixi task list

To start the Dagster UI locally and open it in your browser:

pixi run dev

🧬 Run on sample data

In the Dagster UI, click

Overview -> Jobs -> stage_idc_nsclc_radiogenomic_samples -> Materialize all

Materialize staging of samples

Observe what happens in the Overview, Runs, and Assets pages of the Dagster UI, and the content in the lung-sarg/data directory.
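To inspect what a run wrote to disk, a small script can summarize the files under the data directory. This is a minimal sketch: the `lung-sarg/data` path comes from the text above, but the directory layout and file types are not documented here, so the helper (a hypothetical name, `summarize_outputs`) simply counts files by suffix.

```python
from pathlib import Path


def summarize_outputs(data_dir: Path) -> dict[str, int]:
    """Count the files under data_dir (recursively), grouped by suffix."""
    counts: dict[str, int] = {}
    if not data_dir.is_dir():
        return counts  # nothing materialized yet, or wrong working directory
    for path in sorted(data_dir.rglob("*")):
        if path.is_file():
            key = path.suffix or "(no suffix)"
            counts[key] = counts.get(key, 0) + 1
    return counts


if __name__ == "__main__":
    # Path taken from the README; adjust it if you run from another folder.
    for suffix, n in sorted(summarize_outputs(Path("lung-sarg/data")).items()):
        print(f"{suffix}: {n} file(s)")
```

Running it before and after "Materialize all" gives a quick view of which outputs the job produced.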

🎯 Motivation

This project started after thinking about what an Open Data Protocol could look like!

👏 Acknowledgements

  • This project was built on the principles espoused by David Gasquez at Datonic. It is built on the approach in the Datadex Open Data Platform and extended for scientific imaging data with OME-Zarr and the DICOM-based image data model in the NIH Imaging Data Commons.
  • Lung-SARG is possible thanks to amazing open source projects like DuckDB, dbt, Dagster, ITK and many others...
  • This project was built with support from Dr. James Gee in collaboration with the UPenn PICSL Lab.
