LaminDB: Manage R&D data & analyses.

LaminDB: Data lakes for biology

LaminDB is an API layer for your existing infrastructure to manage your existing data & analyses.

Public beta: Currently only recommended for collaborators as we still make breaking changes.

Update 2023-06-05: We completed a major migration from SQLAlchemy/SQLModel to Django, available in pre-releases of v0.42.

Features

Free:

  • Track data lineage across notebooks, pipelines & apps.
  • Manage biological registries, ontologies & features.
  • Persist, load & stream data objects with a single line of code.
  • Query for anything & everything.
  • Define & manage your own schemas (assays, instruments, etc.).
  • Manage data on your laptop, on your server or in your cloud infra.
  • Use a mesh of distributed LaminDB instances for different teams and purposes.
  • Share instances through a Hub akin to GitHub.

Enterprise plan:

  • Explore & share data, submit samples & track lineage with LaminApp (deployable in your infra).
  • Receive support & services for a BioTech data & analytics platform.

How does it work?

LaminDB builds semantics of R&D and biology onto well-established tools:

  • SQLite & Postgres for SQL databases
  • S3, GCP & local storage for object storage
  • Django ORM (previously SQLAlchemy/SQLModel)
  • Configurable storage formats: pyarrow, anndata, zarr, etc.
  • Biological knowledge resources & ontologies: see Bionty
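The split between a SQL database for metadata and object storage for the data itself can be sketched with the standard library alone. This is a minimal illustration of the pattern, not LaminDB's implementation: the `file` table, its columns, and the helpers `init_registry` and `register_file` are hypothetical, and a local directory stands in for S3 or GCP.

```python
import hashlib
import shutil
import sqlite3
import tempfile
from pathlib import Path


def init_registry(db_path: Path) -> sqlite3.Connection:
    """Create a minimal, hypothetical file registry in SQLite."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS file ("
        "id INTEGER PRIMARY KEY, name TEXT, storage_key TEXT, sha256 TEXT)"
    )
    return conn


def register_file(conn: sqlite3.Connection, storage_root: Path, source: Path) -> int:
    """Copy `source` into the storage directory and record its metadata in SQL."""
    digest = hashlib.sha256(source.read_bytes()).hexdigest()
    storage_key = f"{digest[:12]}{source.suffix}"  # content-derived key (assumption)
    shutil.copy(source, storage_root / storage_key)
    cur = conn.execute(
        "INSERT INTO file (name, storage_key, sha256) VALUES (?, ?, ?)",
        (source.name, storage_key, digest),
    )
    conn.commit()
    return cur.lastrowid


def demo() -> tuple:
    """Register one file, then find it again through the SQL registry."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        storage = root / "storage"
        storage.mkdir()
        source = root / "my_artifact.txt"
        source.write_text("hello")
        conn = init_registry(root / "registry.db")
        file_id = register_file(conn, storage, source)
        name, key = conn.execute(
            "SELECT name, storage_key FROM file WHERE id = ?", (file_id,)
        ).fetchone()
        content = (storage / key).read_text()
        conn.close()
        return name, content
```

The database row holds only the pointer and checksum. The bytes stay in storage, which is why the same registry can sit in front of a laptop directory or a cloud bucket.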

Most of LaminDB is open source.

Installation

pip install lamindb  # basic data lake
pip install 'lamindb[bionty]'  # biological entities
pip install 'lamindb[nbproject]'  # Jupyter notebook tracking
pip install 'lamindb[aws]'  # AWS dependencies (s3fs, etc.)
pip install 'lamindb[gcp]'  # GCP dependencies (gcsfs, etc.)

Quick setup

Why do I have to sign up?

  • Data lineage requires a user identity (who modified which data when?).
  • Collaboration requires a user identity (who shares this with me?).

Signing up takes 1 min.

We do not store any of your data, only basic metadata about you (email address, etc.) & your instances (S3 bucket names, etc.).

  • Sign up via lamin signup <email>.
  • Log in via lamin login <handle>.
  • Init an instance via lamin init --storage <storage>.

Usage overview

Track & query data lineage

import lamindb as ln

ln.track()  # auto-detect a notebook & register as a Transform
ln.File("my_artifact.parquet").save()  # link Transform & Run objects to File object

Now, you can query, e.g., for

ln.File.select(created_by__handle="user1").df()   # a DataFrame of all files ingested by user1
ln.File.select().order_by("-updated_at").first()   # latest updated file

Or for

transforms = ln.Transform.select(  # all notebooks with 'T cell' in the title created in 2022
    name__contains="T cell", type="notebook", created_at__year=2022
).all()
ln.File.select(transform=transforms[1]).all()  # files ingested by the second notebook in transforms
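The keyword arguments above follow the Django ORM convention of `field__lookup` (for example `name__contains`) and `relation__field` (for example `created_by__handle`). As a rough sketch of how such expressions are interpreted, the pure-Python `select` function below filters plain dictionaries. It is a hypothetical stand-in for illustration only and supports just three lookups; the real queries are translated to SQL by the ORM.

```python
from datetime import datetime

# Supported lookups in this sketch; Django itself offers many more.
LOOKUPS = {
    "exact": lambda value, target: value == target,
    "contains": lambda value, target: target in value,
    "year": lambda value, target: value.year == target,
}


def _resolve(record, path):
    """Follow a list of keys through nested dictionaries."""
    value = record
    for key in path:
        value = value[key]
    return value


def select(records, **filters):
    """Return the records matching every `field__lookup=value` filter."""
    results = []
    for record in records:
        matches = True
        for expression, target in filters.items():
            parts = expression.split("__")
            # A trailing known lookup name is an operator, anything else is a field.
            if len(parts) > 1 and parts[-1] in LOOKUPS:
                lookup = parts.pop()
            else:
                lookup = "exact"
            if not LOOKUPS[lookup](_resolve(record, parts), target):
                matches = False
                break
        if matches:
            results.append(record)
    return results


# Made-up example records, mirroring the fields used in the queries above.
transforms = [
    {"name": "T cell atlas", "type": "notebook",
     "created_at": datetime(2022, 3, 1), "created_by": {"handle": "user1"}},
    {"name": "T cell QC", "type": "notebook",
     "created_at": datetime(2023, 1, 5), "created_by": {"handle": "user2"}},
    {"name": "Cell Ranger", "type": "pipeline",
     "created_at": datetime(2022, 6, 9), "created_by": {"handle": "user1"}},
]

notebooks_2022 = select(
    transforms, name__contains="T cell", type="notebook", created_at__year=2022
)
by_user1 = select(transforms, created_by__handle="user1")
```

Here `notebooks_2022` contains only the "T cell atlas" record, and `by_user1` contains the two records created by `user1`.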

Or, if you'd like to track a run of a registered pipeline (here, "Cell Ranger"):

transform = ln.Transform.select(name="Cell Ranger", version="0.7.1").one()  # select a pipeline from the registry
ln.track(transform)  # create a new global run context
ln.File("s3://my_samples01/my_artifact.fastq.gz").save()  # link file against run & transform

Now, you can query, e.g., for

runs = ln.select(ln.Run, transform__name="Cell Ranger").order_by("-created_at").df()  # Cell Ranger pipeline runs, latest first
# query files by selected runs, etc.
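The idea of a global run context, where `track` sets the current run and every file saved afterwards is linked to it, can be sketched as follows. All classes and functions here (`Transform`, `Run`, `File`, `Context`, `track`, `save`) are hypothetical in-memory stand-ins that only mirror the names used above; they are not LaminDB's implementation.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Transform:
    name: str
    version: str


@dataclass
class Run:
    transform: Transform
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


@dataclass
class File:
    path: str
    run: Optional[Run] = None


class Context:
    """Module-level state holding the currently tracked run."""

    run: Optional[Run] = None


def track(transform: Transform) -> Run:
    """Start a new run of `transform` and make it the global context."""
    Context.run = Run(transform)
    return Context.run


def save(file: File) -> File:
    """Link `file` to the current run, and through it to the transform."""
    if Context.run is None:
        raise RuntimeError("call track() before saving files")
    file.run = Context.run
    return file


run = track(Transform("Cell Ranger", "0.7.1"))
saved = save(File("s3://my_samples01/my_artifact.fastq.gz"))
```

Because the link is made at save time from shared state, the calling code never passes the run explicitly. The trade-off is that only one run can be tracked at a time per process in this sketch.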

Persist & load data objects

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

ln.File(df, name="My dataframe").save()

Get it back:

file = ln.select(ln.File, name="My dataframe").one()  # query for it
df = file.load()  # load it into memory
    a   b
0   1   3
1   2   4

Manage biological registries

lamin init --storage ./myobjects --schema bionty

...

Track biological features

...

Track biological samples

...

Manage custom schemas

  1. Create a GitHub repository with Django ORMs similar to github.com/laminlabs/lnschema-lamin1
  2. Create & deploy migrations via lamin migrate create and lamin migrate deploy

It's fastest if we do this for you based on our templates within an enterprise plan, but you can fully manage the process yourself.

Notebooks

  • Find all guide notebooks here.
  • You can run these notebooks on Google Colab or in hosted versions of JupyterLab, e.g., Saturn Cloud, Google Vertex AI, and others.
  • JupyterLab & Jupyter Notebook offer a fully interactive experience; VS Code & other editors require using the CLI (lamin track my-notebook.ipynb).

Architecture

LaminDB consists of the lamindb Python package, which builds on a number of open-source packages developed by Lamin:

  • bionty: Biological entities (usable standalone)
  • lamindb-setup: Setup & configure LaminDB, client for Lamin Hub.
  • lnschema-core: Core schema, containing the core ORMs.
  • lnschema-bionty: Bionty schema, containing ORMs that are coupled to Bionty's biological entities.
  • lnschema-lamin1: Exemplary configured schema to track samples, treatments, etc.
  • nbproject: Parse metadata from Jupyter notebooks.

LaminHub & LaminApp are not open source, nor are the templates for modeling lab operations.

Documentation

Read the docs.
