LaminDB: Manage R&D data & analyses.

Curate, store, track, query, integrate, and learn from biological data.

LaminDB is an open-source data lake for R&D in biology. It manages indexed object storage (local directories, S3, GCP) with a mapped SQL database (SQLite, Postgres, and soon, BigQuery).
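To make the pairing of object storage with a mapped SQL database concrete, here is a minimal sketch that uses only the Python standard library. It is not LaminDB's implementation: the table layout and the helper names `store_object` and `find_object` are assumptions made for illustration, with a temporary local directory standing in for object storage and an in-memory SQLite database acting as the index.

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical layout: one SQL table that indexes files kept under a storage root.
storage_root = Path(tempfile.mkdtemp())
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE dobject (id TEXT PRIMARY KEY, name TEXT, path TEXT, size INTEGER)"
)


def store_object(name: str, content: bytes) -> str:
    """Write the bytes to storage and record their metadata in SQL."""
    object_id = hashlib.sha256(content).hexdigest()[:12]
    path = storage_root / f"{object_id}.bin"
    path.write_bytes(content)
    db.execute(
        "INSERT INTO dobject (id, name, path, size) VALUES (?, ?, ?, ?)",
        (object_id, name, str(path), len(content)),
    )
    db.commit()
    return object_id


def find_object(name: str) -> bytes:
    """Look up the storage path by name in SQL, then read the file."""
    row = db.execute("SELECT path FROM dobject WHERE name = ?", (name,)).fetchone()
    if row is None:
        raise KeyError(name)
    return Path(row[0]).read_bytes()


store_object("My dataframe", b"a,b\n1,3\n2,4\n")
print(find_object("My dataframe").decode())
```

The point of the split is that queries run against the small SQL index, while the potentially large payloads stay in storage and are read only when needed.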

You can create distributed LaminDB instances at any scale: get started on your laptop, deploy in the cloud, or work with a mesh of instances for different teams and purposes.

Public beta: Currently recommended only for collaborators, as we still make breaking changes.

Installation

LaminDB is a Python package available for Python versions 3.8+.

pip install lamindb
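If an installation fails or an import does not work, a quick check of the environment can help. The sketch below uses only the standard library; the helper names `python_is_supported` and `lamindb_is_installed` are hypothetical and not part of LaminDB.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 8)  # minimum version stated above


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True if the interpreter meets the stated minimum version."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def lamindb_is_installed() -> bool:
    """Return True if the lamindb package can be found, without importing it."""
    return importlib.util.find_spec("lamindb") is not None


print("Python supported:", python_is_supported())
print("lamindb installed:", lamindb_is_installed())
```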

Import

In your Python script, import LaminDB as:

import lamindb as ln

Quick setup

Quick setup on the command line:

  • Sign up via lamin signup <email>
  • Log in via lamin login <handle>
  • Set up an instance via lamin init --storage <storage> --schema <schema_modules>

:::{dropdown} Example code

lamin signup testuser1@lamin.ai
lamin login testuser1
lamin init --storage ./mydata --schema bionty,wetlab

:::
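When instance setup is scripted, for example in a provisioning job, the `lamin init` call shown above can be assembled programmatically. This is a minimal sketch: `build_init_command` is a hypothetical helper, not part of the `lamin` CLI, and it uses only the `--storage` and `--schema` flags that appear in the steps above.

```python
import shlex
from typing import List, Sequence


def build_init_command(storage: str, schema_modules: Sequence[str]) -> List[str]:
    """Assemble the argument list for `lamin init` as described above."""
    if not storage:
        raise ValueError("storage must not be empty")
    if not schema_modules:
        raise ValueError("at least one schema module is expected in this sketch")
    # Schema modules are passed as a single comma-separated value
    return ["lamin", "init", "--storage", storage, "--schema", ",".join(schema_modules)]


command = build_init_command("./mydata", ["bionty", "wetlab"])
print(shlex.join(command))  # lamin init --storage ./mydata --schema bionty,wetlab
```

The returned list can be passed to `subprocess.run(command, check=True)`, which avoids shell quoting issues when the storage path contains spaces.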

See {doc}`/guide/setup` for more.

Track & query data

Track data source & data

::::{tab-set}
:::{tab-item} Within a notebook

import pandas as pd

ln.nb.header()  # data source is created and linked

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# create a data object with SQL metadata record
dobject = ln.DObject(df, name="My dataframe")

# upload the data file to the configured storage
# and commit a DObject record to the SQL database
ln.add(dobject)

:::
:::{tab-item} Within a pipeline

import pandas as pd
import lamindb.schema as lns

# create a pipeline record
pipeline = lns.Pipeline(name="my pipeline", version="1")

# create a run from the above pipeline as the data source
run = lns.Run(pipeline=pipeline, name="my run")

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# create a data object with SQL metadata record
dobject = ln.DObject(df, name="My dataframe", source=run)

# upload the data file to the configured storage
# and commit a DObject record to the SQL database
ln.add(dobject)

:::
::::
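Both variants above link each data object to the source that produced it. The following sketch illustrates that chain with plain dataclasses. The classes `PipelineRecord`, `RunRecord`, and `DataRecord` are hypothetical stand-ins, not LaminDB's schema; they only mirror the `pipeline`, `run`, and `source` relationships used in the pipeline example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineRecord:
    name: str
    version: str


@dataclass(frozen=True)
class RunRecord:
    pipeline: PipelineRecord
    name: str


@dataclass(frozen=True)
class DataRecord:
    name: str
    source: RunRecord


def describe_provenance(record: DataRecord) -> str:
    """Walk from a data record back to the pipeline that produced it."""
    run = record.source
    return f"{record.name} <- {run.name} <- {run.pipeline.name} (v{run.pipeline.version})"


pipeline = PipelineRecord(name="my pipeline", version="1")
run = RunRecord(pipeline=pipeline, name="my run")
record = DataRecord(name="My dataframe", source=run)
print(describe_provenance(record))  # My dataframe <- my run <- my pipeline (v1)
```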

Query & load data

dobject = ln.select(ln.DObject, name="My dataframe").one()
df = dobject.load()

See {doc}`/guide/ingest` for more.
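The query above ends in `.one()`, which reads as "exactly one match is expected". Assuming it follows the convention known from SQLAlchemy, where zero or multiple matches raise an error, the behavior can be sketched in plain Python. The function `one` and the error messages below are illustrative, not LaminDB's code.

```python
from typing import Iterable, TypeVar

T = TypeVar("T")


def one(results: Iterable[T]) -> T:
    """Return the single item in results; raise if there are zero or several."""
    items = list(results)
    if not items:
        raise LookupError("no record matched the query")
    if len(items) > 1:
        raise LookupError(f"expected one record, found {len(items)}")
    return items[0]


records = [{"name": "My dataframe"}, {"name": "Other"}]
match = one(r for r in records if r["name"] == "My dataframe")
print(match)  # {'name': 'My dataframe'}
```

Under that assumption, reusing the same `name` for several data objects would make such a query ambiguous, so unique names or additional filter fields are worth considering.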

Track biological features

import bionty as bt

# A sample single-cell RNA-seq dataset
adata = ln.dev.datasets.anndata_mouse_sc_lymph_node()

# Track genes mapped to a Bionty entity
# - Ensembl gene ID as the standardized ID
# - mouse as the species
reference = bt.Gene(id=bt.gene_id.ensembl_gene_id, species=bt.Species().lookup.mouse)

# Create a data object with features
dobject = ln.DObject(adata, name="Mouse Lymph Node scRNA-seq", features_ref=reference)

# upload the data file to the configured storage
# and commit a DObject record to the SQL database
ln.add(dobject)

See {doc}`/guide/link-features` for more.
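Linking features to a reference only works well when the dataset's feature identifiers are standardized. As a rough pre-check, the sketch below separates identifiers that have the format of a mouse Ensembl gene ID (`ENSMUSG` followed by 11 digits) from those that do not. It checks format only, not whether an ID exists in Ensembl, and it is not how Bionty performs the mapping; `split_by_validity` is a hypothetical helper.

```python
import re
from typing import Dict, Iterable, List

# Mouse Ensembl gene IDs: the prefix ENSMUSG followed by 11 digits
MOUSE_ENSEMBL_GENE_ID = re.compile(r"ENSMUSG\d{11}")


def split_by_validity(feature_ids: Iterable[str]) -> Dict[str, List[str]]:
    """Group identifiers by whether they match the mouse Ensembl gene ID format."""
    result: Dict[str, List[str]] = {"valid": [], "invalid": []}
    for feature_id in feature_ids:
        key = "valid" if MOUSE_ENSEMBL_GENE_ID.fullmatch(feature_id) else "invalid"
        result[key].append(feature_id)
    return result


example_ids = ["ENSMUSG00000064341", "ENSMUSG00000064345", "Actb"]
print(split_by_validity(example_ids))
```

Identifiers in the `invalid` group, such as gene symbols, would need to be converted before they can serve as standardized IDs.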

- Each page in this guide is a Jupyter Notebook, which you can download [here](https://github.com/laminlabs/lamindb/tree/main/docs/guide).
- You can run these notebooks in hosted versions of JupyterLab, e.g., [Saturn Cloud](https://github.com/laminlabs/run-lamin-on-saturn), Google Vertex AI, and others.
- We recommend using [JupyterLab](https://jupyterlab.readthedocs.io/) for the best notebook tracking experience.

📬 Reach out to report issues or to learn about the data modules that connect your assays, pipelines, and workflows within our data platform enterprise plan.
