LaminDB: Manage R&D data & analyses.
Curate, store, track, query, integrate, and learn from biological data.
LaminDB provides distributed data management in which users collaborate on LaminDB instances.
Each LaminDB instance is a data lake that manages indexed object storage (local directories, S3, GCP) with a mapped SQL database (SQLite, Postgres, and soon, BigQuery).
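The following minimal sketch illustrates the split between object storage and a mapped SQL index, using only the Python standard library. It is a conceptual illustration under stated assumptions, not LaminDB's implementation: the helpers `add_object` and `select_one`, the `dobject` table, and its columns are hypothetical. Each file is copied into a storage directory under its content hash, and a SQLite row records its name, path, and checksum so that it can be queried later.

```python
import hashlib
import shutil
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical schema: one metadata row per stored object (not LaminDB's schema)
SCHEMA = """
CREATE TABLE IF NOT EXISTS dobject (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL,
    path TEXT NOT NULL,
    sha256 TEXT NOT NULL
)
"""


def add_object(db: sqlite3.Connection, storage: Path, src: Path, name: str) -> int:
    """Copy a file into object storage and index it in the SQL database."""
    digest = hashlib.sha256(src.read_bytes()).hexdigest()
    # Content-addressed storage key: identical files map to the same key
    dest = storage / f"{digest}{src.suffix}"
    shutil.copyfile(src, dest)
    cur = db.execute(
        "INSERT INTO dobject (name, path, sha256) VALUES (?, ?, ?)",
        (name, str(dest), digest),
    )
    db.commit()
    return cur.lastrowid


def select_one(db: sqlite3.Connection, name: str) -> Path:
    """Return the storage path of the single object with this name."""
    rows = db.execute("SELECT path FROM dobject WHERE name = ?", (name,)).fetchall()
    if len(rows) != 1:
        raise LookupError(f"expected exactly one match for {name!r}, got {len(rows)}")
    return Path(rows[0][0])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        storage = Path(tmp) / "storage"
        storage.mkdir()
        db = sqlite3.connect(Path(tmp) / "index.db")
        db.execute(SCHEMA)

        src = Path(tmp) / "table.csv"
        src.write_text("a,b\n1,3\n2,4\n")
        add_object(db, storage, src, name="My dataframe")

        stored = select_one(db, "My dataframe")
        print(stored.read_text() == src.read_text())  # True
        db.close()
```

Keying files by content hash keeps storage keys collision-free, while the SQL table carries the human-readable metadata used for queries.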
Public beta: Currently only recommended for collaborators, as we are still making breaking changes.
Installation
LaminDB is a Python package available for Python 3.8+.
```shell
pip install lamindb
```
Import
In your Python script, import LaminDB as:
```python
import lamindb as ln
```
Quick setup
Quick setup on the command line:
- Sign up via `lamin signup <email>`
- Log in via `lamin login <handle>`
- Set up an instance via `lamin init --storage <storage> --schema <schema_modules>`
:::{dropdown} Example code
```shell
lamin signup testuser1@lamin.ai
lamin login testuser1
lamin init --storage ./mydata --schema bionty,wetlab
```
:::
See {doc}`/guide/setup` for more.
Track & query data
Track data source & data
::::{tab-set}
:::{tab-item} Within a notebook
```python
import lamindb as ln
import pandas as pd

ln.nb.header()  # data source is created and linked

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
# create a data object with SQL metadata record
dobject = ln.DObject(df, name="My dataframe")
# upload the data file to the configured storage
# and commit a DObject record to the SQL database
ln.add(dobject)
```
:::
:::{tab-item} Within a pipeline
```python
import lamindb as ln
import lnschema_core as lns  # assumed import path for the core schema (lns)
import pandas as pd

# create a pipeline record
pipeline = lns.Pipeline(name="my pipeline", version="1")
# create a run from the above pipeline as the data source
run = lns.Run(pipeline=pipeline, name="my run")

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
# create a data object with SQL metadata record
dobject = ln.DObject(df, name="My dataframe", source=run)
# upload the data file to the configured storage
# and commit a DObject record to the SQL database
ln.add(dobject)
```
:::
::::
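Both tabs attach a data source to each data object: the notebook in the first case, a run of a pipeline in the second. The hypothetical sketch below models that lineage with plain dataclasses to show how a data object can be traced back to its origin. The classes `Pipeline`, `Notebook`, `Run`, and `DataObject` and the `lineage` helper are illustrative stand-ins, not the `lns` schema classes.

```python
from dataclasses import dataclass
from typing import List, Union


# Hypothetical stand-ins for source records (not the lns schema)
@dataclass(frozen=True)
class Pipeline:
    name: str
    version: str


@dataclass(frozen=True)
class Notebook:
    title: str


@dataclass(frozen=True)
class Run:
    name: str
    pipeline: Pipeline


@dataclass(frozen=True)
class DataObject:
    name: str
    source: Union[Run, Notebook]


def lineage(dobject: DataObject) -> List[str]:
    """Walk from a data object back to the record that produced it."""
    steps = [f"dobject:{dobject.name}"]
    src = dobject.source
    if isinstance(src, Run):
        # A run points to the pipeline (and version) it executed
        steps.append(f"run:{src.name}")
        steps.append(f"pipeline:{src.pipeline.name}@{src.pipeline.version}")
    else:
        steps.append(f"notebook:{src.title}")
    return steps


pipeline = Pipeline(name="my pipeline", version="1")
run = Run(name="my run", pipeline=pipeline)
dobject = DataObject(name="My dataframe", source=run)
print(lineage(dobject))
# ['dobject:My dataframe', 'run:my run', 'pipeline:my pipeline@1']
```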
Query & load data
```python
import lamindb as ln

dobject = ln.select(ln.DObject, name="My dataframe").one()
df = dobject.load()
```
See {doc}`/guide/ingest` for more.
Track features
```python
import lamindb as ln

# Bionty extends lamindb to track biological entities
import bionty as bt

# An example single cell RNA-seq dataset
adata = ln.dev.datasets.anndata_mouse_sc_lymph_node()
# Instantiate a gene table
# with ensembl id as the standardized id
# with mouse as the species
reference = bt.Gene(id=bt.gene_id.ensembl_gene_id, species=bt.Species().lookup.mouse)
# Create a data object with features
dobject = ln.DObject(adata, name="Mouse Lymph Node scRNA-seq", features_ref=reference)
# upload the data file to the configured storage
# and commit a DObject record to the SQL database
ln.add(dobject)
```
See {doc}`/guide/link-features` for more.
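Conceptually, linking features means matching the dataset's feature identifiers against a reference table of standardized identifiers (above, Ensembl gene IDs for mouse). The sketch below shows only that matching step in plain Python; the function `map_features` and the placeholder IDs are hypothetical, and this is not how Bionty implements it.

```python
from typing import Dict, Iterable, List, Set


def map_features(features: Iterable[str], reference: Set[str]) -> Dict[str, List[str]]:
    """Split dataset feature IDs into those found in / missing from a reference set."""
    mapped: List[str] = []
    unmapped: List[str] = []
    for feature in features:
        # Set membership is an O(1) average-case lookup
        (mapped if feature in reference else unmapped).append(feature)
    return {"mapped": mapped, "unmapped": unmapped}


# Placeholder IDs standing in for standardized gene IDs (hypothetical)
reference_ids = {"gene_1", "gene_2", "gene_3"}
dataset_ids = ["gene_1", "gene_3", "gene_9"]
print(map_features(dataset_ids, reference_ids))
# {'mapped': ['gene_1', 'gene_3'], 'unmapped': ['gene_9']}
```

Unmapped identifiers are kept rather than dropped, so they can be inspected or standardized before the data object is committed.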
- Each page in this guide is a Jupyter Notebook, which you can download [here](https://github.com/laminlabs/lamindb/tree/main/docs/guide).
- You can run these notebooks in hosted versions of JupyterLab, e.g., [Saturn Cloud](https://github.com/laminlabs/run-lamin-on-saturn), Google Vertex AI, and others.
- We recommend using [JupyterLab](https://jupyterlab.readthedocs.io/) for the best notebook-tracking experience.
📬 Reach out to report issues, or to learn about data modules that connect your assays, pipelines & workflows within our data platform enterprise plan.
Hashes for lamindb-0.31rc1-py2.py3-none-any.whl

| Algorithm | Hash digest |
|---|---|
| SHA256 | d13d509766a01d2a08775079a062a408dabd192f6c989b43cc8fdd1d656f6b88 |
| MD5 | a898020d47f754347866d2737118b512 |
| BLAKE2b-256 | c7b44afd6d3aed924f42d448ee27207fcc034d16ecaaeb4b93feea08b770bc16 |