A pipeline for protein embedding generation and visualization

Bio Embeddings

The project includes:

  • A pipeline that embeds the sequences in a FASTA file using one of several embedders (see below), then projects the embeddings and visualizes them in 3D plots.
  • A web server that takes in sequences, embeds them, and either returns the embeddings or visualizes the embedding spaces in interactive online plots.
  • A general-purpose library for embedding protein sequences in any Python application.

Important information

  • The Albert model weights are not publicly available yet. You can request early access by opening an issue.
  • This repository is under active development. Please help us out by opening issues and submitting PRs as you see fit.

Install guides

You can install the package via pip:

pip install bio-embeddings

Or install directly from source (e.g. to get the latest features):

pip install -U git+https://github.com/sacdallago/bio_embeddings.git
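To confirm which version ended up in your environment after either install route, a small check like the following may help. It is only a sketch that uses the standard library's importlib.metadata (Python 3.8+). The helper name installed_version is made up for this example and is not part of bio_embeddings.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "bio-embeddings" is the distribution name used in the pip command above.
    version = installed_version("bio-embeddings")
    print(version if version is not None else "bio-embeddings is not installed")
```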

Additional dependencies and steps to run the web server

Running the web server locally requires some experience with deploying Python backends. You will also need a few additional dependencies:

pip install dash celery pymongo flask-restx pyyaml

Additionally, you need to run two instances of the app (the backend and at least one celery worker). Both instances must have access to a MongoDB instance and to a RabbitMQ or Redis store for celery.
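Before starting the backend and the worker, it can be useful to check that both services are reachable from the machine in question. The following is a minimal sketch using only the standard library. The environment variable names (EXAMPLE_MONGO_URL, EXAMPLE_BROKER_URL) and the helper functions are assumptions made for illustration, and bio_embeddings does not read them. The default ports are the conventional ones for MongoDB, RabbitMQ (AMQP) and Redis.

```python
import os
import socket
from typing import Tuple
from urllib.parse import urlparse

# Conventional default ports for the services mentioned above.
DEFAULT_PORTS = {"mongodb": 27017, "amqp": 5672, "redis": 6379}


def host_and_port(url: str) -> Tuple[str, int]:
    """Split a service URL into (host, port), using the scheme's default port if none is given."""
    parsed = urlparse(url)
    port = parsed.port or DEFAULT_PORTS.get(parsed.scheme)
    if parsed.hostname is None or port is None:
        raise ValueError(f"cannot determine host and port from {url!r}")
    return parsed.hostname, port


def is_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the service can be opened."""
    host, port = host_and_port(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Hypothetical variable names, used only by this sketch.
    services = {
        "MongoDB": os.environ.get("EXAMPLE_MONGO_URL", "mongodb://localhost:27017"),
        "Celery broker": os.environ.get("EXAMPLE_BROKER_URL", "amqp://localhost:5672"),
    }
    for name, url in services.items():
        status = "reachable" if is_reachable(url) else "NOT reachable"
        print(f"{name}: {status} ({url})")
```

A successful TCP connection only shows that something is listening on the port. It does not verify credentials or that the service is the expected one.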

Examples

We highly recommend checking out the examples folder for pipeline examples, and the notebooks folder for post-processing of pipeline runs and general-purpose use of the embedders.

After having installed the package, you can:

  1. Run the pipeline with a configuration file:

    bio_embeddings config.yml
    

    A blueprint of the configuration file and an example setup can be found in the examples directory of this repository.

  2. Use the general-purpose embedder objects from Python, e.g.:

    from bio_embeddings import SeqVecEmbedder
    
    embedder = SeqVecEmbedder()
    
    embedding = embedder.embed("SEQVENCE")
    

    More examples can be found in the notebooks folder of this repository.
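To apply an embedder object to every sequence in a FASTA file, one option is to read the records yourself and call the embedder once per sequence. The sketch below uses only the standard library. read_fasta and embed_all are hypothetical helpers written for this example, not functions provided by bio_embeddings. The embedder is passed in as a plain callable so the code runs without model weights.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from FASTA-formatted lines."""
    identifier: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if identifier is not None:
                yield identifier, "".join(chunks)
            header = line[1:].split()
            # Use the first whitespace-separated token of the header as the identifier.
            identifier = header[0] if header else ""
            chunks = []
        elif identifier is not None:
            # Sequence data may be wrapped over several lines.
            chunks.append(line)
    if identifier is not None:
        yield identifier, "".join(chunks)


def embed_all(
    records: Iterable[Tuple[str, str]], embed: Callable[[str], Any]
) -> Dict[str, Any]:
    """Apply an embedding callable to every sequence, keyed by identifier."""
    return {identifier: embed(sequence) for identifier, sequence in records}


def length_as_embedding(sequence: str) -> List[float]:
    """Stand-in for a real embedder; it only returns the sequence length."""
    return [float(len(sequence))]


if __name__ == "__main__":
    fasta_lines = [">seq1 example protein", "SEQ", "VENCE", ">seq2", "PROTEIN"]
    print(embed_all(read_fasta(fasta_lines), length_as_embedding))
```

With the package installed, the stand-in could presumably be replaced by the embed method of an embedder object such as the SeqVecEmbedder shown above. For a file on disk, pass the open file handle to read_fasta.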

Development status

  1. Pipeline stages

  2. Web server:

    • SeqVec
    • Albert (unpublished)
  3. General purpose objects:

    • SeqVec
    • FastText
    • GloVe
    • Word2Vec
    • UniRep
    • Albert (unpublished)

Building a Distribution

The packages are best built using invoke. If you manage your dependencies with poetry, invoke should already be installed. Simply run poetry run invoke clean build to update your requirements to match your current state and to generate the dist files.

Contributors

  • Christian Dallago (lead)
  • Tobias Olenyi
  • Michael Heinzinger

