Generating dense embeddings for proteins using kernel PCA

This tool generates low-dimensional, continuous, distributed vector representations for non-numeric entities such as text or biological sequences (e.g. DNA or proteins) via kernel PCA with rational kernels.
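As a rough illustration of the idea, the sketch below embeds a few short strings with kernel PCA using only NumPy. It is not RatVec's implementation: the cosine-normalised n-gram kernel is a simple stand-in for the string kernels the tool uses, and the names ngram_counts, ngram_kernel, and kernel_pca, as well as the toy sequences, are hypothetical.

```python
from collections import Counter

import numpy as np


def ngram_counts(seq, n=2):
    """Count the overlapping n-grams of a string."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def ngram_kernel(a, b, n=2):
    """Cosine-normalised n-gram kernel (illustrative stand-in, not RatVec's kernel)."""
    ca, cb = ngram_counts(a, n), ngram_counts(b, n)
    dot = sum(count * cb[gram] for gram, count in ca.items())
    norm = np.sqrt(sum(v * v for v in ca.values()) * sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def kernel_pca(sequences, n_components=2, n=2):
    """Embed strings by eigendecomposing the centered kernel matrix."""
    m = len(sequences)
    K = np.array([[ngram_kernel(a, b, n) for b in sequences] for a in sequences])
    # Center the kernel matrix, which centers the data in feature space.
    one = np.full((m, m), 1.0 / m)
    Kc = K - one @ K - K @ one + one @ K @ one
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(Kc)
    top = np.argsort(eigvals)[::-1][:n_components]
    # Coordinates of the training points: eigenvector scaled by sqrt(eigenvalue).
    return eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))


# Toy input: two pairs of near-identical sequences (made up for this example).
sequences = ["MKTAYIAKQR", "MKTAYIAKQK", "GGSGGSGGSG", "GGSGGSGGSA"]
embeddings = kernel_pca(sequences, n_components=2)
print(embeddings.shape)
```

Each row of embeddings is the vector for one input string. With this toy input, strings that share most of their n-grams end up closer together than strings that share none.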

The current implementation accepts any input dataset that can be read as a list of strings.
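For instance, a plain-text file with one entity per line can be turned into a list of strings as sketched below. The one-entry-per-line layout and the helper name read_sequences are assumptions made for illustration; they are not a file format or function defined by RatVec.

```python
import tempfile
from pathlib import Path


def read_sequences(path):
    """Return the non-empty lines of a text file as a list of strings."""
    with Path(path).open(encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


# Demonstration with a temporary file holding three toy sequences.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "sequences.txt"
    demo_path.write_text("MKTAYIAKQR\n\nGGSGGSGGSG\nACGTACGT\n", encoding="utf-8")
    demo_sequences = read_sequences(demo_path)

print(demo_sequences)
```

Blank lines are skipped and surrounding whitespace is stripped, so every element of the resulting list is a single sequence or text entry.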

Installation

RatVec is distributed under the Apache 2.0 License.

RatVec can be installed on Python 3.6+ from PyPI by running the following in a terminal:

$ pip install ratvec

or from the latest code on GitHub with:

$ pip install git+https://github.com/ratvec/ratvec.git

It can be installed in development mode with:

$ git clone https://github.com/ratvec/ratvec.git
$ cd ratvec
$ pip install -e .

The -e flag installs the package in editable mode: it links the code in the git repository into Python's site-packages, so your changes take effect immediately.

How to Use

Installing RatVec also installs a command-line interface, ratvec. Check it out with:

$ ratvec --help

RatVec has three main commands: generate, train, and evaluate.

  1. Generate. Downloads and prepares the SwissProt data set that is showcased in the RatVec paper.

$ ratvec generate

  2. Train. Computes KPCA embeddings on a given data set. Run the following command to see its arguments:

$ ratvec train --help

  3. Evaluate. Evaluates and optimizes KPCA embeddings. Run the following command to see its arguments:

$ ratvec evaluate --help

Showcase Dataset

The data set used in the paper (the SwissProt dataset [1] used by Boutet et al. [2]) can be downloaded by running the following command:

$ ratvec generate

References
