Hash-based phonemic sequence identifiers

Konstel(lations)

Not yet stable, proceed with caution

An extensible command line tool and library for generating memorable and pronounceable hash-based identifier schemes for sequences, biological or otherwise. For further details and my SARS-CoV-2 naming proposal, please read my blog post. Requires Python 3.6+.

SARS-CoV-2 naming

Phonemic identifiers and truncated cbase32 hashes provide 36 and 40 bits of entropy respectively. Neither scheme produced a collision among the SARS-CoV-2 spike protein sequences publicly deposited as of 2021-04-12.
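To put the 36 and 40 bit figures in context, here is a rough sketch of the arithmetic using the standard birthday approximation. It assumes the 8-character truncated hash uses a 32-symbol alphabet (5 bits per character) and that the phonemic id is 6 syllables of 6 bits each. The syllable breakdown is inferred from the example output, not taken from konstel's source, and the sequence count is a hypothetical placeholder rather than a real dataset size.

```python
import math


def bits_of_entropy(alphabet_size: int, length: int) -> float:
    """Bits carried by `length` symbols drawn from `alphabet_size` options."""
    return length * math.log2(alphabet_size)


def collision_probability(n_items: int, bits: float) -> float:
    """Birthday approximation: chance that any two of n_items random
    identifiers coincide in a space of 2**bits values."""
    space = 2.0 ** bits
    return -math.expm1(-n_items * (n_items - 1) / (2.0 * space))


cbase32_bits = bits_of_entropy(32, 8)   # 8 base32 characters
phonemic_bits = bits_of_entropy(64, 6)  # assumption: 6 syllables, 64 possible each

n_hypothetical = 100_000  # hypothetical number of distinct sequences
for label, bits in [("phonemic", phonemic_bits), ("hash-8", cbase32_bits)]:
    p = collision_probability(n_hypothetical, bits)
    print(f"{label}: {bits:.0f} bits, P(collision) ~ {p:.4f}")
```

The estimate treats identifiers as uniformly random, so it is only a guide to how quickly collision risk grows with the number of distinct sequences.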

Install

Ideally inside a new virtualenv or conda environment:

# Latest release
pip install konstel

# Development version
git clone https://github.com/bede/konstel
pip install --editable konstel

Usage

Command line

$ konstel gen sars-cov-2-s.genome konstel/tests/data/spike.genome.fa --output table
scheme               sars-cov-2-s   
hash                 S:0k8n9hjh5xh5kbef1k6ye7e2d4brhpry5r985avrtf69v6amrbc0
hash-8               S:0k8n9hjh     
id                   S:huhiji-gakihi  

$ echo "ACGT" | konstel gen generic.nucl - --output table
scheme               generic        
hash                 3qzkx17yf1vy0ssvd6xxvkt02973jvhzk51xv28cj6va16pvkbr0
id                   bituzu-gupahu-zolodu-lumaki-suripi-rozitu-guhabi-figogo
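For scripting, the table output shown above can be read back into a dictionary. This is a minimal sketch that assumes each line is a key followed by whitespace and a single value, as in the examples; `parse_konstel_table` is a hypothetical helper and not part of konstel.

```python
def parse_konstel_table(text: str) -> dict:
    """Parse 'key<whitespace>value' lines into a dict, skipping other lines."""
    record = {}
    for line in text.splitlines():
        fields = line.split(None, 1)  # split on the first run of whitespace
        if len(fields) == 2:
            record[fields[0]] = fields[1].strip()
    return record


example = """\
scheme               sars-cov-2-s
hash-8               S:0k8n9hjh
id                   S:huhiji-gakihi
"""
print(parse_konstel_table(example))
```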

Python

>>> from konstel import konstel
>>> konstel.generate('sars-cov-2-s.protein', 'konstel/tests/data/spike.prot.fa')
{'scheme': 'sars-cov-2-s', 'hash': 'S:0k8n9hjh5xh5kbef1k6ye7e2d4brhpry5r985avrtf69v6amrbc0', 'hash-8': 'S:0k8n9hjh', 'id': 'S:huhiji-gakihi'}
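The general idea behind the `hash`, `hash-8` and `id` fields can be illustrated with a small standalone sketch. It is not konstel's implementation and will not reproduce the values above. It assumes SHA-256 as the digest, a lowercase Crockford-style base32 alphabet, and a made-up syllable table of 16 consonants and 4 vowels; all function names and the alphabet choices are hypothetical.

```python
import hashlib

CROCKFORD = "0123456789abcdefghjkmnpqrstvwxyz"  # assumed 32-symbol alphabet
CONSONANTS = "bdfghjklmnprstvz"                 # hypothetical: 16 consonants
VOWELS = "aiou"                                 # hypothetical: 4 vowels


def crockford_b32(data: bytes) -> str:
    """Encode bytes as base32 text, 5 bits per character, zero-padded."""
    bits = "".join(f"{byte:08b}" for byte in data)
    bits += "0" * (-len(bits) % 5)  # right-pad to a multiple of 5 bits
    return "".join(CROCKFORD[int(bits[i:i + 5], 2)] for i in range(0, len(bits), 5))


def phonemic(data: bytes, syllables: int = 6, per_word: int = 3) -> str:
    """Map the leading bits of data to consonant-vowel syllables (6 bits each)."""
    bits = "".join(f"{byte:08b}" for byte in data)[: syllables * 6]
    parts = []
    for i in range(0, len(bits), 6):
        chunk = int(bits[i:i + 6], 2)
        parts.append(CONSONANTS[chunk >> 2] + VOWELS[chunk & 0b11])
    words = ["".join(parts[i:i + per_word]) for i in range(0, len(parts), per_word)]
    return "-".join(words)


def sketch_identifiers(sequence: str) -> dict:
    """Derive a full hash, a truncated hash and a pronounceable id."""
    digest = hashlib.sha256(sequence.strip().upper().encode()).digest()
    full = crockford_b32(digest)
    return {"hash": full, "hash-8": full[:8], "id": phonemic(digest)}


print(sketch_identifiers("ACGT"))
```

Because every field is derived from the same digest, identical sequences always map to identical identifiers, and the short forms trade entropy for memorability.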

