Hash-based phonemic sequence identifiers

Konstel(lations)

Not yet stable, proceed with caution

An extensible command-line tool and library for generating memorable, pronounceable, hash-based identifier schemes for sequences, biological or otherwise. For further details and my SARS-CoV-2 naming proposal, please read my blog post. Requires Python 3.6+.

SARS-CoV-2 naming

Phonemic identifiers and 8-character truncated cbase32 hashes provide 36 and 40 bits of entropy respectively, and produce no collisions among publicly deposited SARS-CoV-2 spike protein sequences as of 2021-04-12.
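
The entropy figures above can be sanity-checked with a little arithmetic. The sketch below is not part of konstel: it assumes 5 bits per base32 character, treats identifiers as uniformly distributed, and uses the standard birthday-bound approximation to estimate how likely at least one collision is. The function names and the sequence count `n` are hypothetical and chosen only for illustration.

```python
import math

BITS_PER_BASE32_CHAR = 5  # a 32-symbol alphabet carries log2(32) = 5 bits per character


def base32_entropy_bits(n_chars: int) -> int:
    """Bits of entropy in a base32 string of n_chars characters."""
    return n_chars * BITS_PER_BASE32_CHAR


def collision_probability(n_items: int, bits: int) -> float:
    """Birthday-bound approximation: 1 - exp(-n(n-1) / 2^(bits+1)).

    Assumes identifiers are drawn uniformly at random from 2**bits values.
    """
    space = 2 ** bits
    return -math.expm1(-n_items * (n_items - 1) / (2 * space))


hash8_bits = base32_entropy_bits(8)  # 8 characters x 5 bits = 40 bits
n = 10_000  # hypothetical number of distinct sequences, not a figure from konstel

for label, bits in (("phonemic id", 36), ("hash-8", hash8_bits)):
    p = collision_probability(n, bits)
    print(f"{label}: {bits} bits, approx. P(at least one collision) = {p:.2e}")
```

The estimate grows roughly with the square of the number of distinct sequences, so it is worth re-running with a realistic `n` before relying on a short identifier.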

Install

Ideally inside a new virtualenv or conda environment:

# Latest release
pip install konstel

# Development version
git clone https://github.com/bede/konstel
pip install --editable konstel

Usage

Command line

$ konstel gen sars-cov-2-s.genome konstel/tests/data/spike.genome.fa --output table
scheme               sars-cov-2-s   
hash                 S:0k8n9hjh5xh5kbef1k6ye7e2d4brhpry5r985avrtf69v6amrbc0
hash-8               S:0k8n9hjh     
id                   S:huhiji-gakihi  

$ echo "ACGT" | konstel gen generic.nucl - --output table
scheme               generic        
hash                 3qzkx17yf1vy0ssvd6xxvkt02973jvhzk51xv28cj6va16pvkbr0
id                   bituzu-gupahu-zolodu-lumaki-suripi-rozitu-guhabi-figogo

Python

>>> from konstel import konstel
>>> konstel.generate('sars-cov-2-s.protein', 'konstel/tests/data/spike.prot.fa')
{'scheme': 'sars-cov-2-s', 'hash': 'S:0k8n9hjh5xh5kbef1k6ye7e2d4brhpry5r985avrtf69v6amrbc0', 'hash-8': 'S:0k8n9hjh', 'id': 'S:huhiji-gakihi'}
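
To give a feel for how a hash can become a pronounceable identifier, here is a minimal, self-contained sketch. It is not konstel's implementation and will not reproduce the identifiers shown above. It assumes SHA-256 as the hash, a 16-consonant and 4-vowel alphabet (a guess based on the letters that appear in the sample identifiers), and 6 bits per consonant-vowel syllable, so that six syllables carry 36 bits. The function name `phonemic_id` and its parameters are hypothetical.

```python
import hashlib

# Assumed alphabets, for illustration only: 16 consonants (4 bits) and 4 vowels (2 bits)
CONSONANTS = "bdfghjklmnprstvz"
VOWELS = "aiou"
BITS_PER_SYLLABLE = 6


def phonemic_id(sequence: str, syllables: int = 6, group: int = 3) -> str:
    """Map the leading bits of a SHA-256 digest to consonant-vowel syllables.

    Illustrative sketch only; konstel's real schemes may normalise, hash,
    and encode sequences differently.
    """
    # Normalise so that case and surrounding whitespace do not change the id
    digest = hashlib.sha256(sequence.strip().upper().encode("ascii")).digest()
    total_bits = len(digest) * 8
    if syllables * BITS_PER_SYLLABLE > total_bits:
        raise ValueError("not enough digest bits for the requested length")

    value = int.from_bytes(digest, "big")
    parts = []
    for i in range(syllables):
        # Take successive 6-bit chunks, starting from the most significant bits
        shift = total_bits - BITS_PER_SYLLABLE * (i + 1)
        chunk = (value >> shift) & 0b111111
        parts.append(CONSONANTS[chunk >> 2] + VOWELS[chunk & 0b11])

    # Join syllables into hyphen-separated words of `group` syllables each
    words = ["".join(parts[i:i + group]) for i in range(0, syllables, group)]
    return "-".join(words)


print(phonemic_id("ACGT"))
```

Because the identifier is derived only from the sequence content, the same input always yields the same identifier, and any change to the sequence is very likely to change it.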
