
Hash-based phonemic sequence identifiers


Konstel(lations)


An extensible command-line tool and library for generating memorable, pronounceable hash-based identifiers for sequences, biological or otherwise. Requires Python 3.6+.

SARS-CoV-2 spike protein naming

Phonemic and truncated cbase32 identifiers provide 34 and 35 bits of entropy respectively, and are collision-free for publicly deposited SARS-CoV-2 spike protein sequences as of 2021-05-05. Phonemic identifiers consist of consonant-vowel pairs with a separator after the fourth consonant (e.g. dazator-isak). The first segment offers a useful compromise between identifier length and collision rate, while including the second segment achieves collision resistance. Longer identifiers can be minted by overriding the scheme's default length profile. For my original SARS-CoV-2 naming proposal, please refer to my blog post.
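
To make the consonant-vowel layout and the entropy figure concrete, here is a minimal sketch. It is not konstel's own implementation. It assumes a proquint-style alphabet of 16 consonants (4 bits each) and 4 vowels (2 bits each), a SHA-256 digest, and simple whitespace and case normalisation. The alphabet, bit order, normalisation, and the names `phonemic_bits` and `phonemic_sketch` are all illustrative assumptions.

```python
import hashlib

# Assumed proquint-style alphabet (not confirmed as konstel's own):
# 16 consonants carry 4 bits each, 4 vowels carry 2 bits each.
CONSONANTS = "bdfghjklmnprstvz"
VOWELS = "aiou"


def phonemic_bits(identifier: str) -> int:
    """Count the bits an identifier can carry under the assumed alphabet."""
    bits = 0
    for char in identifier:
        if char in CONSONANTS:
            bits += 4
        elif char in VOWELS:
            bits += 2
        # Separators such as '-' carry no information.
    return bits


def phonemic_sketch(sequence: str, pairs: int = 6, split_after: int = 4) -> str:
    """Encode the leading bits of a SHA-256 digest as consonant-vowel pairs.

    A separator is placed directly after the consonant numbered `split_after`.
    """
    # Assumed normalisation: strip surrounding whitespace and uppercase.
    digest = hashlib.sha256(sequence.strip().upper().encode()).digest()
    value = int.from_bytes(digest, "big")
    total_bits = len(digest) * 8
    chars = []
    for i in range(pairs):
        # Take 6 bits per pair, starting from the most significant end.
        chunk = (value >> (total_bits - 6 * (i + 1))) & 0b111111
        chars.append(CONSONANTS[chunk >> 2])   # upper 4 bits select the consonant
        chars.append(VOWELS[chunk & 0b11])     # lower 2 bits select the vowel
    text = "".join(chars)
    cut = 2 * split_after - 1  # index just past the chosen consonant
    return text[:cut] + "-" + text[cut:]


# Six consonants and five vowels: 6 * 4 + 5 * 2 bits.
print(phonemic_bits("dazator-isak"))  # 34 under the assumed alphabet
print(phonemic_sketch("ACGT"))
```

Under these assumptions, the six consonants and five vowels of dazator-isak account for the 34 bits quoted above, and seven cbase32 characters at 5 bits each account for the 35 bits.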

Install

Ideally inside a new virtualenv or conda environment:

# Latest release
pip install konstel

# Development version
git clone https://github.com/bede/konstel
pip install --editable konstel

Usage

Command line

$ konstel gen sars-cov-2-s.genome konstel/tests/data/spike.genome.fa --output table
scheme               sars-cov-2-s   
hash                 S:0k8n9hjh5xh5kbef1k6ye7e2d4brhpry5r985avrtf69v6amrbc0
hash-8               S:0k8n9hjh     
id                   S:huhijig-akihi  

$ echo "ACGT" | konstel gen generic.nucl - --output table
scheme               generic        
hash                 3qzkx17yf1vy0ssvd6xxvkt02973jvhzk51xv28cj6va16pvkbr0
id                   bituzu-gupahu-zolodu-lumaki-suripi-rozitu-guhabi-figogo

Python

>>> from konstel import konstel
>>> konstel.generate('sars-cov-2-s.protein', 'konstel/tests/data/spike.prot.fa')
{'scheme': 'sars-cov-2-s', 'hash': 'S:0k8n9hjh5xh5kbef1k6ye7e2d4brhpry5r985avrtf69v6amrbc0', 'hash-8': 'S:0k8n9hjh', 'id': 'S:huhijig-akihi'}
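
A collision check of the kind mentioned for the spike protein scheme can be sketched as follows. The helper `find_collisions` and its `make_id` argument are hypothetical names and are not part of konstel. `make_id` stands for any callable that maps a sequence to an identifier, for example a wrapper that returns the 'id' field of a result like the one above.

```python
from collections import defaultdict


def find_collisions(sequences, make_id):
    """Return identifiers that are shared by more than one distinct sequence."""
    seen = defaultdict(set)
    for sequence in sequences:
        # A set is used so that exact duplicate sequences are not
        # reported as collisions.
        seen[make_id(sequence)].add(sequence)
    return {ident: seqs for ident, seqs in seen.items() if len(seqs) > 1}


# Deliberately weak identifier (first character only) to force a collision.
collisions = find_collisions(["ACGT", "ACGA", "TTTT", "TTTT"], lambda s: s[0])
print(collisions)
```

An empty result means the identifier length in use is collision-free for that particular set of sequences. It says nothing about sequences deposited later.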

