Hash-based phonemic sequence identifiers

Konstel(lations)

An extensible command-line tool and library for generating memorable and pronounceable hash-based identifier schemes for sequences, biological or otherwise. Requires Python 3.6+.

SARS-CoV-2 spike protein naming

Phonemic and truncated cbase32 identifiers provide 34 and 35 bits of entropy respectively, and are collision-free for publicly deposited SARS-CoV-2 spike protein sequences as of 2021-05-05. Phonemic identifiers comprise consonant-vowel pairs with a separator after the fourth consonant (e.g. dazator-isak). The first segment offers a useful compromise between identifier length and collision rate, while including the second segment achieves collision resistance. Longer identifiers can be generated by overriding the scheme's default length profile. For my original SARS-CoV-2 naming proposal, please refer to my blog post.
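
To illustrate the general idea of a hash-based phonemic identifier, here is a minimal sketch. It is not konstel's implementation: the use of SHA-256, the consonant and vowel alphabets, and the names phonemic_id and entropy_bits are all assumptions chosen for illustration, so the output will not match konstel's and the entropy differs from the 34 bits quoted above.

```python
import hashlib
import math

# Assumed alphabets for illustration only; konstel's may differ.
CONSONANTS = "bdfghjklmnprstvz"  # 16 symbols, 4 bits each
VOWELS = "aiou"                  # 4 symbols, 2 bits each


def phonemic_id(sequence, pairs=6, sep_after=4):
    """Encode a hash of `sequence` as alternating consonants and vowels."""
    digest = hashlib.sha256(sequence.encode()).digest()
    value = int.from_bytes(digest, "big")
    letters = []
    for _ in range(pairs):
        # Peel off one consonant and one vowel per pair (mixed-radix encoding).
        value, c = divmod(value, len(CONSONANTS))
        letters.append(CONSONANTS[c])
        value, v = divmod(value, len(VOWELS))
        letters.append(VOWELS[v])
    # The n-th consonant sits at index 2n - 2, so split just after it.
    split = 2 * sep_after - 1
    return "".join(letters[:split]) + "-" + "".join(letters[split:])


def entropy_bits(pairs=6):
    """Bits of entropy carried by `pairs` consonant-vowel pairs."""
    return pairs * (math.log2(len(CONSONANTS)) + math.log2(len(VOWELS)))


print(phonemic_id("ACGT"))  # 7 letters, a separator, then 5 letters
print(entropy_bits())       # bits for the assumed alphabets, not konstel's
```

Because the identifier is derived only from the hash, the same sequence always yields the same name, and lengthening the identifier (more pairs) lowers the chance of two sequences sharing one.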

Install

Ideally inside a new virtualenv or conda environment:

# Latest release
pip install konstel

# Development version
git clone https://github.com/bede/konstel
pip install --editable konstel

Usage

Command line

$ konstel gen sars-cov-2-s.genome konstel/tests/data/spike.genome.fa --output table
scheme               sars-cov-2-s   
hash                 S:0k8n9hjh5xh5kbef1k6ye7e2d4brhpry5r985avrtf69v6amrbc0
hash-8               S:0k8n9hjh     
id                   S:huhijig-akihi  

$ echo "ACGT" | konstel gen generic.nucl - --output table
scheme               generic        
hash                 3qzkx17yf1vy0ssvd6xxvkt02973jvhzk51xv28cj6va16pvkbr0
id                   bituzu-gupahu-zolodu-lumaki-suripi-rozitu-guhabi-figogo

Python

>>> from konstel import konstel
>>> konstel.generate('sars-cov-2-s.protein', 'konstel/tests/data/spike.prot.fa')
{'scheme': 'sars-cov-2-s', 'hash': 'S:0k8n9hjh5xh5kbef1k6ye7e2d4brhpry5r985avrtf69v6amrbc0', 'hash-8': 'S:0k8n9hjh', 'id': 'S:huhijig-akihi'}
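
When naming many sequences, it can be useful to check that no short identifier is shared by two different full hashes. The sketch below assumes only the dictionary shape shown above (keys 'hash' and 'id'); find_collisions is a hypothetical helper, not part of konstel, and the records are made-up placeholders rather than real konstel output.

```python
from collections import defaultdict


def find_collisions(records, key="id"):
    """Return identifiers that map to more than one distinct full hash."""
    seen = defaultdict(set)
    for record in records:
        seen[record[key]].add(record["hash"])
    return {ident: sorted(hashes) for ident, hashes in seen.items() if len(hashes) > 1}


# Placeholder records in the same shape as the dictionary shown above.
records = [
    {"hash": "S:aaaa1111", "id": "S:example-one"},
    {"hash": "S:bbbb2222", "id": "S:example-two"},
    {"hash": "S:cccc3333", "id": "S:example-one"},  # same id, different hash
    {"hash": "S:bbbb2222", "id": "S:example-two"},  # exact duplicate, not a collision
]

print(find_collisions(records))
```

Passing key="hash-8" would run the same check on the truncated hash instead of the phonemic identifier.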
