Hash-based phonemic sequence identifiers
Konstel(lations)
Not yet stable, proceed with caution
An extensible command-line tool and Python library for generating memorable, pronounceable hash-based identifiers for sequences, biological or otherwise. For further details and my SARS-CoV-2 naming proposal, please read my blog post. Requires Python 3.6+.
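To make the idea concrete, here is a minimal sketch of how a sequence hash can be rendered as pronounceable syllables. It is not Konstel's implementation: the function name `phonemic_id`, the consonant and vowel alphabets, the 6-bit syllable size, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib

# Hypothetical alphabets: 16 consonants x 4 vowels = 64 syllables (6 bits each).
CONSONANTS = "bdfghjklmnprstvz"
VOWELS = "aiou"
BITS_PER_SYLLABLE = 6


def phonemic_id(sequence, syllables=6, group=3):
    """Map the leading bits of a SHA-256 digest onto consonant-vowel syllables."""
    # Normalise case and surrounding whitespace so equivalent inputs hash alike.
    digest = hashlib.sha256(sequence.strip().upper().encode()).digest()
    # Keep only the most significant bits needed for the requested syllables.
    value = int.from_bytes(digest, "big") >> (256 - BITS_PER_SYLLABLE * syllables)
    parts = []
    for i in reversed(range(syllables)):
        chunk = (value >> (BITS_PER_SYLLABLE * i)) & 0b111111
        # Upper 4 bits pick the consonant, lower 2 bits pick the vowel.
        parts.append(CONSONANTS[chunk >> 2] + VOWELS[chunk & 0b11])
    # Join syllables into hyphen-separated words of `group` syllables each.
    words = ["".join(parts[i:i + group]) for i in range(0, syllables, group)]
    return "-".join(words)


print(phonemic_id("ACGT"))
```

With the defaults this yields two three-syllable words, the same shape as the `huhiji-gakihi` identifier shown under Usage, although the actual letters will differ from Konstel's output.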
SARS-CoV-2 naming
Phonemic and truncated cbase32 identifiers provide 36 and 40 bits of entropy respectively, producing no collisions within publicly deposited SARS-CoV-2 spike protein sequences as of 2021-04-12.
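The bit counts above can be checked with a little arithmetic. The sketch below assumes that each base32 character carries 5 bits (so 8 characters give 40 bits) and that each phonemic syllable carries 6 bits (so the 6 syllables of an identifier like `huhiji-gakihi` give 36 bits); the syllable size is inferred from the stated totals, not taken from Konstel's source. The collision estimate uses the standard birthday approximation, and the sequence count passed to it is a hypothetical figure, not a measured one.

```python
import math


def entropy_bits(alphabet_size, length):
    """Bits of entropy in `length` symbols drawn from `alphabet_size` options."""
    return length * math.log2(alphabet_size)


def collision_probability(n_items, n_bits):
    """Birthday approximation: chance that any two of n_items share an identifier."""
    exponent = n_items * (n_items - 1) / 2 ** (n_bits + 1)
    return -math.expm1(-exponent)  # numerically stable 1 - exp(-x)


hash8_bits = entropy_bits(32, 8)      # 8 base32 characters
phonemic_bits = entropy_bits(64, 6)   # 6 syllables, assumed 64 possible syllables

# Hypothetical number of distinct sequences, for illustration only.
n_distinct = 10_000
print(hash8_bits, collision_probability(n_distinct, hash8_bits))
print(phonemic_bits, collision_probability(n_distinct, phonemic_bits))
```

The estimate treats identifiers as uniformly random, so it gives only a rough sense of how collision risk grows with the number of distinct sequences.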
Install
Ideally inside a new virtualenv or conda environment:
# Latest release
pip install konstel
# Development version
git clone https://github.com/bede/konstel
pip install --editable konstel
Usage
Command line
$ konstel gen sars-cov-2-s.genome konstel/tests/data/spike.genome.fa --output table
scheme sars-cov-2-s
hash S:0k8n9hjh5xh5kbef1k6ye7e2d4brhpry5r985avrtf69v6amrbc0
hash-8 S:0k8n9hjh
id S:huhiji-gakihi
$ echo "ACGT" | konstel gen generic.nucl - --output table
scheme generic
hash 3qzkx17yf1vy0ssvd6xxvkt02973jvhzk51xv28cj6va16pvkbr0
id bituzu-gupahu-zolodu-lumaki-suripi-rozitu-guhabi-figogo
Python
>>> from konstel import konstel
>>> konstel.generate('sars-cov-2-s.protein', 'konstel/tests/data/spike.prot.fa')
{'scheme': 'sars-cov-2-s', 'hash': 'S:0k8n9hjh5xh5kbef1k6ye7e2d4brhpry5r985avrtf69v6amrbc0', 'hash-8': 'S:0k8n9hjh', 'id': 'S:huhiji-gakihi'}
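The returned dictionary can be unpacked with ordinary string operations. The sketch below works on a copy of the output shown above, so it runs without Konstel installed; the variable names are illustrative, and the observation that `hash-8` is a prefix of `hash` comes from this one example only.

```python
# Copied from the example output above.
result = {
    "scheme": "sars-cov-2-s",
    "hash": "S:0k8n9hjh5xh5kbef1k6ye7e2d4brhpry5r985avrtf69v6amrbc0",
    "hash-8": "S:0k8n9hjh",
    "id": "S:huhiji-gakihi",
}

# Each value carries an "S:" prefix ahead of the encoded part.
prefix, full_hash = result["hash"].split(":", 1)
short_hash = result["hash-8"].split(":", 1)[1]
words = result["id"].split(":", 1)[1].split("-")

# In this example the 8-character hash is the start of the full hash.
print(prefix, full_hash.startswith(short_hash), words)
```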
Source Distribution
konstel-0.8.0.tar.gz (9.3 kB)
File details
Details for the file konstel-0.8.0.tar.gz.
File metadata
- Download URL: konstel-0.8.0.tar.gz
- Upload date:
- Size: 9.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.24.0 CPython/3.7.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 3c889d51bce9e01262dbcf71e0792b5381884ede63483606f8bdd0171c66ae9c
MD5 | 39b2a14624b68ae46440713bbdd904bb
BLAKE2b-256 | f12dd0c9b14bc853edaae093b4b77aca379387a7c9decef6e067dfbf4a00f6a7