hgvs · PyPI

HGVS Parser, Formatter, Mapper, Validator

Project description

Important: biocommons packages require Python 3.6+.

The hgvs package provides a Python library to parse, format, validate, normalize, and map sequence variants according to Variation Nomenclature (aka Human Genome Variation Society) recommendations.

Specifically, the hgvs package focuses on the subset of the HGVS recommendations that precisely describe sequence-level variation relevant to the application of high-throughput sequencing to clinical diagnostics. The package does not attempt to cover the full scope of HGVS recommendations. Please refer to issues for limitations.

Features

Parsing is based on a formal grammar.
An easy-to-use object model that represents most variant types (SNVs, indels, dups, inversions, etc.) and concepts (intronic offsets, uncertain positions, intervals)
A variant normalizer that rewrites variants in canonical forms and substitutes reference sequences (if reference and transcript sequences differ)
Formatters that generate HGVS strings from internal representations
Tools to map variants between genome, transcript, and protein sequences
Reliable handling of regions with genome-transcript discrepancies
Pluggable data providers support alternative sources of transcript mapping data
Extensive automated tests, including those for all variant types and “problematic” transcripts
Easily installed using remote data sources. Installation with local data sources is straightforward and completely obviates network access

Important Notes

You are encouraged to browse issues. All known issues are listed there. Please report any issues you find.
Use a pip package specification to stay within minor releases. For example, hgvs>=1.5,<1.6. hgvs uses Semantic Versioning.
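
As a rough illustration of what the specification hgvs>=1.5,<1.6 admits, the sketch below compares release numbers as integer tuples. The helpers release_tuple and satisfies_minor_pin are hypothetical names introduced here, not part of hgvs or pip, and the sketch ignores pre-release suffixes, which pip treats with additional rules.

```python
import re


def release_tuple(version):
    """Return the leading numeric release components, e.g. '1.5.4' -> (1, 5, 4)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"not a version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def satisfies_minor_pin(version, lower=(1, 5), upper=(1, 6)):
    """Approximate the check 'lower <= version < upper' on release tuples.

    Assumption: suffixes such as 'rc2' or '.post1' are ignored here;
    pip applies additional rules to pre-releases.
    """
    return lower <= release_tuple(version) < upper


for candidate in ("1.4.0", "1.5.0", "1.5.4", "1.6.0"):
    print(candidate, satisfies_minor_pin(candidate))
```

Under this pin, patch releases such as 1.5.4 are accepted while 1.4.0 and 1.6.0 are rejected.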

Examples

Installation

By default, hgvs uses remote data sources, which makes installation easy.

$ mkvirtualenv hgvs-test
(hgvs-test)$ pip install --upgrade setuptools
(hgvs-test)$ pip install hgvs
(hgvs-test)$ python

See Installation instructions for details, including instructions for installing Universal Transcript Archive (UTA) and SeqRepo locally.

Configuration

hgvs will use publicly available data sources unless directed otherwise through environment variables, like so:

# N.B. These are examples. The correct values will depend on your installation
$ export UTA_DB_URL=postgresql://anonymous:anonymous@localhost:5432/uta/uta_20210129
$ export HGVS_SEQREPO_DIR=/usr/local/share/seqrepo/latest

Alternatively, if you cannot include the PostgreSQL password in the UTA_DB_URL environment variable (e.g., when generating an auth token), you can set UTA_DB_URL to postgresql://<user>@<host>/<db>/<schema> and supply the password through PGPASSWORD. For example:

$ export UTA_DB_URL=postgresql://anonymous@localhost:5432/uta/uta_20210129 PGPASSWORD=anonymous

See the installation instructions for details.
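
The same variables can also be set from Python before a connection is opened. The following is a minimal sketch: build_uta_db_url is a hypothetical helper written for this example, and the host, database, and schema values simply repeat the placeholders above, so they will need to match your installation.

```python
import os


def build_uta_db_url(user, host, database, schema, port=5432, password=None):
    """Assemble a URL of the form postgresql://user[:password]@host:port/db/schema."""
    credentials = user if password is None else f"{user}:{password}"
    return f"postgresql://{credentials}@{host}:{port}/{database}/{schema}"


# Variant 1: password embedded in the URL (example values only).
url_with_password = build_uta_db_url(
    "anonymous", "localhost", "uta", "uta_20210129", password="anonymous"
)

# Variant 2: password kept out of the URL and passed through PGPASSWORD.
url_without_password = build_uta_db_url(
    "anonymous", "localhost", "uta", "uta_20210129"
)

os.environ["UTA_DB_URL"] = url_without_password
os.environ["PGPASSWORD"] = "anonymous"
os.environ["HGVS_SEQREPO_DIR"] = "/usr/local/share/seqrepo/latest"

print(url_with_password)
print(url_without_password)
```

Setting the variables at the top of a script, before any hgvs data provider is created, keeps the configuration in one place.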

Parsing and Formatting

hgvs parses HGVS variants (as strings) into an object model, and can format object models back into HGVS strings.

>>> import hgvs.parser

# start with these variants as strings
>>> hgvs_g = 'NC_000007.13:g.36561662C>T'
>>> hgvs_c = 'NM_001637.3:c.1582G>A'

# parse the genomic variant into a Python structure
>>> hp = hgvs.parser.Parser()
>>> var_g = hp.parse_hgvs_variant(hgvs_g)
>>> var_g
SequenceVariant(ac=NC_000007.13, type=g, posedit=36561662C>T, gene=None)

# SequenceVariants are composed of structured objects, e.g.,
>>> var_g.posedit.pos.start
SimplePosition(base=36561662, uncertain=False)

# format by stringification
>>> str(var_g)
'NC_000007.13:g.36561662C>T'
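
To make the three top-level parts of the object model concrete (ac, type, posedit), here is a deliberately simplified, standalone sketch that splits an HGVS string with a regular expression. It is not how hgvs parses variants (the package uses a formal grammar and builds structured position and edit objects); SplitVariant and split_hgvs are hypothetical names used only for illustration.

```python
import re
from typing import NamedTuple


class SplitVariant(NamedTuple):
    ac: str       # accession, e.g. NC_000007.13
    type: str     # coordinate type: c, g, m, n, p, or r
    posedit: str  # position and edit, left as an unparsed string here


_HGVS_PATTERN = re.compile(
    r"^(?P<ac>[^:\s]+):(?P<type>[cgmnpr])\.(?P<posedit>\S+)$"
)


def split_hgvs(text):
    """Split 'accession:type.posedit' into its three parts (toy illustration)."""
    match = _HGVS_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"does not look like an HGVS variant: {text!r}")
    return SplitVariant(**match.groupdict())


parts = split_hgvs("NC_000007.13:g.36561662C>T")
print(parts.ac, parts.type, parts.posedit)
```

Unlike this sketch, the real parser validates the position and edit syntax and exposes them as objects such as SimplePosition.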

Projecting (“Mapping”) variants between aligned genome and transcript sequences

hgvs provides tools to project variants between genome, transcript, and protein sequences. Non-coding and intronic variants are supported. Alignment data come from the Universal Transcript Archive (UTA).

>>> import hgvs.dataproviders.uta
>>> import hgvs.assemblymapper

# initialize the mapper for GRCh37 with splign-based alignments
>>> hdp = hgvs.dataproviders.uta.connect()
>>> am = hgvs.assemblymapper.AssemblyMapper(hdp,
...          assembly_name='GRCh37', alt_aln_method='splign',
...          replace_reference=True)

# identify transcripts that overlap this genomic variant
>>> transcripts = am.relevant_transcripts(var_g)
>>> sorted(transcripts)
['NM_001177506.1', 'NM_001177507.1', 'NM_001637.3']

# map genomic variant to one of these transcripts
>>> var_c = am.g_to_c(var_g, 'NM_001637.3')
>>> var_c
SequenceVariant(ac=NM_001637.3, type=c, posedit=1582G>A, gene=None)
>>> str(var_c)
'NM_001637.3:c.1582G>A'

# CDS coordinates use BaseOffsetPosition to support intronic offsets
>>> var_c.posedit.pos.start
BaseOffsetPosition(base=1582, offset=0, datum=Datum.CDS_START, uncertain=False)

Translating coding variants to protein sequences

Coding variants may be translated to their protein consequences. hgvs uses the same pairing of transcript and protein accessions as seen in NCBI and Ensembl.

# translate var_c to its protein consequence
# The object structure of protein variants is nearly identical to
# that of nucleic acid variants and is converted to a string form
# by stringification. Per HGVS recommendations, inferred consequences
# must have parentheses to indicate uncertainty.
>>> var_p = am.c_to_p(var_c)
>>> var_p
SequenceVariant(ac=NP_001628.1, type=p, posedit=(Gly528Arg), gene=None)
>>> str(var_p)
'NP_001628.1:p.(Gly528Arg)'

# setting uncertain to False removes the parentheses on the
# stringified form
>>> var_p.posedit.uncertain = False
>>> str(var_p)
'NP_001628.1:p.Gly528Arg'

# formatting can be customized, e.g., use 1 letter amino acids to
# format a specific variant
# (configuration may also be set globally)
>>> var_p.format(conf={"p_3_letter": False})
'NP_001628.1:p.G528R'
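
For readers unfamiliar with the two amino acid notations, the following standalone sketch shows the kind of substitution that switching from 3-letter to 1-letter codes implies. It is an illustration only, not the package's formatter; AA3_TO_AA1 and to_one_letter are hypothetical names, and the table covers only the 20 standard amino acids plus Ter.

```python
import re

# Three-letter to one-letter codes for the standard amino acids, plus Ter.
AA3_TO_AA1 = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Ter": "*",
}


def to_one_letter(hgvs_p):
    """Replace each known 3-letter code with its 1-letter code; leave the rest."""
    return re.sub(
        r"[A-Z][a-z]{2}",
        lambda m: AA3_TO_AA1.get(m.group(), m.group()),
        hgvs_p,
    )


print(to_one_letter("NP_001628.1:p.Gly528Arg"))
print(to_one_letter("NP_001628.1:p.(Gly528Arg)"))
```

In practice, prefer the format method shown above, which works on the parsed object rather than on raw strings.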

Normalizing variants

Some variants have multiple representations due to intrinsic biological ambiguity (e.g., inserting a G in a poly-G run) or due to misunderstanding HGVS recommendations. Normalization rewrites certain variants into a single representation.

# rewrite ins as dup (depends on sequence context)
>>> import hgvs.normalizer
>>> hn = hgvs.normalizer.Normalizer(hdp)
>>> hn.normalize(hp.parse_hgvs_variant('NM_001166478.1:c.35_36insT'))
SequenceVariant(ac=NM_001166478.1, type=c, posedit=35dup, gene=None)

# during mapping, variants are normalized (by default)
>>> c1 = hp.parse_hgvs_variant('NM_001166478.1:c.31del')
>>> c1
SequenceVariant(ac=NM_001166478.1, type=c, posedit=31del, gene=None)
>>> c1n = hn.normalize(c1)
>>> c1n
SequenceVariant(ac=NM_001166478.1, type=c, posedit=35del, gene=None)
>>> g = am.c_to_g(c1)
>>> g
SequenceVariant(ac=NC_000006.11, type=g, posedit=49917127del, gene=None)
>>> c2 = am.g_to_c(g, c1.ac)
>>> c2
SequenceVariant(ac=NM_001166478.1, type=c, posedit=35del, gene=None)
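
The move from 31del to 35del above reflects shifting a deletion to its most 3' equivalent position within a repeated region. The sketch below illustrates that idea on a plain string. It is a toy version under simplifying assumptions (deletions only, 1-based inclusive coordinates, whole sequence in memory), not the Normalizer's implementation; shift_deletion_3prime is a hypothetical name.

```python
def shift_deletion_3prime(seq, start, end):
    """Shift a deletion of seq[start..end] (1-based, inclusive) as far 3' as possible.

    The deletion can move one base to the right whenever the first deleted
    base equals the base immediately after the deleted interval, because
    both choices yield the same resulting sequence.
    """
    while end < len(seq) and seq[start - 1] == seq[end]:
        start += 1
        end += 1
    return start, end


def apply_deletion(seq, start, end):
    """Return seq with positions start..end (1-based, inclusive) removed."""
    return seq[: start - 1] + seq[end:]


seq = "ACGTTTTTA"  # positions 4-8 form a run of T
shifted = shift_deletion_3prime(seq, 4, 4)
print(shifted)  # (8, 8)

# Both descriptions produce the same sequence, which is why a single
# canonical representation is useful.
print(apply_deletion(seq, 4, 4) == apply_deletion(seq, *shifted))  # True
```

The same reasoning extends to multi-base repeats: deleting positions 2-3 or 6-7 of GCACACAT gives the same result, and the 3' rule selects 6-7.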

There are more examples in the documentation.

Citing hgvs (the package)

hgvs: A Python package for manipulating sequence variants using HGVS nomenclature: 2018 Update.
Wang M, Callenberg KM, Dalgleish R, Fedtsov A, Fox N, Freeman PJ, Jacobs KB, Kaleta P, McMurry AJ, Prlić A, Rajaraman V, Hart RK
Human Mutation. 2018. PubMed | Open Access PDF

A Python Package for Parsing, Validating, Mapping, and Formatting Sequence Variants Using HGVS Nomenclature.
Hart RK, Rico R, Hare E, Garcia J, Westbrook J, Fusaro VA.
Bioinformatics. 2014 Sep 30. PubMed | Open Access PDF

Contributing

The hgvs package is intended to be a community project. Please see Contributing to get started in submitting source code, tests, or documentation. Thanks for getting involved!

Project details

These details have been verified by PyPI

Maintainers

biocommons jsstevenson korikuzma reece

Release history

1.5.4

Nov 21, 2022

1.5.3

Nov 12, 2022

This version

1.5.3rc2 pre-release

Nov 12, 2022

1.5.2

Dec 18, 2021

1.5.2rc0 pre-release

Dec 18, 2021

1.5.1

Mar 29, 2020

1.5.0.post1

Mar 17, 2020

1.5.0

Mar 17, 2020

1.5.0rc0 pre-release

Mar 17, 2020

1.4.0

Jan 27, 2020

1.3.0.post0

May 15, 2019

1.3.0

May 15, 2019

1.3.0rc0 pre-release

May 13, 2019

1.2.5.post1

Feb 1, 2019

1.2.5.post0

Feb 1, 2019

1.2.5

Feb 1, 2019

1.2.4

Sep 28, 2018

1.2.3

Sep 4, 2018

1.2.2

Aug 9, 2018

1.2.1

Jul 22, 2018

1.2.0

Jul 15, 2018

1.2.0rc1 pre-release

Jul 15, 2018

1.1.3

Jul 2, 2018

1.1.2

Apr 1, 2018

1.1.1

Nov 25, 2017

1.1.0.post1

Jul 12, 2017

1.1.0

Jul 12, 2017

1.0.0.post3

Apr 11, 2017

1.0.0.post2

Apr 11, 2017

1.0.0.post1

Apr 9, 2017

1.0.0.post0

Apr 9, 2017

1.0.0

Apr 9, 2017

1.0.0rc1.post2 pre-release

Apr 4, 2017

1.0.0rc1 pre-release

Apr 3, 2017

1.0.0a4 pre-release

Apr 3, 2017

1.0.0a3 pre-release

Mar 31, 2017

1.0.0a2 pre-release

Mar 30, 2017

1.0.0a1 pre-release

Mar 11, 2017

0.4.14

May 19, 2017

0.4.13

Dec 12, 2016

0.4.12

Dec 7, 2016

0.4.11

Sep 15, 2016

0.4.10

Sep 13, 2016

0.4.9

Aug 1, 2016

0.4.8

Jul 20, 2016

0.4.7

Jun 27, 2016

0.4.6

Jun 27, 2016

0.4.5

Apr 1, 2016

0.4.4

Dec 15, 2015

0.4.3

Dec 6, 2015

0.4.2

Sep 30, 2015

0.4.1

Sep 14, 2015

0.4.0.post1

Sep 10, 2015

0.4.0

Sep 10, 2015

0.3.7

Jul 2, 2015

0.3.6

Jun 3, 2015

0.3.5

May 19, 2015

0.3.3

Aug 28, 2014

0.3.2

Jul 13, 2014

0.3.1

Jul 13, 2014

0.3

Jul 8, 2014

0.2.2

Jun 12, 2014

0.2.1

Jun 11, 2014

0.2

Mar 10, 2014

0.1.11

Mar 5, 2014

0.1.10

Jan 23, 2014

0.1.9

Mar 5, 2014

0.1.7

Jan 22, 2014

0.1.6

Jan 12, 2014

0.1.5

Jan 11, 2014

0.1.2

Jan 6, 2014

0.1.1

Jan 4, 2014

0.0.0

Nov 28, 2013

Download files

Source Distribution

hgvs-1.5.3rc2.tar.gz (2.2 MB)

Uploaded Nov 12, 2022 Source

Built Distribution

hgvs-1.5.3rc2-py2.py3-none-any.whl (107.5 kB)

Uploaded Nov 12, 2022 Python 2 Python 3

File details

Details for the file hgvs-1.5.3rc2.tar.gz.

File metadata

Download URL: hgvs-1.5.3rc2.tar.gz
Upload date: Nov 12, 2022
Size: 2.2 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.9.15

File hashes

Hashes for hgvs-1.5.3rc2.tar.gz
Algorithm	Hash digest
SHA256	`36c08f2dc0a3cbd1f60add1526894a5587eb96da91d019d6cce1113f2690d373`
MD5	`fd2ea0e5d1802021a5d2dc76cde9a3eb`
BLAKE2b-256	`cfca24fc2a719480aaf608ffcf7b42afdd26c0ba6aec8ae533ad4dd9d9f81313`

File details

Details for the file hgvs-1.5.3rc2-py2.py3-none-any.whl.

File metadata

Download URL: hgvs-1.5.3rc2-py2.py3-none-any.whl
Upload date: Nov 12, 2022
Size: 107.5 kB
Tags: Python 2, Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.9.15

File hashes

Hashes for hgvs-1.5.3rc2-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`52dc0fc9c7294409e69ee5009a1394c315cd4fe96a7523d05859850952d40522`
MD5	`e962504b07893efbdb7b57efc799c401`
BLAKE2b-256	`8be103342926cdfc28866ce252c54400050980ad91502b3536c10d00ed0c7f0b`