Python REST API for Entrez E-Utilities: stateless, easy to use, reliable.

easy-entrez

Python REST API for Entrez E-Utilities, aiming to be easy to use and reliable.

Easy-entrez:

  • makes common tasks easy thanks to a simple, Pythonic API,
  • is typed and integrates well with mypy,
  • is tested on Windows, Mac and Linux across Python 3.6, 3.7, 3.8 and 3.9,
  • is limited in scope, which allows development to focus on the reliability of the core code,
  • does not use the stateful API, which is error-prone, as seen in the alternative entrezpy.

Status: beta (pending tutorial write-up and documentation improvements before official release).

from easy_entrez import EntrezAPI

entrez_api = EntrezAPI(
    'your-tool-name',
    'e@mail.com',
    # optional
    return_type='json'
)

# find up to 10 000 results for cancer in human
result = entrez_api.search('cancer AND human[organism]', max_results=10_000)

# data will be populated with JSON or XML (depending on the `return_type` value)
result.data

See more in the Demo notebook and documentation.

For a real-world example (used for this publication), see the notebooks in the multi-omics-state-of-the-field repository.

Example: fetching genes for a variant from dbSNP

Fetch the SNP record for rs6311:

rs6311 = entrez_api.fetch(['rs6311'], max_results=1, database='snp').data[0]
rs6311

Display the result:

from xml.dom import minidom
from xml.etree import ElementTree


def xml_to_string(element):
    # serialise the element, then re-parse it with minidom to pretty-print
    return (
        minidom.parseString(ElementTree.tostring(element))
        .toprettyxml(indent=' ' * 4)
    )


print(xml_to_string(rs6311))

Find the gene names for rs6311:

namespaces = {'ns0': 'https://www.ncbi.nlm.nih.gov/SNP/docsum'}
genes = [
    name.text
    for name in rs6311.findall('.//ns0:GENE_E/ns0:NAME', namespaces)
]
print(genes)

['HTR2A']

Fetch data for multiple variants at once:

result = entrez_api.fetch(['rs6311', 'rs662138'], max_results=10, database='snp')
gene_names = {
    'rs' + document_summary.get('uid'): [
        element.text
        for element in document_summary.findall('.//ns0:GENE_E/ns0:NAME', namespaces)
    ]
    for document_summary in result.data
}
print(gene_names)

{'rs6311': ['HTR2A'], 'rs662138': ['SLC22A1']}
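
The namespace-aware lookup used above can be tried offline, without calling the API. The sketch below is illustrative only: `SAMPLE` is a hand-written, heavily trimmed stand-in for a dbSNP document summary (not real NCBI output), and `extract_gene_names` is a hypothetical helper that is not part of easy-entrez. It applies the same `findall` pattern as the examples above:

```python
from xml.etree import ElementTree

# namespace mapping copied from the example above
NAMESPACES = {'ns0': 'https://www.ncbi.nlm.nih.gov/SNP/docsum'}

# hand-written stand-in for one document summary (assumption, not real output)
SAMPLE = """
<DocumentSummary xmlns="https://www.ncbi.nlm.nih.gov/SNP/docsum" uid="6311">
    <GENES>
        <GENE_E>
            <NAME>HTR2A</NAME>
        </GENE_E>
    </GENES>
</DocumentSummary>
"""


def extract_gene_names(document_summaries):
    """Map 'rs<uid>' to the gene names found in each document summary."""
    return {
        'rs' + summary.get('uid'): [
            name.text
            for name in summary.findall('.//ns0:GENE_E/ns0:NAME', NAMESPACES)
        ]
        for summary in document_summaries
    }


summary = ElementTree.fromstring(SAMPLE.strip())
print(extract_gene_names([summary]))
```

Because the sample declares a default namespace, every tag in the path needs the `ns0:` prefix; a path without it would match nothing.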

Example: obtaining the SNP rs ID number from chromosomal position

You can use the query string directly:

results = entrez_api.search(
    '13[CHROMOSOME] AND human[ORGANISM] AND 31873085[POSITION]',
    database='snp',
    max_results=10
)
print(results.data['esearchresult']['idlist'])

['59296319', '17076752', '7336701', '4']

Or pass a dictionary (the arguments are not validated, and the terms are joined with AND):

results = entrez_api.search(
    dict(chromosome=13, organism='human', position=31873085),
    database='snp',
    max_results=10
)
print(results.data['esearchresult']['idlist'])

['59296319', '17076752', '7336701', '4']
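
The identifiers in `idlist` are bare numbers, while the fetch examples above use the `rs` prefix. A minimal sketch of the conversion follows; `to_rs_ids` is a hypothetical helper (not part of easy-entrez), and the dictionary only mirrors the two keys accessed in the example above rather than a full ESearch response:

```python
def to_rs_ids(esearch_json):
    """Prefix each numeric SNP identifier in an ESearch JSON result with 'rs'."""
    return ['rs' + uid for uid in esearch_json['esearchresult']['idlist']]


# trimmed stand-in for `results.data`, using the identifiers printed above
example_data = {
    'esearchresult': {
        'idlist': ['59296319', '17076752', '7336701', '4']
    }
}

print(to_rs_ids(example_data))
```

The resulting list could then be passed to `entrez_api.fetch(..., database='snp')` in the same way as in the dbSNP example.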

The base position should use the latest genome assembly (GRCh38 at the time of writing); to query by coordinates from the previous assembly, replace POSITION with POSITION_GRCH37. For more information on the arguments accepted by the SNP database, see the Entrez help page on the NCBI website.
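
As a rough sketch of switching between assemblies, the hypothetical helper below (`build_snp_position_query` is not part of easy-entrez) builds the query string with either field tag. It assumes that swapping the tag is the only change needed, as described above; the GRCh37 coordinate in the usage line is a placeholder, not a real variant position:

```python
# field tags named in the text above, keyed by assembly
POSITION_FIELDS = {
    'GRCh38': 'POSITION',
    'GRCh37': 'POSITION_GRCH37',
}


def build_snp_position_query(chromosome, position, organism='human', assembly='GRCh38'):
    """Build an SNP search query for a chromosomal position in a given assembly."""
    if assembly not in POSITION_FIELDS:
        raise ValueError(f'Unsupported assembly: {assembly!r}')
    field = POSITION_FIELDS[assembly]
    return (
        f'{chromosome}[CHROMOSOME] AND {organism}[ORGANISM] '
        f'AND {position}[{field}]'
    )


# same query as in the example above (GRCh38 coordinates)
print(build_snp_position_query(13, 31873085))
# placeholder coordinate, only to show the GRCh37 field tag
print(build_snp_position_query(13, 12345, assembly='GRCh37'))
```

The returned string can be passed as the first argument to `entrez_api.search(..., database='snp')`.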

Installation

Requires Python 3.6+. Install with:

pip install easy-entrez

If you wish to enable (optional, tqdm-based) progress bars use:

pip install easy-entrez[with_progress_bars]

Alternatives:

You might want to try:

  • biopython.Entrez - biopython is a heavy dependency, but probably a good choice if you already use it
  • pubmedpy - provides interesting utilities for parsing the responses
  • entrez - appears to have a comparable scope but a quite different API

I have tried and do not recommend:
