Python REST API for Entrez E-Utilities: stateless, easy to use, reliable.
easy-entrez
Python REST API for Entrez E-Utilities, aiming to be easy to use and reliable.
Easy-entrez:
- makes common tasks easy thanks to a simple, Pythonic API,
- is typed and integrates well with mypy,
- is tested on Windows, Mac and Linux across Python 3.7, 3.8, 3.9 and 3.10,
- is limited in scope, which keeps the focus on the reliability of the core code,
- does not use the stateful (History server) API, as it is error-prone, as the alternative entrezpy illustrates.
Status: beta (pending tutorial write-up and documentation improvements before official release).
from easy_entrez import EntrezAPI
entrez_api = EntrezAPI(
    'your-tool-name',
    'e@mail.com',
    # optional
    return_type='json'
)
# find up to 10 000 results for cancer in human
result = entrez_api.search('cancer AND human[organism]', max_results=10_000)
# data will be populated with JSON or XML (depending on the `return_type` value)
result.data
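As a sketch of what a stateless request involves, the snippet below assembles an ESearch URL using only the standard library: every parameter (database, query, result limit, return mode, tool name and e-mail) travels in the query string, and no WebEnv or query_key from the History server is needed. The endpoint and parameter names follow NCBI's E-utilities documentation; `build_esearch_url` is a hypothetical helper, and this is not necessarily how easy-entrez builds its requests internally.

```python
from urllib.parse import urlencode

# Public NCBI ESearch endpoint (part of the E-utilities service).
ESEARCH_URL = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi'


def build_esearch_url(term: str, database: str, max_results: int,
                      tool: str, email: str, return_type: str = 'json') -> str:
    """Hypothetical helper: assemble a stateless ESearch URL.

    All information is carried in the query string; nothing refers to a
    WebEnv/query_key stored on NCBI's History server. This illustrates the
    E-utilities parameters and is not easy-entrez's internal code.
    """
    params = {
        'db': database,          # Entrez database to search
        'term': term,            # the query, e.g. 'cancer AND human[organism]'
        'retmax': max_results,   # maximum number of UIDs to return
        'retmode': return_type,  # 'json' or 'xml'
        'tool': tool,            # identifies your application to NCBI
        'email': email,          # contact address for NCBI
    }
    return ESEARCH_URL + '?' + urlencode(params)


url = build_esearch_url(
    'cancer AND human[organism]',
    database='pubmed',  # chosen for illustration only
    max_results=10_000,
    tool='your-tool-name',
    email='e@mail.com',
)
print(url)
```

Because the request is self-describing, it can be retried or replayed without depending on server-side state.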
See more in the Demo notebook and documentation.
For a real-world example (as used for this publication), see the notebooks in the multi-omics-state-of-the-field repository.
Example: fetching genes for a variant from dbSNP
Fetch the SNP record for rs6311:
rs6311 = entrez_api.fetch(['rs6311'], max_results=1, database='snp').data[0]
rs6311
Display the result:
from easy_entrez.parsing import xml_to_string
print(xml_to_string(rs6311))
Find the gene names for rs6311:
namespaces = {'ns0': 'https://www.ncbi.nlm.nih.gov/SNP/docsum'}
genes = [
    name.text
    for name in rs6311.findall('.//ns0:GENE_E/ns0:NAME', namespaces)
]
print(genes)
['HTR2A']
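The 'ns0' prefix is only a local alias: ElementTree resolves it through the namespaces dictionary. The sketch below runs on a hand-written, heavily simplified fragment (not a real dbSNP record; it only mimics the nesting the path expects) and shows that the prefixed path and the equivalent Clark notation ({namespace-uri}tag) select the same elements.

```python
import xml.etree.ElementTree as ET

NS_URI = 'https://www.ncbi.nlm.nih.gov/SNP/docsum'
namespaces = {'ns0': NS_URI}

# Hand-written, simplified fragment for illustration; NOT real dbSNP output.
FRAGMENT = f"""
<DocumentSummary xmlns="{NS_URI}" uid="6311">
  <GENES>
    <GENE_E>
      <NAME>HTR2A</NAME>
    </GENE_E>
  </GENES>
</DocumentSummary>
"""

record = ET.fromstring(FRAGMENT.strip())

# 1. Prefix form: 'ns0' is looked up in the namespaces mapping.
genes_prefixed = [
    name.text for name in record.findall('.//ns0:GENE_E/ns0:NAME', namespaces)
]

# 2. Clark notation: the full URI in braces, no mapping required.
genes_clark = [
    name.text for name in record.findall(f'.//{{{NS_URI}}}GENE_E/{{{NS_URI}}}NAME')
]

print(genes_prefixed, genes_clark)
```

Any prefix name works as long as it maps to the same URI; 'ns0' has no special meaning in the query.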
Fetch data for multiple variants at once:
result = entrez_api.fetch(['rs6311', 'rs662138'], max_results=10, database='snp')
gene_names = {
    'rs' + document_summary.get('uid'): [
        element.text
        for element in document_summary.findall('.//ns0:GENE_E/ns0:NAME', namespaces)
    ]
    for document_summary in result.data
}
print(gene_names)
{'rs6311': ['HTR2A'], 'rs662138': ['SLC22A1']}
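The dictionary comprehension above can be wrapped into a reusable, typed function. This is a minimal sketch: `gene_names_by_rsid` is a hypothetical name, the summaries are hand-written stand-ins for `result.data` (not real dbSNP output), and skipping summaries without a `uid` (and empty NAME elements) is an added safeguard that the original example does not include.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, List

NAMESPACES = {'ns0': 'https://www.ncbi.nlm.nih.gov/SNP/docsum'}


def gene_names_by_rsid(document_summaries: Iterable[ET.Element]) -> Dict[str, List[str]]:
    """Map 'rs<uid>' to the gene names listed in each document summary."""
    mapping: Dict[str, List[str]] = {}
    for summary in document_summaries:
        uid = summary.get('uid')
        if uid is None:
            continue  # added safeguard: the original example assumes uid is present
        mapping['rs' + uid] = [
            name.text
            for name in summary.findall('.//ns0:GENE_E/ns0:NAME', NAMESPACES)
            if name.text  # skip empty NAME elements
        ]
    return mapping


# Hand-written stand-ins for result.data; NOT real dbSNP records.
fake_summaries = [
    ET.fromstring(
        f'<DocumentSummary xmlns="{NAMESPACES["ns0"]}" uid="{uid}">'
        f'<GENES><GENE_E><NAME>{gene}</NAME></GENE_E></GENES>'
        '</DocumentSummary>'
    )
    for uid, gene in [('6311', 'HTR2A'), ('662138', 'SLC22A1')]
]
print(gene_names_by_rsid(fake_summaries))
# → {'rs6311': ['HTR2A'], 'rs662138': ['SLC22A1']}
```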
Example: obtaining the chromosomal position from SNP rsID number
from pandas import DataFrame
result = entrez_api.fetch(['rs6311', 'rs662138'], max_results=10, database='snp')
variant_positions = DataFrame([
    {
        'id': 'rs' + document_summary.get('uid'),
        'chromosome': chromosome,
        'position': position
    }
    for document_summary in result.data
    for chrom_and_position in document_summary.findall('.//ns0:CHRPOS', namespaces)
    for chromosome, position in [chrom_and_position.text.split(':')]
])
variant_positions
         id chromosome   position
0    rs6311         13   46897343
1  rs662138          6  160143444
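Note that `split(':')` yields strings, so the position column in the example above holds text rather than numbers. A small standard-library sketch that converts positions to integers (and rejects values without a colon) could look like this; `parse_chrpos` is a hypothetical helper, and the input values are the ones shown in the output above.

```python
from typing import List, Optional, Tuple


def parse_chrpos(value: str) -> Optional[Tuple[str, int]]:
    """Split a 'chromosome:position' string such as '13:46897343'.

    The chromosome stays a string (it may be non-numeric, e.g. 'X'); the
    position becomes an int so it sorts and compares numerically.
    Returns None when the value does not have the expected form.
    """
    chromosome, sep, position = value.partition(':')
    if not sep or not chromosome or not position.isdigit():
        return None
    return chromosome, int(position)


raw_values = ['13:46897343', '6:160143444']
parsed: List[Tuple[str, int]] = [
    p for p in map(parse_chrpos, raw_values) if p is not None
]
print(parsed)
# → [('13', 46897343), ('6', 160143444)]
```

With pandas, `variant_positions['position'].astype(int)` performs the same conversion after the fact.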
Example: obtaining the SNP rs ID number from chromosomal position
You can use the query string directly:
results = entrez_api.search(
    '13[CHROMOSOME] AND human[ORGANISM] AND 31873085[POSITION]',
    database='snp',
    max_results=10
)
print(results.data['esearchresult']['idlist'])
['59296319', '17076752', '7336701', '4']
Or pass a dictionary (no validation of the arguments is performed, and the terms are combined with AND):
results = entrez_api.search(
    dict(chromosome=13, organism='human', position=31873085),
    database='snp',
    max_results=10
)
print(results.data['esearchresult']['idlist'])
['59296319', '17076752', '7336701', '4']
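The SNP database returns bare numeric UIDs. As in the fetch examples, where 'rs' is prepended to each document's uid, the ID list can be turned into rsIDs. In this sketch, `example_data` is a hand-written stand-in that mirrors only the keys accessed above (not a full ESearch response), and `rsids_from_esearch` is a hypothetical helper.

```python
from typing import Any, Dict, List


def rsids_from_esearch(data: Dict[str, Any]) -> List[str]:
    """Turn the numeric UIDs of an ESearch JSON response into 'rs' identifiers."""
    return ['rs' + uid for uid in data['esearchresult']['idlist']]


# Hand-written stand-in mirroring results.data for the search above.
example_data = {'esearchresult': {'idlist': ['59296319', '17076752', '7336701', '4']}}
print(rsids_from_esearch(example_data))
# → ['rs59296319', 'rs17076752', 'rs7336701', 'rs4']
```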
The base position should use the latest genome assembly (GRCh38 at the time of writing); to use a position in the previous assembly's coordinates, replace POSITION with POSITION_GRCH37.
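To make the dictionary form concrete, here is one way such a dictionary could map onto the query string, including a key that targets GRCh37 coordinates. `dict_to_query` is a hypothetical helper written for illustration, not easy-entrez's implementation, whose exact formatting may differ.

```python
from typing import Mapping, Union


def dict_to_query(terms: Mapping[str, Union[str, int]]) -> str:
    """Render {'field': value} pairs as 'value[FIELD]' joined with AND.

    Hypothetical helper: like the dictionary form described above, it does
    not validate field names or values.
    """
    return ' AND '.join(f'{value}[{field.upper()}]' for field, value in terms.items())


# GRCh38 coordinates via the POSITION field.
print(dict_to_query(dict(chromosome=13, organism='human', position=31873085)))
# → 13[CHROMOSOME] AND human[ORGANISM] AND 31873085[POSITION]

# Field-name mapping only: 31873085 is reused here purely to show the syntax,
# it is not a verified GRCh37 coordinate.
print(dict_to_query({'position_grch37': 31873085}))
# → 31873085[POSITION_GRCH37]
```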
For more information on the arguments accepted by the SNP database, see the Entrez help page on the NCBI website.
Installation
Requires Python 3.6+. Install with:
pip install easy-entrez
To enable the optional, tqdm-based progress bars, use:
pip install easy-entrez[with_progress_bars]
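To check whether the optional dependency is present in the current environment, a standard-library sketch is enough. It only tests whether tqdm can be imported; whether easy-entrez itself detects the extra this way is an assumption, and `progress_bars_available` is a hypothetical name.

```python
import importlib.util


def progress_bars_available() -> bool:
    """Return True if tqdm (installed by the with_progress_bars extra) is importable."""
    return importlib.util.find_spec('tqdm') is not None


print(progress_bars_available())
```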
Alternatives:
You might want to try:
- biopython.Entrez - biopython is a heavy dependency, but probably a good choice if you already use it
- pubmedpy - provides interesting utilities for parsing the responses
- entrez - appears to have a comparable scope but a quite different API
I have tried and do not recommend:
- entrezpy - in addition to the problems with the stateful (history) API, watch out for documentation issues and an almost complete lack of response to pull requests.
File details
Details for the file easy_entrez-0.3.2.tar.gz.
File metadata
- Download URL: easy_entrez-0.3.2.tar.gz
- Size: 18.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/5.1.0 pkginfo/1.7.1 requests/2.24.0 requests-toolbelt/0.9.1 tqdm/4.62.2 CPython/3.8.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | e6a2d53860da21f70c2b5aebc42d5f56fb3e0795dc87c8fcecb06e8e7f75e784
MD5 | 89538e0af43051bb8e559e088606b348
BLAKE2b-256 | 33c66fe0b0a7faf9b8ff77e53e7a7e08eb5a7066c6112eef6ca98a2ea8ac55b9
File details
Details for the file easy_entrez-0.3.2-py3-none-any.whl.
File metadata
- Download URL: easy_entrez-0.3.2-py3-none-any.whl
- Size: 18.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/5.1.0 pkginfo/1.7.1 requests/2.24.0 requests-toolbelt/0.9.1 tqdm/4.62.2 CPython/3.8.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | ffb3b2a7874f6d171efb6398343465d224c6b7d353df490ab3d541d906c66598
MD5 | cf55c4758bfee4db08b50d6b18a13648
BLAKE2b-256 | 62d55ce3e89dfcef96bcde648987da90c665f1fd59683fd9797cc0b5687f205b