Pandas API for Gene Set Enrichment Analysis in Python (GSEApy, cudaGSEA, GSEA)
- aims to provide a unified API for various GSEA implementations, built on pandas DataFrames and a hierarchy of Pythonic classes
- file exports (writing the input files for GSEA) use low-level NumPy functions and are much faster than equivalent pandas exports
- aims to let researchers easily compare different GSEA implementations and integrate them into projects that require high-performance GSEA (e.g. massive screening for drug repositioning)
- provides utilities for working with GMT files, and with gene sets and pathways in Python in general
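GMT files store one gene set per line: a tab-separated set name, a description field, and then the member genes. The sketch below parses that layout using only the standard library. It illustrates the format and is not the implementation behind GeneSets.from_gmt; the helper name parse_gmt and the toy gene sets are hypothetical.

```python
from typing import Dict, List


def parse_gmt(text: str) -> Dict[str, List[str]]:
    """Parse GMT-formatted text into a {gene set name: [genes]} mapping.

    Each non-empty line holds a tab-separated set name, a description
    (often a URL or 'NA'), and then the member genes.
    """
    gene_sets: Dict[str, List[str]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, _description, *genes = line.split('\t')
        # drop empty fields left behind by trailing tabs
        gene_sets[name] = [gene for gene in genes if gene]
    return gene_sets


# hypothetical toy gene sets, not real pathway definitions
example = (
    "SET_A\tNA\tTP53\tBAX\tCASP3\n"
    "SET_B\tNA\tTP53\tCDK1\t\n"
)
sets = parse_gmt(example)
print(sets['SET_B'])  # ['TP53', 'CDK1']
```

Keeping genes as plain lists preserves the order in the file; converting them to sets would be a natural next step for overlap calculations.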
Example usage
from pandas import read_table
from gsea_api.expression_set import ExpressionSet
from gsea_api.gsea import GSEADesktop
from gsea_api.molecular_signatures_db import GeneSets
reactome_pathways = GeneSets.from_gmt('ReactomePathways.gmt')
gsea = GSEADesktop()
design = ['Disease', 'Disease', 'Disease', 'Control', 'Control', 'Control']
matrix = read_table('expression_data.tsv', index_col='Gene')
result = gsea.run(
    # note: contrast() is not necessary in this simple case
    ExpressionSet(matrix, design).contrast('Disease', 'Control'),
    reactome_pathways,
    metric='Signal2Noise',
    permutations=1000
)
Where expression_data.tsv is in the following format:
Gene   Patient_1  Patient_2  Patient_3  Patient_4  Patient_5  Patient_6
TACC2  0.2        0.1        0.4        0.6        0.7        2.1
TP53   2.3        0.2        2.1        2.0        0.3        0.6
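The metric='Signal2Noise' option refers to GSEA's signal-to-noise ranking metric, (mean_case - mean_control) / (sd_case + sd_control). As a rough illustration, and not the code path used by any of the wrapped tools, the sketch below computes it for the two example genes in pure Python. The helper name signal_to_noise is hypothetical, it assumes the sample standard deviation, and it omits the extra adjustments GSEA Desktop applies to very small standard deviations.

```python
from statistics import mean, stdev
from typing import Sequence


def signal_to_noise(values: Sequence[float], design: Sequence[str],
                    case: str, control: str) -> float:
    """(mean_case - mean_control) / (sd_case + sd_control) for one gene.

    Uses the sample standard deviation; no minimum-sigma correction is applied.
    """
    case_values = [v for v, label in zip(values, design) if label == case]
    control_values = [v for v, label in zip(values, design) if label == control]
    return ((mean(case_values) - mean(control_values))
            / (stdev(case_values) + stdev(control_values)))


design = ['Disease', 'Disease', 'Disease', 'Control', 'Control', 'Control']
expression = {
    'TACC2': [0.2, 0.1, 0.4, 0.6, 0.7, 2.1],
    'TP53': [2.3, 0.2, 2.1, 2.0, 0.3, 0.6],
}

# rank genes from most up-regulated to most down-regulated in Disease
ranking = sorted(
    expression,
    key=lambda gene: signal_to_noise(expression[gene], design, 'Disease', 'Control'),
    reverse=True,
)
print(ranking)  # ['TP53', 'TACC2']
```

A ranked list like this is what the enrichment step then walks through for each gene set.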
Installation
To install the API use:
pip3 install gsea_api
Installing GSEA from Broad Institute
Log in or register on the official GSEA website and download the gsea_3.0.jar file (or a newer version).
Please place the downloaded file in the thirdparty directory.
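GSEA Desktop reads expression data from GCT files and class labels from CLS files. As a rough sketch of what exporting input for GSEA involves, assuming the GCT 1.2 and categorical CLS layouts described in the Broad Institute's documentation, the hypothetical helpers to_gct and to_cls below render both formats as text with plain Python. The package's own exporter is described as NumPy-based and may differ in details.

```python
from typing import List, Sequence


def to_gct(genes: Sequence[str], samples: Sequence[str],
           values: Sequence[Sequence[float]]) -> str:
    """Render an expression matrix (genes x samples) as GCT 1.2 text."""
    lines = [
        '#1.2',
        f'{len(genes)}\t{len(samples)}',  # number of rows, number of samples
        '\t'.join(['NAME', 'Description', *samples]),
    ]
    for gene, row in zip(genes, values):
        # the Description column is filled with a placeholder here
        lines.append('\t'.join([gene, 'na', *map(str, row)]))
    return '\n'.join(lines) + '\n'


def to_cls(design: Sequence[str]) -> str:
    """Render per-sample class labels as categorical CLS text."""
    classes: List[str] = list(dict.fromkeys(design))  # order of first appearance
    return (
        f'{len(design)} {len(classes)} 1\n'
        f'# {" ".join(classes)}\n'
        f'{" ".join(design)}\n'
    )


print(to_cls(['Disease', 'Disease', 'Disease', 'Control', 'Control', 'Control']))
# 6 2 1
# # Disease Control
# Disease Disease Disease Control Control Control
```

Writing the returned strings to files is mainly useful for inspecting or debugging what a GSEA Desktop run receives.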
Installing GSEApy
To use GSEApy, install it with:
pip3 install gseapy
and symlink its binary into the thirdparty directory:
ln -s virtual_environment_path/bin/gseapy thirdparty/gseapy
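If you are unsure what virtual_environment_path is, shutil.which can locate the gseapy executable on the PATH of the activated environment. The sketch below does the same linking from Python; the helper name link_into_thirdparty is hypothetical, and it assumes a platform where creating symlinks is permitted (e.g. Linux or macOS).

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def link_into_thirdparty(binary: str, thirdparty: Path,
                         search_path: Optional[str] = None) -> Path:
    """Symlink an executable found on PATH (or on search_path) into thirdparty/."""
    source = shutil.which(binary, path=search_path)
    if source is None:
        raise FileNotFoundError(f'{binary} not found; is the virtual environment active?')
    thirdparty.mkdir(parents=True, exist_ok=True)
    target = thirdparty / binary
    if target.is_symlink() or target.exists():
        target.unlink()  # replace a link left over from a previous environment
    target.symlink_to(os.path.abspath(source))
    return target


# usage, with the virtual environment activated:
# link_into_thirdparty('gseapy', Path('thirdparty'))
```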
Use it with:
from gsea_api.gsea import GSEApy
gsea = GSEApy()
Installing cudaGSEA
Clone this fork of cudaGSEA into the thirdparty directory and compile the binary (following the instructions in that repository):
git clone https://github.com/krassowski/cudaGSEA
Alternatively, use the original version, which does not implement FDR calculation.
Use it with:
from gsea_api.gsea import cudaGSEA
# CPU implementation can be used with use_cpu=True
gsea = cudaGSEA(fdr='full', use_cpu=False)
Citation
Please also cite the authors of the wrapped tools that you use.
References
The initial version of this code was written for a Master's thesis project at Imperial College London.