
Pandas API for Gene Set Enrichment Analysis in Python (GSEApy, cudaGSEA, GSEA)


License: MIT


  • aims to provide a unified API for various GSEA implementations, built on pandas DataFrames and a hierarchy of Pythonic classes

  • exports input files for GSEA using low-level numpy functions, which is much faster than writing the same files with pandas

  • aims to allow researchers to easily compare different implementations of GSEA, and to integrate them into projects that require high-performance GSEA (e.g. massive screening for drug repositioning)

  • provides utilities for working with GMT files, and with gene sets and pathways in Python in general
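GMT files, such as the ReactomePathways.gmt file used in the example usage, store one gene set per line as tab-separated fields: the set name, a description (often a URL or "na"), then the member genes. The sketch below is a minimal standalone parser meant only to illustrate the format; it is not how gsea_api's GeneSets.from_gmt is implemented, and the function name parse_gmt and the pathway names in the example are hypothetical.

```python
from typing import Dict, List


def parse_gmt(text: str) -> Dict[str, List[str]]:
    """Parse GMT-formatted text into {gene_set_name: [genes]}.

    Each non-empty line is tab-separated: name, description, then genes.
    The description field is ignored in this sketch.
    """
    gene_sets: Dict[str, List[str]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, _description, *genes = line.split('\t')
        # drop empty fields left behind by trailing tabs
        gene_sets[name] = [gene for gene in genes if gene]
    return gene_sets


example = (
    'PATHWAY_A\thttp://example.org/a\tTP53\tTACC2\n'
    'PATHWAY_B\tna\tTP53\n'
)
print(parse_gmt(example))
```

A dictionary of lists keeps the sketch dependency-free; a real loader would typically also validate that every line has at least a name and a description field.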

Example usage

```python
from pandas import read_table
from gsea_api.expression_set import ExpressionSet
from gsea_api.gsea import GSEADesktop
from gsea_api.molecular_signatures_db import GeneSets

reactome_pathways = GeneSets.from_gmt('ReactomePathways.gmt')

gsea = GSEADesktop()

design = ['Disease', 'Disease', 'Disease', 'Control', 'Control', 'Control']
matrix = read_table('expression_data.tsv', index_col='Gene')

result = gsea.run(
    # note: contrast() is not necessary in this simple case
    ExpressionSet(matrix, design).contrast('Disease', 'Control'),
    reactome_pathways,
    metric='Signal2Noise',
    permutations=1000
)
```

Here, expression_data.tsv is a tab-separated file with genes in rows and samples in columns, in the following format:

```
Gene    Patient_1  Patient_2  Patient_3  Patient_4  Patient_5  Patient_6
TACC2   0.2        0.1        0.4        0.6        0.7        2.1
TP53    2.3        0.2        2.1        2.0        0.3        0.6
```
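The metric='Signal2Noise' option in the example ranks genes by their signal-to-noise ratio between the two classes, (mean_Disease - mean_Control) / (sd_Disease + sd_Control). The sketch below computes that ranking with plain pandas on the example matrix. It is a simplified illustration, not the code of any wrapped tool: it assumes the sample standard deviation (ddof=1) and omits the minimum-standard-deviation adjustment that GSEA Desktop applies; the function name signal_to_noise is hypothetical.

```python
from typing import List

import pandas as pd

# The example matrix from above: genes in rows, patients in columns.
matrix = pd.DataFrame(
    [[0.2, 0.1, 0.4, 0.6, 0.7, 2.1],
     [2.3, 0.2, 2.1, 2.0, 0.3, 0.6]],
    index=pd.Index(['TACC2', 'TP53'], name='Gene'),
    columns=[f'Patient_{i}' for i in range(1, 7)],
)
design = ['Disease', 'Disease', 'Disease', 'Control', 'Control', 'Control']


def signal_to_noise(matrix: pd.DataFrame, design: List[str],
                    case: str, control: str) -> pd.Series:
    """Per-gene (mean_case - mean_control) / (sd_case + sd_control), sorted descending.

    Simplified: sample standard deviation (ddof=1), no minimum-sd adjustment.
    """
    labels = pd.Series(design, index=matrix.columns)  # one class label per column
    case_values = matrix.loc[:, (labels == case).to_numpy()]
    control_values = matrix.loc[:, (labels == control).to_numpy()]
    signal = case_values.mean(axis=1) - control_values.mean(axis=1)
    noise = case_values.std(axis=1) + control_values.std(axis=1)
    return (signal / noise).sort_values(ascending=False)


print(signal_to_noise(matrix, design, 'Disease', 'Control'))
# TP53 about 0.27, TACC2 about -0.91
```

The design list is matched to the matrix columns by position, which is why it has exactly one label per patient column.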

Installation

To install the API, use:

```shell
pip3 install gsea_api
```

Installing GSEA from the Broad Institute

Log in (or register) on the official GSEA website and download the gsea_3.0.jar file (or a newer version).

Place the downloaded file in the thirdparty directory.
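GSEA Desktop reads expression data and sample classes from files, which gsea_api exports for you. To illustrate what those inputs look like, the sketch below writes the example data in Broad's documented TXT expression format (NAME and DESCRIPTION columns followed by one column per sample) and categorical CLS format, using a single numpy call for the expression rows. This is an assumption-laden illustration, not gsea_api's exporter; write_gsea_txt and write_gsea_cls are hypothetical names.

```python
import io
from typing import List, TextIO

import numpy as np
import pandas as pd


def write_gsea_txt(matrix: pd.DataFrame, handle: TextIO) -> None:
    """Write a GSEA TXT expression file: NAME, DESCRIPTION, one column per sample."""
    handle.write('\t'.join(['NAME', 'DESCRIPTION', *map(str, matrix.columns)]) + '\n')
    names = matrix.index.to_numpy(dtype=str).reshape(-1, 1)
    descriptions = np.full(names.shape, 'na')  # no gene descriptions available
    values = matrix.to_numpy().astype(str)
    # one vectorised call writes all rows; '%s' because the array holds strings
    np.savetxt(handle, np.hstack([names, descriptions, values]), fmt='%s', delimiter='\t')


def write_gsea_cls(design: List[str], handle: TextIO) -> None:
    """Write a categorical CLS file: counts line, class-names line, per-sample labels."""
    classes = list(dict.fromkeys(design))  # unique labels in order of first appearance
    handle.write(f'{len(design)} {len(classes)} 1\n')
    handle.write('# ' + ' '.join(classes) + '\n')
    handle.write(' '.join(design) + '\n')


matrix = pd.DataFrame(
    [[0.2, 0.1, 0.4, 0.6, 0.7, 2.1],
     [2.3, 0.2, 2.1, 2.0, 0.3, 0.6]],
    index=pd.Index(['TACC2', 'TP53'], name='Gene'),
    columns=[f'Patient_{i}' for i in range(1, 7)],
)
design = ['Disease'] * 3 + ['Control'] * 3

expression_file, class_file = io.StringIO(), io.StringIO()
write_gsea_txt(matrix, expression_file)
write_gsea_cls(design, class_file)
print(class_file.getvalue())
```

In-memory StringIO handles keep the sketch self-contained; passing open file handles instead would produce files that can be loaded into GSEA Desktop.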

Installing GSEApy

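To use GSEApy, install it with:

```shell
pip3 install gseapy
```

and symlink its executable into the thirdparty directory (replace virtual_environment_path with the path of the virtual environment where gseapy was installed):

```shell
ln -s virtual_environment_path/bin/gseapy thirdparty/gseapy
```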
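The same link can be created from Python, for example in a project setup script. This is a minimal sketch assuming a POSIX system with symlink support and an activated virtual environment whose bin directory is on PATH; link_executable is a hypothetical helper, not part of gsea_api.

```python
import shutil
from pathlib import Path
from typing import Optional


def link_executable(name: str, target_dir: str = 'thirdparty',
                    search_path: Optional[str] = None) -> Path:
    """Symlink executable `name` (looked up on PATH or search_path) into target_dir."""
    source = shutil.which(name, path=search_path)
    if source is None:
        raise FileNotFoundError(f'{name!r} not found; is the virtual environment activated?')
    link = Path(target_dir) / name
    link.parent.mkdir(parents=True, exist_ok=True)
    if link.is_symlink():
        link.unlink()  # replace a link left over from a previous environment
    # raises FileExistsError if a regular file already occupies the path
    link.symlink_to(Path(source).absolute())
    return link
```

Calling link_executable('gseapy') from the project root would mirror the ln -s command; only existing symlinks are replaced, so a regular file at the target path is never deleted.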

Use it with:

```python
from gsea_api.gsea import GSEApy

gsea = GSEApy()
```

Installing cudaGSEA

Clone this fork of cudaGSEA into the thirdparty directory and compile the binary, following the instructions in that repository:

```shell
git clone https://github.com/krassowski/cudaGSEA
```

Alternatively, use the original version of cudaGSEA, which does not implement FDR calculations.

Use it with:

```python
from gsea_api.gsea import cudaGSEA

# CPU implementation can be used with use_cpu=True
gsea = cudaGSEA(fdr='full', use_cpu=False)
```

Citation

Please cite this project using its DOI, and also cite the authors of the wrapped tools that you use.

References

The initial version of this code was written for a Master's thesis project at Imperial College London.

