GPU-based QTL mapper

tensorQTL

tensorQTL is a GPU-based QTL mapper that performs cis- and trans-QTL mapping ~200-300-fold faster than CPU-based implementations.

If you use tensorQTL in your research, please cite the following paper: Taylor-Weiner, Aguet, et al., bioRxiv, 2019.

Empirical beta-approximated p-values are computed as described in FastQTL (Ongen et al., 2016).
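The beta approximation can be sketched as follows. This is a simplified illustration, not FastQTL's or tensorQTL's code. It assumes that the minimum p-values from permutations are already available (here they are simulated from independent uniform p-values). It also assumes that the beta parameters are fitted by maximum likelihood with `scipy.stats.beta.fit`, with location and scale fixed to [0, 1]. The adjusted p-value is then the beta CDF evaluated at the observed minimum p-value.

```python
import numpy as np
from scipy import stats


def fit_beta_null(min_pvals):
    """Fit Beta(a, b) on [0, 1] to permutation minimum p-values (MLE)."""
    a, b, _loc, _scale = stats.beta.fit(min_pvals, floc=0, fscale=1)
    return a, b


def beta_approx_pvalue(observed_min_p, a, b):
    """P(min permuted p-value <= observed min p-value) under the fitted Beta(a, b)."""
    return stats.beta.cdf(observed_min_p, a, b)


# Simulated null: 1000 permutations, each keeping the minimum p-value over
# 100 independent variants. This minimum follows Beta(1, 100) exactly.
rng = np.random.default_rng(0)
min_pvals = rng.uniform(size=(1000, 100)).min(axis=1)

a, b = fit_beta_null(min_pvals)
p_beta = beta_approx_pvalue(1e-4, a, b)
print(f"a = {a:.2f}, b = {b:.1f}, beta-approximated p = {p_beta:.4g}")
```

With real data, variants in a cis window are correlated. The fitted b then acts as an "effective number of independent tests", which is why fitting both parameters is preferable to assuming Beta(1, number of variants).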

Install

Run the following commands to install tensorQTL:

$ git clone git@github.com:broadinstitute/tensorqtl.git
$ cd tensorqtl
# set up virtual environment and install
$ virtualenv venv
$ source venv/bin/activate
(venv)$ pip install -r install/requirements.txt .

Requirements

tensorQTL requires an environment configured with a GPU. Instructions for setting up a virtual machine on Google Cloud Platform are provided in the tensorQTL repository.

Input formats

tensorQTL requires three input files: genotypes, phenotypes, and covariates. Phenotypes must be provided in BED format (phenotypes x samples) and covariates as a tab-delimited text file (covariates x samples); both follow the formats used by FastQTL. Genotypes must currently be in PLINK binary format (.bed/.bim/.fam), and a VCF can be converted as follows:

plink2 --make-bed \
    --output-chr chrM \
    --vcf ${plink_prefix_path}.vcf.gz \
    --out ${plink_prefix_path}
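The orientation of the phenotype and covariate tables can be illustrated with toy data. The sample IDs, phenotype IDs, and header names below are hypothetical. The first four BED columns follow the FastQTL convention of chromosome, start, end, and ID. This sketch only shows the layout and is not a substitute for `tensorqtl.read_phenotype_bed`.

```python
import io

import pandas as pd

# Hypothetical phenotype BED (phenotypes x samples): chromosome, start, end and
# phenotype ID, followed by one column per sample.
bed_text = (
    "#chr\tstart\tend\tphenotype_id\tS1\tS2\tS3\n"
    "chr1\t999\t1000\tgeneA\t0.5\t-1.2\t0.3\n"
    "chr1\t4999\t5000\tgeneB\t1.1\t0.0\t-0.4\n"
)
# Hypothetical covariates file (covariates x samples), tab-delimited.
cov_text = (
    "id\tS1\tS2\tS3\n"
    "PC1\t0.1\t-0.2\t0.3\n"
    "sex\t1\t0\t1\n"
)

bed = pd.read_csv(io.StringIO(bed_text), sep="\t")
# Drop the coordinate columns to obtain the phenotypes x samples matrix.
phenotype_df = bed.set_index("phenotype_id").drop(columns=["#chr", "start", "end"])
# Transpose the covariates to samples x covariates.
covariates_df = pd.read_csv(io.StringIO(cov_text), sep="\t", index_col=0).T

print(phenotype_df)
print(covariates_df)
```

The sample IDs in the covariates file should match the sample columns of the BED file, as they do here.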

Examples

For examples illustrating cis- and trans-QTL mapping, please see tensorqtl_examples.ipynb.

Running tensorQTL from the command line

This section describes how to run tensorQTL from the command line. For a full list of options, run:

python3 -m tensorqtl --help

cis-QTL mapping

Phenotype-level summary statistics with empirical p-values:

python3 -m tensorqtl ${plink_prefix_path} ${expression_bed} ${prefix} \
    --covariates ${covariates_file} \
    --mode cis

All variant-phenotype associations:

python3 -m tensorqtl ${plink_prefix_path} ${expression_bed} ${prefix} \
    --covariates ${covariates_file} \
    --mode cis_nominal

This will generate a parquet file for each chromosome. These files can be read using pandas:

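import pandas as pd
# file_name: path to one of the per-chromosome parquet files
# (pd.read_parquet requires the pyarrow or fastparquet package)
df = pd.read_parquet(file_name)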
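A common next step is to extract the top variant per phenotype, or all pairs below a p-value threshold. The sketch below uses a toy DataFrame in place of the parquet output. The column names `phenotype_id`, `variant_id`, and `pval_nominal`, as well as the 1e-5 threshold, are assumptions for illustration. The actual columns should be checked with `df.columns`.

```python
import pandas as pd

# Toy stand-in for one chromosome's cis_nominal output (column names assumed).
df = pd.DataFrame({
    "phenotype_id": ["geneA", "geneA", "geneA", "geneB", "geneB"],
    "variant_id": ["v1", "v2", "v3", "v2", "v4"],
    "pval_nominal": [1e-3, 2e-8, 0.4, 0.05, 3e-6],
})

# Most significant variant for each phenotype.
top_df = df.loc[df.groupby("phenotype_id")["pval_nominal"].idxmin()].set_index("phenotype_id")

# All variant-phenotype pairs below a hypothetical nominal threshold.
significant = df[df["pval_nominal"] < 1e-5]

print(top_df)
print(significant)
```

To work across chromosomes, the per-chromosome DataFrames can first be combined with `pd.concat(..., ignore_index=True)`.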

Conditionally independent cis-QTL (as described in GTEx Consortium, 2017):

python3 -m tensorqtl ${plink_prefix_path} ${expression_bed} ${prefix} \
    --covariates ${covariates_file} \
    --cis_results ${cis_results_file} \
    --mode cis_independent
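The idea behind conditionally independent cis-QTLs can be sketched as greedy forward selection. The phenotype is repeatedly regressed on the covariates plus the variants selected so far, and the variant most correlated with the residual is added. This is a simplified illustration and not the GTEx procedure, which uses forward-backward stepwise regression with permutation-derived significance thresholds. The fixed correlation cutoff `r_threshold`, the helper names, and the simulated data are all hypothetical.

```python
import numpy as np


def residualize(x, C):
    """Remove the least-squares projection of x onto the column space of C."""
    Q, _ = np.linalg.qr(C)
    return x - Q @ (Q.T @ x)


def forward_select(genotypes, phenotype, covariates, r_threshold=0.2, max_steps=10):
    """Greedily select variants (columns of genotypes) associated with phenotype.

    covariates must include an intercept column so residuals are mean-centered.
    """
    selected = []
    for _ in range(max_steps):
        idx = np.asarray(selected, dtype=int)
        C = np.column_stack([covariates, genotypes[:, idx]])
        y = residualize(phenotype, C)
        G = residualize(genotypes, C)
        with np.errstate(divide="ignore", invalid="ignore"):
            r = (G.T @ y) / (np.linalg.norm(G, axis=0) * np.linalg.norm(y))
        r[selected] = 0.0  # selected variants have (near-)zero residual variance
        r = np.nan_to_num(r)
        best = int(np.argmax(np.abs(r)))
        if abs(r[best]) < r_threshold:
            break
        selected.append(best)
    return selected


# Simulated data: two causal variants (0 and 5) among 20, plus one covariate.
rng = np.random.default_rng(1)
n, m = 500, 20
genotypes = rng.binomial(2, 0.5, size=(n, m)).astype(float)
covariates = np.column_stack([np.ones(n), rng.normal(size=n)])
phenotype = (1.0 * genotypes[:, 0] + 0.8 * genotypes[:, 5]
             + 0.5 * covariates[:, 1] + rng.normal(size=n))

selected = forward_select(genotypes, phenotype, covariates)
print(selected)
```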

trans-QTL mapping

python3 -m tensorqtl ${plink_prefix_path} ${expression_bed} ${prefix} \
    --covariates ${covariates_file} \
    --mode trans

For trans-QTL mapping, tensorQTL generates sparse output by default: only associations with p-value < 1e-5 are reported, and cis-associations are filtered out. The output is a parquet file with four columns: phenotype_id, variant_id, pval, and maf.
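The filtering step can be sketched on a dense toy p-value matrix. This illustrates the idea and is not tensorQTL's implementation. Several details are assumptions: the 1 Mb cis window, the (variants x phenotypes) matrix layout, and the position column names `chr`, `pos`, and `tss`. The maf column is also omitted.

```python
import numpy as np
import pandas as pd


def sparse_trans_pairs(pval, variant_pos, phenotype_pos,
                       pval_threshold=1e-5, cis_window=1_000_000):
    """Convert a dense (variants x phenotypes) p-value matrix to sparse rows.

    Keeps pairs with p-value < pval_threshold and drops cis pairs, defined here
    as a variant on the same chromosome within cis_window bp of the phenotype TSS.
    """
    v_chr = variant_pos["chr"].to_numpy()[:, None]
    p_chr = phenotype_pos["chr"].to_numpy()[None, :]
    dist = np.abs(variant_pos["pos"].to_numpy()[:, None]
                  - phenotype_pos["tss"].to_numpy()[None, :])
    is_cis = (v_chr == p_chr) & (dist <= cis_window)
    keep = (pval < pval_threshold) & ~is_cis
    v_idx, p_idx = np.nonzero(keep)
    return pd.DataFrame({
        "phenotype_id": phenotype_pos.index[p_idx],
        "variant_id": variant_pos.index[v_idx],
        "pval": pval[v_idx, p_idx],
    })


variant_pos = pd.DataFrame({"chr": ["chr1", "chr1", "chr2"],
                            "pos": [1_000, 5_000_000, 100]},
                           index=["v1", "v2", "v3"])
phenotype_pos = pd.DataFrame({"chr": ["chr1", "chr2"],
                              "tss": [2_000, 10_000_000]},
                             index=["geneA", "geneB"])
pval = np.array([[1e-8, 0.3],    # v1: cis to geneA (excluded); not significant for geneB
                 [1e-6, 1e-7],   # v2: ~5 Mb from geneA, so trans for both
                 [1e-9, 2e-5]])  # v3: trans to geneA; above the threshold for geneB

trans_df = sparse_trans_pairs(pval, variant_pos, phenotype_pos)
print(trans_df)
```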

Running tensorQTL as a Python module

tensorQTL can also be run as a Python module, which is more efficient when running multiple analyses on the same input data:

import pandas as pd
import tensorqtl
from tensorqtl import genotypeio, cis, trans

Loading input files

Load phenotypes and covariates:

phenotype_df, phenotype_pos_df = tensorqtl.read_phenotype_bed(phenotype_bed_file)
covariates_df = pd.read_csv(covariates_file, sep='\t', index_col=0).T  # samples x covariates

Genotypes can be loaded as follows, where plink_prefix_path is the path prefix of the genotypes in PLINK format (without the .bed/.bim/.fam extension):

pr = genotypeio.PlinkReader(plink_prefix_path)
# load genotypes and variants into data frames
genotype_df = pd.DataFrame(pr.get_all_genotypes(), index=pr.bim['snp'], columns=pr.fam['iid'])
variant_df = pr.bim.set_index('snp')[['chrom', 'pos']]

To save memory when only a subset of the genotyped samples is needed, the samples can be specified when creating the reader (this is not strictly necessary, since tensorQTL otherwise selects the relevant samples from genotype_df):

pr = genotypeio.PlinkReader(plink_prefix_path, select_samples=phenotype_df.columns)
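The structure of genotype_df and variant_df can be illustrated with toy data, along with what selecting samples amounts to. The `bim` and `fam` tables below are hypothetical stand-ins for `pr.bim` and `pr.fam` and use only the columns referenced above. The dosages are toy 0/1/2 allele counts, and `PlinkReader` is not called.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for pr.bim and pr.fam (only the columns used above).
bim = pd.DataFrame({"chrom": ["chr1", "chr1", "chr2"],
                    "snp": ["v1", "v2", "v3"],
                    "pos": [1000, 5000, 200]})
fam = pd.DataFrame({"iid": ["S1", "S2", "S3", "S4"]})
# Toy genotype dosages, variants x samples, coded as 0/1/2 allele counts.
dosages = np.array([[0, 1, 2, 1],
                    [2, 2, 1, 0],
                    [1, 0, 0, 1]], dtype=np.float32)

genotype_df = pd.DataFrame(dosages, index=bim["snp"], columns=fam["iid"])
variant_df = bim.set_index("snp")[["chrom", "pos"]]

# Phenotypes measured on three of the four genotyped samples: selecting these
# columns subsets genotype_df and aligns its column order with the phenotypes.
phenotype_columns = pd.Index(["S3", "S1", "S2"])
genotype_subset_df = genotype_df[phenotype_columns]
print(genotype_subset_df)
```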

cis-QTL mapping

cis_df = cis.map_cis(genotype_df, variant_df, phenotype_df, phenotype_pos_df, covariates_df)
tensorqtl.calculate_qvalues(cis_df, qvalue_lambda=0.85)
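The name calculate_qvalues and the qvalue_lambda argument suggest Storey's q-value method, with the proportion of true nulls (π0) estimated at a single fixed λ. The numpy sketch below implements that fixed-λ estimator as an illustration under this assumption. It is not tensorQTL's implementation, and it does not assume which column of cis_df holds the p-values.

```python
import numpy as np


def storey_qvalues(pvals, lam=0.85):
    """q-values with Storey's pi0 estimated at a single fixed lambda."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    # Estimated proportion of true null hypotheses, capped at 1.
    pi0 = min(1.0, np.mean(p > lam) / (1.0 - lam))
    order = np.argsort(p)
    ranked = pi0 * m * p[order] / np.arange(1, m + 1)
    # Enforce monotonicity: the q-value at rank i is the minimum over ranks >= i.
    q_sorted = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(q_sorted, 1.0)
    return q, pi0


q1, pi0_1 = storey_qvalues([0.01, 0.02, 0.03, 0.9, 0.95], lam=0.85)
print(pi0_1, q1)  # pi0 is capped at 1.0; q1 = [0.05, 0.05, 0.05, 0.95, 0.95]
```

With π0 = 1 this reduces to Benjamini-Hochberg adjusted p-values. A larger share of p-values above λ pushes π0 toward 1, while many small p-values give π0 < 1 and correspondingly smaller q-values.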

trans-QTL mapping

trans_df = trans.map_trans(genotype_df, phenotype_df, covariates_df, return_sparse=True)

