tensorQTL
tensorQTL is a GPU-based QTL mapper, enabling ~200-300 fold faster cis- and trans-QTL mapping compared to CPU-based implementations.
If you use tensorQTL in your research, please cite the following paper: Taylor-Weiner, Aguet, et al., Genome Biol. 20:228, 2019.
Empirical beta-approximated p-values are computed as described in FastQTL (Ongen et al., 2016).
Install
You can install tensorQTL using pip:
pip3 install tensorqtl
or directly from this repository:
$ git clone git@github.com:broadinstitute/tensorqtl.git
$ cd tensorqtl
# set up virtual environment and install
$ virtualenv venv
$ source venv/bin/activate
(venv)$ pip install -r install/requirements.txt .
Requirements
tensorQTL requires an environment configured with a GPU. Instructions for setting up a virtual machine on Google Cloud Platform are provided here.
Input formats
tensorQTL requires three input files: genotypes, phenotypes, and covariates. Phenotypes must be provided in BED format (phenotypes x samples), and covariates as a text file (covariates x samples). Both are in the format used by FastQTL. Genotypes must currently be in PLINK format, and can be converted as follows:
plink2 --make-bed \
--output-chr chrM \
--vcf ${plink_prefix_path}.vcf.gz \
--out ${plink_prefix_path}
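Before running a full analysis, it can help to confirm that the phenotype and covariate files list the same samples. The following is a minimal sketch, not part of tensorQTL. It assumes the FastQTL-style layout in which the BED header starts with `#` and has four leading columns (chromosome, start, end, phenotype ID) followed by one column per sample, and in which the covariates file has a covariate ID column followed by sample columns. The toy data and the helper names `read_header` and `check_inputs` are hypothetical.

```python
import csv
import io

# Made-up toy files; the column layout is an assumption (FastQTL-style).
phenotype_bed = (
    "#chr\tstart\tend\tphenotype_id\tS1\tS2\tS3\n"
    "chr1\t999\t1000\tgeneA\t0.12\t-1.30\t0.58\n"
    "chr1\t4999\t5000\tgeneB\t1.05\t0.22\t-0.41\n"
)
covariates_txt = (
    "id\tS1\tS2\tS3\n"
    "PC1\t0.01\t-0.02\t0.03\n"
    "sex\t1\t2\t1\n"
)


def read_header(text):
    """Return the tab-separated header fields of a text table."""
    return next(csv.reader(io.StringIO(text), delimiter="\t"))


def check_inputs(bed_text, cov_text):
    """Check the BED header layout and that every BED sample has covariates."""
    bed_header = read_header(bed_text)
    cov_header = read_header(cov_text)
    if not bed_header[0].startswith("#"):
        raise ValueError("BED header should start with '#'")
    bed_samples = bed_header[4:]  # assumed: first four columns are chr, start, end, ID
    cov_samples = cov_header[1:]  # assumed: first column is the covariate ID
    missing = [s for s in bed_samples if s not in cov_samples]
    if missing:
        raise ValueError(f"samples missing from covariates: {missing}")
    return bed_samples


samples = check_inputs(phenotype_bed, covariates_txt)
```

With real files, the same check can be applied to the first line of each file instead of the inline strings.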
Examples
For examples illustrating cis- and trans-QTL mapping, please see tensorqtl_examples.ipynb.
Running tensorQTL from the command line
This section describes how to run tensorQTL from the command line. For a full list of options, run:
python3 -m tensorqtl --help
cis-QTL mapping
Phenotype-level summary statistics with empirical p-values:
python3 -m tensorqtl ${plink_prefix_path} ${expression_bed} ${prefix} \
--covariates ${covariates_file} \
--mode cis
All variant-phenotype associations:
python3 -m tensorqtl ${plink_prefix_path} ${expression_bed} ${prefix} \
--covariates ${covariates_file} \
--mode cis_nominal
This will generate a parquet file for each chromosome. These files can be read using pandas:
import pandas as pd
df = pd.read_parquet(file_name)
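Since one file is written per chromosome, it is often convenient to combine them into a single data frame. The sketch below is one way to do this, assuming the files can be matched with a glob pattern. The helper `load_nominal_results` and the example pattern are hypothetical and not part of tensorQTL; `pd.read_parquet` additionally needs a parquet engine such as pyarrow to be installed.

```python
import glob

import pandas as pd


def load_nominal_results(pattern, reader=pd.read_parquet):
    """Concatenate all per-chromosome result files matching a glob pattern."""
    paths = sorted(glob.glob(pattern))  # lexicographic order, e.g. chr10 before chr2
    if not paths:
        raise FileNotFoundError(f"no files match {pattern!r}")
    # read each file and stack the rows into one data frame
    return pd.concat([reader(p) for p in paths], ignore_index=True)


# Hypothetical usage; adjust the pattern to the actual output file names:
# all_df = load_nominal_results("my_prefix.*.parquet")
```

The `reader` argument is only there so that the function can be tried out with a different loader; by default it uses `pd.read_parquet`.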
Conditionally independent cis-QTL (as described in GTEx Consortium, 2017):
python3 -m tensorqtl ${plink_prefix_path} ${expression_bed} ${prefix} \
--covariates ${covariates_file} \
--cis_results ${cis_results_file} \
--mode cis_independent
trans-QTL mapping
python3 -m tensorqtl ${plink_prefix_path} ${expression_bed} ${prefix} \
--covariates ${covariates_file} \
--mode trans
For trans-QTL mapping, tensorQTL generates sparse output by default (associations with p-value < 1e-5). cis-associations are filtered out. The output is in parquet format, with four columns: phenotype_id, variant_id, pval, maf.
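As an illustration of working with this output, the sketch below selects the most significant variant for each phenotype. It uses a small made-up data frame with the four columns listed above instead of real tensorQTL output; the helper `top_hit_per_phenotype` is hypothetical.

```python
import pandas as pd

# Toy stand-in for the sparse trans-QTL output; all values are made up.
trans_df = pd.DataFrame({
    "phenotype_id": ["geneA", "geneA", "geneB"],
    "variant_id": ["chr1_100_A_G", "chr2_200_C_T", "chr3_300_G_A"],
    "pval": [2e-6, 5e-9, 8e-7],
    "maf": [0.12, 0.31, 0.07],
})


def top_hit_per_phenotype(df):
    """Return the row with the smallest p-value for each phenotype."""
    idx = df.groupby("phenotype_id")["pval"].idxmin()
    return df.loc[idx].reset_index(drop=True)


top_df = top_hit_per_phenotype(trans_df)
```

With real results, `trans_df` would instead be loaded with `pd.read_parquet`.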
Running tensorQTL as a Python module
tensorQTL can also be used as a Python module, which is more efficient when running multiple analyses:
import pandas as pd
import tensorqtl
from tensorqtl import genotypeio, cis, trans
Loading input files
Load phenotypes and covariates:
phenotype_df, phenotype_pos_df = tensorqtl.read_phenotype_bed(phenotype_bed_file)
covariates_df = pd.read_csv(covariates_file, sep='\t', index_col=0).T # samples x covariates
Genotypes can be loaded as follows, where plink_prefix_path is the path prefix of the genotype files in PLINK format:
pr = genotypeio.PlinkReader(plink_prefix_path)
# load genotypes and variants into data frames
genotype_df = pd.DataFrame(pr.get_all_genotypes(), index=pr.bim['snp'], columns=pr.fam['iid'])
variant_df = pr.bim.set_index('snp')[['chrom', 'pos']]
To save memory when using genotypes for a subset of samples, you can specify the samples as follows (this is not strictly necessary, since tensorQTL will select the relevant samples from genotype_df otherwise):
pr = genotypeio.PlinkReader(plink_prefix_path, select_samples=phenotype_df.columns)
cis-QTL mapping: permutations
cis_df = cis.map_cis(genotype_df, variant_df, phenotype_df, phenotype_pos_df, covariates_df)
tensorqtl.calculate_qvalues(cis_df, qvalue_lambda=0.85)
cis-QTL mapping: summary statistics for all variant-phenotype pairs
cis.map_nominal(genotype_df, variant_df, phenotype_df, phenotype_pos_df,
                covariates_df, prefix, output_dir='.')
cis-QTL mapping: conditionally independent QTLs
This requires the output from the permutations step (map_cis) above.
indep_df = cis.map_independent(genotype_df, variant_df, cis_df,
                               phenotype_df, phenotype_pos_df, covariates_df)
cis-QTL mapping: interactions
Instead of mapping the standard linear model (p ~ g), this mode includes an interaction term (p ~ g + i + gi) and returns full summary statistics for this model. The interaction term is a pd.Series mapping sample ID to interaction value.
With the run_eigenmt=True option, eigenMT-adjusted p-values are computed.
cis.map_nominal(genotype_df, variant_df, phenotype_df, phenotype_pos_df, covariates_df, prefix,
                interaction_s=interaction_s, maf_threshold_interaction=0.05,
                group_s=None, run_eigenmt=True, output_dir='.')
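To make the interaction model p ~ g + i + gi concrete, the following sketch fits it by ordinary least squares with NumPy for a single variant-phenotype pair. It is only an illustration of the model form, not tensorQTL's implementation: it ignores covariates, runs on the CPU, and uses simulated, noise-free data so that the fitted coefficients match the simulated ones. All variable names and values are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200                                         # number of samples
g = rng.integers(0, 3, size=n).astype(float)    # genotype dosages 0/1/2
i = rng.normal(size=n)                          # interaction variable per sample

# Design matrix with columns: intercept, g, i, g*i
X = np.column_stack([np.ones(n), g, i, g * i])

# Simulate a phenotype without noise from known coefficients
true_beta = np.array([1.0, 0.3, -0.2, 0.5])     # intercept, b_g, b_i, b_gi
p = X @ true_beta

# Least-squares fit of p ~ g + i + gi
beta_hat, *_ = np.linalg.lstsq(X, p, rcond=None)
b_gi = beta_hat[3]                              # coefficient of the interaction term
```

The last coefficient is the effect of the genotype-by-interaction product, which is the term that distinguishes this model from the standard p ~ g model.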
trans-QTL mapping
trans_df = trans.map_trans(genotype_df, phenotype_df, covariates_df, return_sparse=True)