Python API to infer missing data in sparsely sampled genotype-phenotype maps.
GPSeer
Simple software for inferring missing data in sparsely measured genotype-phenotype maps
Basic Usage
Install gpseer using pip:
pip install gpseer
To use GPSeer from the command line, call gpseer on an input .csv file containing genotype-phenotype data.
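As a rough illustration of what such an input file might look like, the sketch below writes a tiny genotype-phenotype table to a .csv file using only the Python standard library. The column names "genotype" and "phenotype", the toy values, and the helper write_genotype_phenotype_csv are assumptions made for this example; check the downloaded example files for the layout GPSeer actually expects.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical toy data: three-site genotypes with one measured phenotype each.
# The column names "genotype" and "phenotype" are an assumption for this sketch.
rows = [
    {"genotype": "AAA", "phenotype": 1.0},
    {"genotype": "BAA", "phenotype": 1.5},
    {"genotype": "ABA", "phenotype": 0.8},
    {"genotype": "AAB", "phenotype": 1.3},
    {"genotype": "BBA", "phenotype": 1.3},
]


def write_genotype_phenotype_csv(path, rows):
    """Write one row per measured genotype, with a header line."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["genotype", "phenotype"])
        writer.writeheader()
        writer.writerows(rows)


# Write into a temporary directory so the sketch does not clutter the working directory.
out_path = Path(tempfile.mkdtemp()) / "toy-input.csv"
write_genotype_phenotype_csv(out_path, rows)
print(out_path.read_text())
```

Genotypes that were never measured are simply absent from the table; those are the entries a model would later be asked to predict.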
The API Demo.ipynb demonstrates how to use GPSeer in a Jupyter notebook.
Downloading the example
To get started, use GPSeer's fetch-example command to download an example from its GitHub repo.
Download the gpseer example and explore the example input data:
# Fetch the example data from the GitHub repo.
> gpseer fetch-example
[GPSeer] Downloading files to /examples...
[GPSeer] └──>: 100%|██████████████████| 3/3 [00:00<00:00, 9.16it/s]
[GPSeer] └──> Done!
# Change into the example directory and check out the files that were downloaded
> cd examples/
> ls
API Demo.ipynb
example-full.csv
example-test.csv
example-train.csv
Generate Dataset.ipynb
genotypes.txt
pfcrt-raw-data.csv
Predicting missing data using a maximum likelihood (ML) model
Fit the maximum likelihood additive model to the training set and predict the phenotypes of all missing genotypes. The predictions are written to a file named "example-train_predictions.csv".
> gpseer estimate-ml example-train.csv
[GPSeer] Reading data from example-train.csv...
[GPSeer] └──> Done reading data.
[GPSeer] Constructing a model...
[GPSeer] └──> Done constructing model.
[GPSeer] Fitting data...
[GPSeer] └──> Done fitting data.
[GPSeer] Predicting missing data...
[GPSeer] └──> Done predicting.
[GPSeer] Calculating fit statistics...
[GPSeer]
Fit statistics:
---------------
parameter value
0 num_genotypes 128
1 num_unique_mutations 8
2 explained_variation 0.985186
3 num_parameters 9
4 num_obs_to_converge 2.82714
5 threshold None
6 spline_order None
7 spline_smoothness None
8 epistasis_order 1
[GPSeer]
Convergence:
------------
mutation num_obs num_obs_above fold_target converged
0 F0K 64 64 22.637735 True
1 S1Y 69 69 24.406308 True
2 Q2T 63 63 22.284020 True
3 R3V 70 70 24.760023 True
4 N4D 62 62 21.930306 True
5 A5C 69 69 24.406308 True
6 C6D 65 65 22.991450 True
7 C7A 64 64 22.637735 True
[GPSeer] └──> Done.
[GPSeer] Writing phenotypes to example-train_predictions.csv...
[GPSeer] └──> Done writing predictions!
[GPSeer] Writing plots...
[GPSeer] Writing example-train_correlation-plot.pdf...
[GPSeer] Writing example-train_phenotype-histograms.pdf...
[GPSeer] └──> Done plotting!
[GPSeer] GPSeer finished!
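To make the idea of an additive model more concrete, here is a minimal NumPy sketch, not GPSeer's own implementation. It assumes that each site is either wild type or mutated, that the phenotype is an intercept plus one effect per mutation, and that measurement noise is Gaussian, in which case the maximum likelihood estimate coincides with ordinary least squares. The function names (encode, fit_additive, predict) and the toy data are hypothetical.

```python
import numpy as np


def encode(genotype, wildtype):
    """Intercept column followed by one 0/1 indicator per site (1 = mutated)."""
    return [1.0] + [0.0 if g == w else 1.0 for g, w in zip(genotype, wildtype)]


def fit_additive(genotypes, phenotypes, wildtype):
    """Least-squares fit of an intercept plus one additive effect per site."""
    X = np.array([encode(g, wildtype) for g in genotypes])
    coefs, *_ = np.linalg.lstsq(X, np.array(phenotypes, dtype=float), rcond=None)
    return coefs


def predict(genotypes, coefs, wildtype):
    """Predict phenotypes for genotypes that were not measured."""
    X = np.array([encode(g, wildtype) for g in genotypes])
    return X @ coefs


# Hypothetical toy map: 3 sites, 5 of the 8 possible genotypes measured.
wildtype = "AAA"
observed = ["AAA", "BAA", "ABA", "AAB", "BBA"]
measured = [1.0, 1.5, 0.8, 1.3, 1.3]
missing = ["BAB", "ABB", "BBB"]

coefs = fit_additive(observed, measured, wildtype)
predictions = predict(missing, coefs, wildtype)
for genotype, value in zip(missing, predictions):
    print(f"{genotype}: {value:.3f}")
```

With three sites this model has four parameters (one intercept plus three mutation effects), which mirrors the nine parameters reported above for eight unique mutations.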
Compute the predictive power of the model by cross-validation
Estimate how well the model predicts held-out data using the "cross-validate" subcommand. The example below generates 100 subsets of the data and computes a prediction score for each.
> gpseer cross-validate example-train.csv
[GPSeer] Reading data from example-train.csv...
[GPSeer] └──> Done reading data.
[GPSeer] Fitting all data data...
[GPSeer] └──> Done fitting data.
[GPSeer] Sampling the data...
[GPSeer] └──>: 100%|████████████████████| 100/100 [00:03<00:00, 25.90it/s]
[GPSeer] └──> Done sampling data.
[GPSeer] Plotting example-train_cross-validation-plot.pdf...
[GPSeer] └──> Done writing data.
[GPSeer] Writing scores to example-train_cross-validation-scores.csv...
[GPSeer] └──> Done writing data
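The cross-validation procedure described above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions rather than GPSeer's actual code: the model is the same least-squares additive model, each of the 100 rounds trains on a random 80% of the data (the 0.8 fraction is an assumption), and the score is the squared Pearson correlation between predicted and measured phenotypes on the held-out 20%. The function name cross_validate and the synthetic data are hypothetical.

```python
import itertools

import numpy as np


def cross_validate(X, y, n_samples=100, train_fraction=0.8, seed=0):
    """Return one held-out R^2 score per random train/test split."""
    rng = np.random.default_rng(seed)
    n_train = int(round(train_fraction * len(y)))
    scores = []
    for _ in range(n_samples):
        order = rng.permutation(len(y))
        train, test = order[:n_train], order[n_train:]
        # Fit the additive model on the training subset only.
        coefs, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        predicted = X[test] @ coefs
        # Squared Pearson correlation between predictions and held-out data.
        r = np.corrcoef(predicted, y[test])[0, 1]
        scores.append(r ** 2)
    return np.array(scores)


# Synthetic data: all 32 binary genotypes over 5 sites, additive effects plus noise.
rng = np.random.default_rng(42)
bits = np.array(list(itertools.product([0.0, 1.0], repeat=5)))
X = np.hstack([np.ones((len(bits), 1)), bits])  # intercept + one column per site
true_coefs = np.array([1.0, 0.8, -0.5, 0.3, 1.0, -0.7])
y = X @ true_coefs + rng.normal(scale=0.05, size=len(bits))

scores = cross_validate(X, y)
print(f"mean held-out R^2 over {len(scores)} splits: {scores.mean():.3f}")
```

A distribution of scores close to 1 suggests the additive model generalizes to unmeasured genotypes; a wide or low distribution suggests the predictions should be treated with caution.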