Python API to infer missing data in sparsely sampled genotype-phenotype maps.

GPSeer

Simple software for inferring missing data in sparsely measured genotype-phenotype maps

Basic Usage

Install gpseer using pip:

pip install gpseer

To use GPSeer from the command line, call gpseer on an input .csv file containing genotype-phenotype data.

The API Demo.ipynb notebook shows how to use GPSeer in Jupyter.
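The command-line call described above can also be driven from a Python script. The following is a minimal sketch using only the standard library; gpseer_command and run_gpseer are hypothetical helper names, not part of GPSeer, and the only subcommand used is one that appears in this document.

```python
import shutil
import subprocess


def gpseer_command(subcommand, csv_path):
    """Build the argument list for a call such as `gpseer estimate-ml data.csv`."""
    return ["gpseer", subcommand, str(csv_path)]


def run_gpseer(subcommand, csv_path):
    """Run gpseer as a subprocess and return the completed process.

    Raises FileNotFoundError if the gpseer executable is not on PATH,
    and subprocess.CalledProcessError if gpseer exits with an error.
    """
    if shutil.which("gpseer") is None:
        raise FileNotFoundError(
            "gpseer executable not found on PATH; try `pip install gpseer`"
        )
    return subprocess.run(
        gpseer_command(subcommand, csv_path),
        check=True,
        capture_output=True,
        text=True,
    )


# Example (requires gpseer and the example files to be present):
# result = run_gpseer("estimate-ml", "example-train.csv")
# print(result.stdout)
```

Building the command as a list rather than a single string avoids shell quoting problems when the file name contains spaces.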

Downloading the example

To get started, use GPSeer's fetch-example command to download the example files from its GitHub repository, then explore the example input data:

# Fetch the example data from the GitHub repository.
> gpseer fetch-example

[GPSeer] Downloading files to /examples...
[GPSeer] └──>: 100%|██████████████████| 3/3 [00:00<00:00,  9.16it/s]
[GPSeer] └──> Done!

# Change into the example directory and check out the files that were downloaded
> cd examples/
> ls

API Demo.ipynb
example-full.csv
example-test.csv
example-train.csv
Generate Dataset.ipynb
genotypes.txt
pfcrt-raw-data.csv
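Before fitting anything, it can help to look at what one of the downloaded CSV files contains. The sketch below uses only the standard library and makes no assumption about the column names; summarize_csv is a hypothetical helper, not part of GPSeer.

```python
import csv


def summarize_csv(path, n_preview=3):
    """Return the column names, row count, and first few rows of a CSV file."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)
        # fieldnames is None for an empty file, so fall back to an empty list.
        columns = list(reader.fieldnames or [])
    return {"columns": columns, "num_rows": len(rows), "preview": rows[:n_preview]}


# Example (run from inside the examples/ directory):
# summary = summarize_csv("example-train.csv")
# print(summary["columns"], summary["num_rows"])
# for row in summary["preview"]:
#     print(row)
```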

Predicting missing data using a maximum likelihood (ML) model

Estimate the maximum likelihood additive model on the training set and predict the phenotypes of all missing genotypes. The predictions are written to a file named "example-train_predictions.csv".

> gpseer estimate-ml example-train.csv

[GPSeer] Reading data from example-train.csv...
[GPSeer] └──> Done reading data.
[GPSeer] Constructing a model...
[GPSeer] └──> Done constructing model.
[GPSeer] Fitting data...
[GPSeer] └──> Done fitting data.
[GPSeer] Predicting missing data...
[GPSeer] └──> Done predicting.
[GPSeer] Calculating fit statistics...
[GPSeer]

Fit statistics:
---------------

              parameter     value
0         num_genotypes       128
1  num_unique_mutations         8
2   explained_variation  0.985186
3        num_parameters         9
4   num_obs_to_converge   2.82714
5             threshold      None
6          spline_order      None
7     spline_smoothness      None
8       epistasis_order         1


[GPSeer]

Convergence:
------------

  mutation  num_obs  num_obs_above  fold_target  converged
0      F0K       64             64    22.637735       True
1      S1Y       69             69    24.406308       True
2      Q2T       63             63    22.284020       True
3      R3V       70             70    24.760023       True
4      N4D       62             62    21.930306       True
5      A5C       69             69    24.406308       True
6      C6D       65             65    22.991450       True
7      C7A       64             64    22.637735       True


[GPSeer] └──> Done.
[GPSeer] Writing phenotypes to example-train_predictions.csv...
[GPSeer] └──> Done writing predictions!
[GPSeer] Writing plots...
[GPSeer] Writing example-train_correlation-plot.pdf...
[GPSeer] Writing example-train_phenotype-histograms.pdf...
[GPSeer] └──> Done plotting!
[GPSeer] GPSeer finished!
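To make the idea of an additive model concrete, here is a minimal sketch on toy data. It is not GPSeer's implementation: it assumes one alternative state per site and fits an intercept plus one coefficient per mutation by ordinary least squares, which coincides with the maximum likelihood estimate under Gaussian noise. By the same counting, the 8 mutations in the example output plus an intercept would presumably account for the 9 reported parameters. The function names and the toy genotypes are illustrative only.

```python
import itertools

import numpy as np


def encode(genotype, wildtype):
    """Intercept term followed by a 0/1 indicator per site (1 = differs from wildtype)."""
    return [1.0] + [float(g != w) for g, w in zip(genotype, wildtype)]


def fit_additive(genotypes, phenotypes, wildtype):
    """Least-squares fit of an intercept plus one effect per mutated site."""
    X = np.array([encode(g, wildtype) for g in genotypes])
    y = np.asarray(phenotypes, dtype=float)
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs


def predict(genotypes, coefs, wildtype):
    """Predict phenotypes as the intercept plus the sum of mutation effects."""
    X = np.array([encode(g, wildtype) for g in genotypes])
    return X @ coefs


# Toy map: three sites, wildtype "AAA", mutant state "B" (2**3 = 8 genotypes).
wildtype = "AAA"
all_genotypes = ["".join(p) for p in itertools.product("AB", repeat=3)]
true_coefs = np.array([1.0, 0.5, -0.2, 0.3])  # made-up effects for the demo

# Pretend only five of the eight genotypes were measured (noise-free here).
measured = ["AAA", "BAA", "ABA", "AAB", "BBA"]
phenotypes = predict(measured, true_coefs, wildtype)

# Fit on the measured genotypes, then predict the unmeasured ones.
coefs = fit_additive(measured, phenotypes, wildtype)
missing = [g for g in all_genotypes if g not in measured]
predictions = dict(zip(missing, predict(missing, coefs, wildtype)))

for genotype, value in predictions.items():
    print(f"{genotype}: {value:.3f}")
```

Because the toy phenotypes are exactly additive and noise-free, the fit recovers the made-up effects; real measurements would include noise and possibly nonlinearity that this sketch ignores.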

Compute the predictive power of the model by cross-validation

Estimate how well the model predicts held-out data using the "cross-validate" subcommand. The example below generates 100 subsets of the data and computes a prediction score for each.

> gpseer cross-validate example-train.csv

[GPSeer] Reading data from example-train.csv...
[GPSeer] └──> Done reading data.
[GPSeer] Fitting all data data...
[GPSeer] └──> Done fitting data.
[GPSeer] Sampling the data...
[GPSeer] └──>: 100%|████████████████████| 100/100 [00:03<00:00, 25.90it/s]
[GPSeer] └──> Done sampling data.
[GPSeer] Plotting example-train_cross-validation-plot.pdf...
[GPSeer] └──> Done writing data.
[GPSeer] Writing scores to example-train_cross-validation-scores.csv...
[GPSeer] └──> Done writing data
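The resampling idea behind cross-validation can be sketched as follows. This is an illustration under stated assumptions, not GPSeer's code: the data are synthetic, the model is a plain least-squares additive fit, the 80/20 split is an arbitrary choice, and the score is taken to be the squared Pearson correlation between predicted and measured test phenotypes. The names cross_validate, n_samples, and train_fraction are hypothetical.

```python
import numpy as np


def cross_validate(X, y, n_samples=100, train_fraction=0.8, seed=0):
    """Repeatedly fit on a random training subset and score on the remainder.

    Returns one squared Pearson correlation (test set) per resample.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    n_train = int(round(train_fraction * n))
    scores = []
    for _ in range(n_samples):
        order = rng.permutation(n)
        train, test = order[:n_train], order[n_train:]
        # Fit the linear model on the training rows only.
        coefs, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        predicted = X[test] @ coefs
        r = np.corrcoef(predicted, y[test])[0, 1]
        scores.append(r ** 2)
    return np.array(scores)


# Synthetic additive data: intercept column plus five binary sites, small noise.
rng = np.random.default_rng(1)
sites = rng.integers(0, 2, size=(64, 5)).astype(float)
X = np.hstack([np.ones((64, 1)), sites])
effects = np.array([1.0, 0.5, -0.3, 0.8, 0.2, -0.6])  # made-up values
y = X @ effects + rng.normal(scale=0.05, size=64)

scores = cross_validate(X, y)
print(f"mean test score over {len(scores)} resamples: {scores.mean():.3f}")
```

A wide spread of scores across resamples suggests that predictions depend strongly on which genotypes happen to be measured.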
