Given the output directory of a QC pipeline and a threshold config file, parse out the desired metrics and evaluate them against the thresholds.
qc-metric-aggregator
Parse individual metrics out of a directory of QC results for genomic data and output a report containing the desired metrics and the overall PASS/FAIL status of the sample.
Installation
pip install qc-metric-aggregator
Usage
usage: aggregate-qc-metrics [-h] sample_name metrics_dir output_file threshold_file
positional arguments:
  sample_name      The sample name or id for which the QC metrics apply
  metrics_dir      The directory to search for metric files, often a cromwell run directory
  output_file      File path to store the finalized metrics TSV
  threshold_file   Path to the yml thresholds file to validate against
optional arguments:
  -h, --help       show this help message and exit
Example invocation:
aggregate-qc-metrics HG00096 /opt/qc/results/HG00096/WholeGenomeSingleSampleQc /opt/qc/scores/qc_results.tsv thresholds.yml
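When the aggregator is driven from a Python workflow rather than a shell, the documented positional arguments can be passed straight through `subprocess`. The sketch below relies only on the command name and argument order from the usage text; the helper names `build_command` and `run_aggregation` are hypothetical, and whether the tool signals a FAIL sample through its exit status is not documented here.

```python
import subprocess
from typing import List


def build_command(sample_name: str, metrics_dir: str,
                  output_file: str, threshold_file: str) -> List[str]:
    """Assemble the documented positional arguments in the order the CLI expects."""
    return ["aggregate-qc-metrics", sample_name, metrics_dir,
            output_file, threshold_file]


def run_aggregation(sample_name: str, metrics_dir: str,
                    output_file: str, threshold_file: str) -> None:
    """Run the CLI; check=True raises CalledProcessError on a non-zero exit status."""
    cmd = build_command(sample_name, metrics_dir, output_file, threshold_file)
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Same arguments as the example invocation; printed rather than executed.
    cmd = build_command(
        "HG00096",
        "/opt/qc/results/HG00096/WholeGenomeSingleSampleQc",
        "/opt/qc/scores/qc_results.tsv",
        "thresholds.yml",
    )
    print(" ".join(cmd))
```

Passing a list (rather than one shell string) avoids quoting problems if a sample name or path contains spaces.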
Output formats
Coming soon...
Threshold file
You will need to pass in a YAML file containing pass/fail threshold tests for the metrics you are interested in. The file consists of a list of objects, each containing the following keys:
Key | Value | Comments |
---|---|---|
metric_name | Name of the metric to check | This can be any supported metric and should be the value returned by `name` (the Name column under Supported Metrics). |
operator | Which operation to use to compare the metric value to the PASS/FAIL threshold | `<`, `<=`, `>`, `>=`, and `=` are all supported. If you instead specify `report`, the metric will be reported in the final output but not factored into the PASS/FAIL status. |
value | The PASS/FAIL threshold to compare the metric value to | This field is optional if `operator` is `report`. |
An example can be found here
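The threshold semantics in the table above can be sketched in a few lines of Python. This is not the package's code: it assumes the YAML has already been loaded into a list of dicts (for example with PyYAML's `yaml.safe_load`), takes the parsed metric values as a plain dict, and treats a metric that could not be found as failing its test, which the real tool may handle differently. The function name `evaluate` and the threshold values are illustrative only.

```python
import operator
from typing import Any, Dict, List, Optional, Tuple

# Comparison operators listed for the threshold file's `operator` key.
# "report" is handled separately because it never affects PASS/FAIL.
OPERATORS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "=": operator.eq,
}

Result = Tuple[str, str, Optional[bool]]  # (metric_name, operator, outcome)


def evaluate(metrics: Dict[str, float],
             thresholds: List[Dict[str, Any]]) -> Tuple[List[Result], str]:
    """Apply each threshold test; return per-test outcomes and PASS/FAIL.

    An outcome of None marks a metric that is only reported.
    """
    results: List[Result] = []
    overall_pass = True
    for test in thresholds:
        name, op = test["metric_name"], test["operator"]
        if op == "report":
            results.append((name, op, None))  # shown in output, not scored
            continue
        # Assumption for this sketch: a metric that was not found fails its test.
        value = metrics.get(name)
        outcome = value is not None and OPERATORS[op](value, test["value"])
        results.append((name, op, outcome))
        overall_pass = overall_pass and outcome
    return results, "PASS" if overall_pass else "FAIL"


if __name__ == "__main__":
    # Equivalent of a thresholds.yml such as (illustrative values only):
    #   - metric_name: FREEMIX
    #     operator: "<"
    #     value: 0.01
    #   - metric_name: MEAN_COVERAGE
    #     operator: ">="
    #     value: 30
    #   - metric_name: PERCENT_DUPLICATION
    #     operator: report
    thresholds = [
        {"metric_name": "FREEMIX", "operator": "<", "value": 0.01},
        {"metric_name": "MEAN_COVERAGE", "operator": ">=", "value": 30},
        {"metric_name": "PERCENT_DUPLICATION", "operator": "report"},
    ]
    metrics = {"FREEMIX": 0.002, "MEAN_COVERAGE": 32.5, "PERCENT_DUPLICATION": 0.12}
    results, status = evaluate(metrics, thresholds)
    print(status)  # PASS
```

In the YAML itself, quoting every operator string is the safe choice: a bare `>` at the start of a value is read as a folded block scalar indicator, so `operator: >=` will not parse as intended.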
Supported Metrics
Name | Description | Originating Tool |
---|---|---|
FREEMIX | Freemix | VerifyBamId2 |
Q20_BASES | Total bases with Q20 or higher | Picard CollectQualityYieldMetrics |
MEAN_COVERAGE | Haploid Coverage | Picard CollectWgsMetrics |
PCT_10X | Percent coverage at 10x | Picard CollectWgsMetrics |
PCT_20X | Percent coverage at 20x | Picard CollectWgsMetrics |
PCT_30X | Percent coverage at 30x | Picard CollectWgsMetrics |
PCT_CHIMERAS | Percent chimeras (PAIR) | Picard CollectAlignmentSummaryMetrics |
READ1_PF_MISMATCH_RATE | Read 1 base mismatch rate | Picard CollectAlignmentSummaryMetrics |
READ2_PF_MISMATCH_RATE | Read 2 base mismatch rate | Picard CollectAlignmentSummaryMetrics |
MEDIAN_INSERT_SIZE | Library insert size median | Picard CollectInsertSizeMetrics |
MEDIAN_ABSOLUTE_DEVIATION | Library insert size median absolute deviation (MAD) | Picard CollectInsertSizeMetrics |
PERCENT_DUPLICATION | Percent duplicate marked reads | Picard CollectDuplicateMetrics |
MEAN_TARGET_COVERAGE | The mean coverage of a target region. | Picard CollectHsMetrics |
PCT_TARGET_BASES_10X | The fraction of all target bases achieving 10X or greater coverage | Picard CollectHsMetrics |
PCT_TARGET_BASES_20X | The fraction of all target bases achieving 20X or greater coverage | Picard CollectHsMetrics |
PCT_TARGET_BASES_30X | The fraction of all target bases achieving 30X or greater coverage | Picard CollectHsMetrics |
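Every tool in this table except VerifyBamId2 is a Picard program, and Picard metrics files share one layout: `##`/`#` header lines, a `## METRICS CLASS` line, a tab-separated header row, one or more data rows, and often a `## HISTOGRAM` section after a blank line. A minimal sketch of pulling one value out of such a file follows. `read_picard_metrics` and `get_metric` are hypothetical names, the sample text is made up and trimmed to a few columns, and mapping an entry such as PCT_CHIMERAS (PAIR) to the row where `CATEGORY` is `PAIR` is an assumption about how the aggregator selects rows.

```python
import io
from typing import Dict, List, Optional, TextIO


def read_picard_metrics(handle: TextIO) -> List[Dict[str, str]]:
    """Return the rows of the METRICS CLASS section, keyed by column name."""
    rows: List[Dict[str, str]] = []
    header: Optional[List[str]] = None
    in_metrics = False
    for raw in handle:
        line = raw.rstrip("\n")
        if line.startswith("## METRICS CLASS"):
            in_metrics = True
            continue
        if not in_metrics:
            continue  # still inside the comment header
        if not line.strip() or line.startswith("#"):
            break  # a blank line or the next section ends the metrics table
        fields = line.split("\t")
        if header is None:
            header = fields
        else:
            rows.append(dict(zip(header, fields)))
    return rows


def get_metric(rows: List[Dict[str, str]], column: str,
               category: Optional[str] = None) -> float:
    """Return `column` from the first row, or from the row whose CATEGORY matches."""
    for row in rows:
        if category is None or row.get("CATEGORY") == category:
            return float(row[column])
    raise KeyError(f"no row with CATEGORY={category!r}")


# Made-up CollectAlignmentSummaryMetrics output, trimmed to three columns.
SAMPLE = (
    "## htsjdk.samtools.metrics.StringHeader\n"
    "# CollectAlignmentSummaryMetrics INPUT=sample.bam\n"
    "\n"
    "## METRICS CLASS\tpicard.analysis.AlignmentSummaryMetrics\n"
    "CATEGORY\tPF_MISMATCH_RATE\tPCT_CHIMERAS\n"
    "FIRST_OF_PAIR\t0.0021\t0.011\n"
    "SECOND_OF_PAIR\t0.0034\t0.012\n"
    "PAIR\t0.0027\t0.0115\n"
    "\n"
)

if __name__ == "__main__":
    rows = read_picard_metrics(io.StringIO(SAMPLE))
    print(get_metric(rows, "PCT_CHIMERAS", category="PAIR"))  # 0.0115
```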
Adding Additional Metrics
To add support for an additional metric, subclass `Metric` and register it in `AvailableMetrics`.
Because many QC metrics are output in TSV format, there is a helper class, `TSVMetric`, that you can inherit from in addition to `Metric` to make this easier. All of the currently supported metrics use this helper, so you can look to them for examples.
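The subclass-and-register pattern described above might look roughly like the sketch below. The package's real `Metric`, `TSVMetric`, and `AvailableMetrics` interfaces are not documented on this page, so the classes here are hypothetical stand-ins (the registry is a plain dict named `AVAILABLE_METRICS`, and `parse` is an invented method name); check the package source for the actual method names before subclassing. `SD_COVERAGE` is used only because it is a real CollectWgsMetrics column that is not in the supported list; the sample values are made up.

```python
import csv
from typing import Dict, Type


class Metric:
    """Hypothetical stand-in for the package's Metric base class."""

    name: str = ""

    def parse(self, text: str) -> float:
        raise NotImplementedError


class TSVMetric(Metric):
    """Hypothetical stand-in for TSVMetric: read one column of a tab-separated file."""

    column: str = ""

    def parse(self, text: str) -> float:
        # Drop comment and blank lines; the first remaining line is the header row.
        lines = [ln for ln in text.splitlines()
                 if ln.strip() and not ln.startswith("#")]
        first_row = next(csv.DictReader(lines, delimiter="\t"))
        return float(first_row[self.column])


# Hypothetical registry standing in for AvailableMetrics.
AVAILABLE_METRICS: Dict[str, Type[Metric]] = {}


def register(cls: Type[Metric]) -> Type[Metric]:
    """Class decorator that records a metric under its reported name."""
    AVAILABLE_METRICS[cls.name] = cls
    return cls


@register
class SdCoverage(TSVMetric):
    """Standard deviation of coverage, from Picard CollectWgsMetrics."""

    name = "SD_COVERAGE"
    column = "SD_COVERAGE"


if __name__ == "__main__":
    wgs = ("## METRICS CLASS\tpicard.analysis.WgsMetrics\n"
           "GENOME_TERRITORY\tMEAN_COVERAGE\tSD_COVERAGE\n"
           "1000\t31.4\t8.5\n")
    print(AVAILABLE_METRICS["SD_COVERAGE"]().parse(wgs))  # 8.5
```

A decorator keeps the registration next to the class definition, so a new metric cannot be defined and then left out of the registry; whether the real `AvailableMetrics` works this way is not stated here.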
Download files
Source Distribution: qc-metric-aggregator-0.1.3.tar.gz
Built Distribution: qc_metric_aggregator-0.1.3-py3-none-any.whl
File details
Details for the file qc-metric-aggregator-0.1.3.tar.gz
File metadata
- Download URL: qc-metric-aggregator-0.1.3.tar.gz
- Upload date:
- Size: 9.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.1.post20200322 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.7.6
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 390507a2284eb67173c4588ac6806c2d477b81f3d64409ff9c62a610479f323a |
MD5 | fe8bb289a95650b6eddda1266d785ae2 |
BLAKE2b-256 | 036a89345a00c0e6ff9dabc44a898de6ec8089d8cca00f198a62db3bd5c9d894 |
File details
Details for the file qc_metric_aggregator-0.1.3-py3-none-any.whl
File metadata
- Download URL: qc_metric_aggregator-0.1.3-py3-none-any.whl
- Upload date:
- Size: 17.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.1.post20200322 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.7.6
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 04e8290b3a236680255fab2ccd7ee36398c1c6e3cbc82c79a28cb85cb815112d |
MD5 | c60e21e7c4f311bd6293b2acde350209 |
BLAKE2b-256 | 6e49bbc8988145be06bddeac7bc843d2432c5aa22ffbacecf3a0bb8f2f6bac1d |
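To confirm that a downloaded distribution matches the SHA256 digests listed in the file details, the digest can be recomputed with Python's standard `hashlib` module. A minimal sketch, assuming the downloaded files sit in the current directory; `sha256_of` is a hypothetical helper name.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the file details for version 0.1.3.
EXPECTED_SHA256 = {
    "qc-metric-aggregator-0.1.3.tar.gz":
        "390507a2284eb67173c4588ac6806c2d477b81f3d64409ff9c62a610479f323a",
    "qc_metric_aggregator-0.1.3-py3-none-any.whl":
        "04e8290b3a236680255fab2ccd7ee36398c1c6e3cbc82c79a28cb85cb815112d",
}


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    for filename, expected in EXPECTED_SHA256.items():
        path = Path(filename)
        if path.exists():  # only check files that were actually downloaded
            verdict = "OK" if sha256_of(path) == expected else "MISMATCH"
            print(f"{filename}: {verdict}")
```

pip can enforce the same check at install time: a requirements file line such as `qc-metric-aggregator==0.1.3 --hash=sha256:<digest>`, with one `--hash` option per distribution file, puts pip in hash-checking mode and makes it reject any download whose digest does not match.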