Given the output directory of a QC pipeline and a threshold config file, parse out the desired metrics and evaluate them against the thresholds.

Project description

qc-metric-aggregator

Parse individual metrics out of a directory of QC results for genomic data and output a report containing the desired metrics and the overall PASS/FAIL status of the sample.

Installation


pip install qc-metric-aggregator


Usage


usage: aggregate-qc-metrics [-h]
                            sample_name metrics_dir output_file threshold_file

positional arguments:
  sample_name     The sample name or id for which the QC metrics apply
  metrics_dir     The directory to search for metric files, often a cromwell
                  run directory
  output_file     File path to store the finalized metrics TSV
  threshold_file  Path to the yml thresholds file to validate against

optional arguments:
  -h, --help      show this help message and exit

Example invocation:

aggregate-qc-metrics HG00096 /opt/qc/results/HG00096/WholeGenomeSingleSampleQc /opt/qc/scores/qc_results.tsv thresholds.yml
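
When the tool is driven from a Python pipeline rather than a shell, the same invocation can be assembled with the standard library. This is a minimal sketch: `build_command` and `run_aggregator` are hypothetical helper names, and the only details taken from the usage text are the executable name and the order of the four positional arguments.

```python
import subprocess
from typing import List


def build_command(sample_name: str, metrics_dir: str,
                  output_file: str, threshold_file: str) -> List[str]:
    """Assemble the argument list in the order shown in the usage text."""
    return ["aggregate-qc-metrics", sample_name, metrics_dir,
            output_file, threshold_file]


def run_aggregator(cmd: List[str]) -> None:
    """Run the CLI; requires qc-metric-aggregator to be installed and on PATH."""
    # check=True raises CalledProcessError on a non-zero exit code.
    subprocess.run(cmd, check=True)


cmd = build_command(
    "HG00096",
    "/opt/qc/results/HG00096/WholeGenomeSingleSampleQc",
    "/opt/qc/scores/qc_results.tsv",
    "thresholds.yml",
)
print(" ".join(cmd))
```

Passing the arguments as a list avoids shell quoting problems when sample names or paths contain spaces.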

Output formats

Coming soon...


Threshold file

You will need to pass in a YAML file containing pass/fail threshold tests for the metrics you are interested in. The file format consists of a list of objects each containing the following keys:

| Key | Value | Comments |
| --- | --- | --- |
| metric_name | Name of the metric to check | This can be any supported metric and should be the value returned by name. |
| operator | Which operation to use to compare the metric value to the PASS/FAIL threshold | <, <=, >, >=, and = are all supported. If you instead specify report, the metric will be reported in the final output but not factored into the PASS/FAIL status. |
| value | The PASS/FAIL threshold to compare the metric value to | This field is optional if report is specified for the operator. |

An example can be found here
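
To make the semantics of the three keys concrete, the following sketch evaluates a list of threshold entries against a set of metric values. It is not the package's implementation: the `evaluate` function, the numeric thresholds, and the sample metric values are invented for illustration, and it assumes a check passes when `metric <operator> value` holds. The list of dictionaries mirrors what a YAML list of objects with these keys would parse into.

```python
import operator
from typing import Any, Dict, List

# Comparison operators named in the table above, mapped to stdlib functions.
COMPARATORS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "=": operator.eq,
}

# Parsed form of a thresholds file. The numbers are made up for illustration.
thresholds: List[Dict[str, Any]] = [
    {"metric_name": "MEAN_COVERAGE", "operator": ">=", "value": 30},
    {"metric_name": "FREEMIX", "operator": "<", "value": 0.01},
    {"metric_name": "MEDIAN_INSERT_SIZE", "operator": "report"},  # no value needed
]


def evaluate(thresholds: List[Dict[str, Any]], metrics: Dict[str, float]) -> str:
    """Return 'PASS' only if every non-report threshold is satisfied.

    Assumption: a check passes when `metric <operator> value` is true.
    """
    passed = True
    for entry in thresholds:
        if entry["operator"] == "report":
            continue  # reported only, never affects PASS/FAIL
        compare = COMPARATORS[entry["operator"]]
        if not compare(metrics[entry["metric_name"]], entry["value"]):
            passed = False
    return "PASS" if passed else "FAIL"


metrics = {"MEAN_COVERAGE": 32.4, "FREEMIX": 0.002, "MEDIAN_INSERT_SIZE": 410}
print(evaluate(thresholds, metrics))  # PASS
```

Lowering MEAN_COVERAGE to 10 in this sketch flips the result to FAIL, while changing MEDIAN_INSERT_SIZE has no effect because its operator is report.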


Supported Metrics

| Name | Description | Originating Tool |
| --- | --- | --- |
| FREEMIX | Freemix | VerifyBamId2 |
| Q20_BASES | Total bases with Q20 or higher | Picard CollectQualityYieldMetrics |
| MEAN_COVERAGE | Haploid Coverage | Picard CollectWgsMetrics |
| PCT_10X | Percent coverage at 10x | Picard CollectWgsMetrics |
| PCT_20X | Percent coverage at 20x | Picard CollectWgsMetrics |
| PCT_30X | Percent coverage at 30x | Picard CollectWgsMetrics |
| PCT_CHIMERAS | Percent chimeras (PAIR) | Picard CollectAlignmentSummaryMetrics |
| READ1_PF_MISMATCH_RATE | Read 1 base mismatch rate | Picard CollectAlignmentSummaryMetrics |
| READ2_PF_MISMATCH_RATE | Read 2 base mismatch rate | Picard CollectAlignmentSummaryMetrics |
| MEDIAN_INSERT_SIZE | Library insert size median | Picard CollectInsertSizeMetrics |
| MEDIAN_ABSOLUTE_DEVIATION | Library insert size mad | Picard CollectInsertSizeMetrics |
| PERCENT_DUPLICATION | Percent duplicate marked reads | Picard CollectDuplicateMetrics |
| MEAN_TARGET_COVERAGE | The mean coverage of a target region. | Picard CollectHsMetrics |
| PCT_TARGET_BASES_10X | The fraction of all target bases achieving 10X or greater coverage | Picard CollectHsMetrics |
| PCT_TARGET_BASES_20X | The fraction of all target bases achieving 20X or greater coverage | Picard CollectHsMetrics |
| PCT_TARGET_BASES_30X | The fraction of all target bases achieving 30X or greater coverage | Picard CollectHsMetrics |

Adding Additional Metrics

To add support for additional metrics, subclass Metric and register the new class in AvailableMetrics.

Because many QC metrics are output in TSV format, there is a helper class, TSVMetric, that you can inherit from in addition to Metric to simplify parsing. All of the currently supported metrics use this helper, so they serve as examples.
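
The pattern described above can be sketched as follows. Everything in this block is a self-contained stand-in, not the package's code: the `Metric` and `TSVMetric` classes here are simplified imitations, the `value` method, the `column` attribute, and the `available_metrics` list are hypothetical, and the sample TSV text is made up. Consult the existing metric classes in the package for the real interface.

```python
import csv
import io
from typing import List, Type


class Metric:
    """Stand-in base class; the real one in the package may differ."""

    def name(self) -> str:
        raise NotImplementedError

    def value(self, text: str) -> str:
        raise NotImplementedError


class TSVMetric:
    """Stand-in helper: reads one column from the first data row of a TSV."""

    column = ""  # header of the column to extract; set by subclasses

    def value(self, text: str) -> str:
        # Drop '#' comment lines and blank lines before parsing.
        rows = [ln for ln in text.splitlines() if ln and not ln.startswith("#")]
        reader = csv.DictReader(io.StringIO("\n".join(rows)), delimiter="\t")
        return next(reader)[self.column]


class MeanCoverage(TSVMetric, Metric):
    """Inherits TSV parsing from the helper and supplies only the names."""

    column = "MEAN_COVERAGE"

    def name(self) -> str:
        return "MEAN_COVERAGE"


# Hypothetical stand-in for the AvailableMetrics registry.
available_metrics: List[Type[Metric]] = [MeanCoverage]

# Made-up metrics file content: one comment line, a header row, one data row.
sample = "# made-up header comment\nGENOME_TERRITORY\tMEAN_COVERAGE\n2900000000\t31.7\n"

results = {m().name(): m().value(sample) for m in available_metrics}
print(results)  # {'MEAN_COVERAGE': '31.7'}
```

Listing the helper first in the base classes lets its `value` method take precedence over the abstract one on `Metric`, so the subclass only has to declare which column it reads.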

Download files

Source Distribution

qc-metric-aggregator-0.1.3.tar.gz (9.0 kB)

Uploaded Source

Built Distribution

qc_metric_aggregator-0.1.3-py3-none-any.whl (17.7 kB)

Uploaded Python 3

File details

Details for the file qc-metric-aggregator-0.1.3.tar.gz.

File metadata

  • Download URL: qc-metric-aggregator-0.1.3.tar.gz
  • Upload date:
  • Size: 9.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.1.post20200322 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.7.6

File hashes

Hashes for qc-metric-aggregator-0.1.3.tar.gz
Algorithm Hash digest
SHA256 390507a2284eb67173c4588ac6806c2d477b81f3d64409ff9c62a610479f323a
MD5 fe8bb289a95650b6eddda1266d785ae2
BLAKE2b-256 036a89345a00c0e6ff9dabc44a898de6ec8089d8cca00f198a62db3bd5c9d894

File details

Details for the file qc_metric_aggregator-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: qc_metric_aggregator-0.1.3-py3-none-any.whl
  • Upload date:
  • Size: 17.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.1.post20200322 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.7.6

File hashes

Hashes for qc_metric_aggregator-0.1.3-py3-none-any.whl
Algorithm Hash digest
SHA256 04e8290b3a236680255fab2ccd7ee36398c1c6e3cbc82c79a28cb85cb815112d
MD5 c60e21e7c4f311bd6293b2acde350209
BLAKE2b-256 6e49bbc8988145be06bddeac7bc843d2432c5aa22ffbacecf3a0bb8f2f6bac1d
