
pysamstats
==========

A small Python utility for calculating statistics per genome position
based on pileups from a SAM or BAM file.

* Source: https://github.com/alimanfoo/pysamstats
* Download: http://pypi.python.org/pypi/pysamstats

Installation
------------

```
$ pip install pysamstats
```

N.B., pysamstats depends on [pysam](http://code.google.com/p/pysam/)
and [numpy](http://www.numpy.org/). These *should* be installed
automatically if you run the command above, but if you have any
problems, try installing pysam and numpy separately first.

Alternatively, clone the git repo and build in place (requires
Cython):

```
$ git clone https://github.com/alimanfoo/pysamstats.git
$ cd pysamstats
$ python setup_dev.py build_ext --inplace
```
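If either installation route fails, it can help to confirm which of the required modules the current interpreter can actually see. The following is a minimal sketch using only the Python 3 standard library; `missing_modules` is a hypothetical helper, not part of pysamstats.

```python
import importlib.util


def missing_modules(names):
    """Return the module names that cannot be located by the import system."""
    # find_spec returns None when a top-level module cannot be found,
    # without actually importing (and running) the module.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(["pysam", "numpy", "pysamstats"])
    if missing:
        print("not installed:", ", ".join(missing))
    else:
        print("all dependencies found")
```

Run it with the same `python` that `pip` installs into; otherwise the check reports on a different environment.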

Usage
-----

From the command line:

```
$ pysamstats --help
Usage: pysamstats [options] FILE

Calculate statistics per genome position based on pileups from a SAM or BAM
file and print them to stdout.

Options:
  -h, --help            show this help message and exit
  -t TYPE, --type=TYPE  type of statistics to print: coverage,
                        coverage_strand, coverage_ext, coverage_ext_strand,
                        coverage_normed, coverage_gc, coverage_normed_gc,
                        variation, variation_strand, tlen, tlen_strand, mapq,
                        mapq_strand, baseq, baseq_strand, baseq_ext,
                        baseq_ext_strand
  -c CHROMOSOME, --chromosome=CHROMOSOME
                        chromosome name
  -s START, --start=START
                        start position (1-based)
  -e END, --end=END     end position (1-based)
  -z, --zero-based      use zero-based coordinates (default is false, i.e.,
                        use one-based coords)
  -f FASTA, --fasta=FASTA
                        reference sequence file, only required for some
                        statistics
  --gc-window-length=N  size of window to use for %GC calculations [300]
  --gc-window-offset=N  window offset to use for deciding which genome
                        position to report %GC calculations against [150]
  -o, --omit-header     omit header row from output
  -p N, --progress=N    report progress every N rows

Supported statistics types:

* coverage            - number of reads aligned to each genome position
                        (total and properly paired)
* coverage_strand     - as coverage but with forward/reverse strand counts
* coverage_ext        - various additional coverage metrics, including
                        coverage for reads not properly paired (mate
                        unmapped, mate on other chromosome, ...)
* coverage_ext_strand - as coverage_ext but with forward/reverse strand counts
* coverage_normed     - depth of coverage normalised by median or mean
* coverage_gc         - as coverage but also includes a column for %GC
* coverage_normed_gc  - as coverage_normed but also includes columns for
                        normalisation by %GC
* variation           - numbers of matches, mismatches, deletions,
                        insertions, etc.
* variation_strand    - as variation but with forward/reverse strand counts
* tlen                - insert size statistics
* tlen_strand         - as tlen but with statistics by forward/reverse strand
* mapq                - mapping quality statistics
* mapq_strand         - as mapq but with statistics by forward/reverse strand
* baseq               - baseq quality statistics
* baseq_strand        - as baseq but with statistics by forward/reverse strand
* baseq_ext           - extended base quality statistics, including qualities
                        of bases matching and mismatching reference
* baseq_ext_strand    - as baseq_ext but with statistics by forward/reverse strand

Examples:

pysamstats --type coverage example.bam > example.coverage.txt
pysamstats --type coverage --chromosome Pf3D7_v3_01 --start 100000 --end 200000 example.bam > example.coverage.txt
```
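The `coverage`, `coverage_strand` and `coverage_ext` types break the reads covering a position down by properties that SAM records in each read's FLAG field: bit 0x2 (properly paired), bit 0x8 (mate unmapped) and bit 0x10 (read reverse-complemented). As a rough illustration of those categories only, here is a sketch that tallies a list of FLAG values for one position. It is not pysamstats's implementation, the output key names are hypothetical, and the flag values in the example call are hand-picked rather than taken from a real BAM file.

```python
# SAM FLAG bits, as defined in the SAM format specification.
FLAG_PROPER_PAIR = 0x2
FLAG_MATE_UNMAPPED = 0x8
FLAG_REVERSE = 0x10


def tally_flags(flags):
    """Count reads covering one position by pairing status and strand.

    `flags` is an iterable of integer SAM FLAG values, one per aligned read.
    The returned key names are illustrative, not pysamstats column names.
    """
    counts = {
        "reads_all": 0,
        "reads_pp": 0,
        "reads_mate_unmapped": 0,
        "reads_fwd": 0,
        "reads_rev": 0,
    }
    for flag in flags:
        counts["reads_all"] += 1
        if flag & FLAG_PROPER_PAIR:
            counts["reads_pp"] += 1
        if flag & FLAG_MATE_UNMAPPED:
            counts["reads_mate_unmapped"] += 1
        # Strand is decided by the read's own reverse-complement bit.
        if flag & FLAG_REVERSE:
            counts["reads_rev"] += 1
        else:
            counts["reads_fwd"] += 1
    return counts


# 99 and 147: the two reads of one proper pair (forward, reverse);
# 73: a forward read whose mate is unmapped;
# 81: a reverse read that is paired but not properly paired.
print(tally_flags([99, 147, 73, 81]))
# → {'reads_all': 4, 'reads_pp': 2, 'reads_mate_unmapped': 1, 'reads_fwd': 2, 'reads_rev': 2}
```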
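The `--gc-window-length` and `--gc-window-offset` options describe a window over the reference sequence whose %GC is reported against one genome position. The help text does not spell out the exact convention, so the sketch below assumes that a window starting at 0-based index `s` is reported against position `s + offset`; with the defaults (300, 150) that attributes each window's %GC to its centre. `percent_gc` and `window_gc` are hypothetical helpers, not pysamstats functions.

```python
def percent_gc(seq):
    """Return the percentage of G/C bases in seq (case-insensitive)."""
    if not seq:
        return 0.0
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / len(seq)


def window_gc(ref, length=300, offset=150):
    """Map reported position -> %GC of a window of `length` bases.

    ASSUMPTION: the window starting at 0-based index s is reported
    against position s + offset. Only complete windows are included.
    """
    result = {}
    for start in range(0, len(ref) - length + 1):
        result[start + offset] = percent_gc(ref[start:start + length])
    return result


# Toy reference with a small window so the values are easy to check by hand.
print(window_gc("AAGGCCTT", length=4, offset=2))
# → {2: 50.0, 3: 75.0, 4: 100.0, 5: 75.0, 6: 50.0}
```

Recomputing every window from scratch costs O(sequence length × window length); a production version could instead keep a running G/C count, adding the base that enters the window and subtracting the one that leaves.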

From Python:

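```python
from __future__ import print_function  # lets print() behave the same on Python 2 and 3

import pysam
import pysamstats

# Open the BAM file; region queries need a BAM index (.bai) alongside it.
mybam = pysam.Samfile('/path/to/your/bamfile.bam')

# Iterate over per-position coverage records for a region of one chromosome.
for rec in pysamstats.stat_coverage(mybam, chrom='Pf3D7_01_v3', start=10000, end=20000):
    print(rec['chrom'], rec['pos'], rec['reads_all'], rec['reads_pp'])
```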
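Because `stat_coverage` yields one record per position, keyed by column name as in the Python example above, the records can be post-processed with ordinary Python. For instance, the idea behind `coverage_normed` (depth of coverage divided by the median or mean depth) can be approximated for a single region as below. This is a minimal sketch, not pysamstats's own normalisation: it averages only over the records passed in, and `normalise_coverage` is a hypothetical helper.

```python
import statistics


def normalise_coverage(records, field="reads_all", method="median"):
    """Yield (chrom, pos, normalised depth) for each record.

    records: iterable of mappings with 'chrom', 'pos' and `field` keys,
    such as the records yielded by pysamstats.stat_coverage above.
    """
    records = list(records)  # two passes: one for the average, one to divide
    if not records:
        return
    depths = [rec[field] for rec in records]
    if method == "median":
        centre = statistics.median(depths)
    elif method == "mean":
        centre = statistics.mean(depths)
    else:
        raise ValueError("method must be 'median' or 'mean'")
    for rec in records:
        # Guard against an all-zero region, where the average is 0.
        value = rec[field] / centre if centre else 0.0
        yield rec["chrom"], rec["pos"], value


# Hand-made records shaped like the ones in the Python example above.
toy = [
    {"chrom": "Pf3D7_01_v3", "pos": 10000, "reads_all": 10},
    {"chrom": "Pf3D7_01_v3", "pos": 10001, "reads_all": 20},
    {"chrom": "Pf3D7_01_v3", "pos": 10002, "reads_all": 40},
]
print(list(normalise_coverage(toy)))
# → [('Pf3D7_01_v3', 10000, 0.5), ('Pf3D7_01_v3', 10001, 1.0), ('Pf3D7_01_v3', 10002, 2.0)]
```

The median is the less outlier-sensitive choice: a few positions with very deep coverage (for example, collapsed repeats) pull the mean up but barely move the median.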

Download files
--------------

Source distribution: pysamstats-0.4.4.tar.gz (224.0 kB)

Hashes for pysamstats-0.4.4.tar.gz:

| Algorithm   | Hash digest                                                      |
|-------------|------------------------------------------------------------------|
| SHA256      | d2c42cec3330ae6a290deb0e26da4b1f23c3538fc7856d02e0a78adc1fb9eab4 |
| MD5         | c5223265561b70486c59c4b0cd325429                                 |
| BLAKE2b-256 | 373403054341264330b4368a622025d82e6d229c97ffca4989ab8ec18e443687 |
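To check that a downloaded copy of the source distribution matches the published SHA256 digest listed above, the digest can be recomputed locally with Python's standard `hashlib` module. A minimal sketch; `sha256_of` and `verify` are hypothetical helper names, and the file path is whatever you saved the download as.

```python
import hashlib

# Published SHA256 digest for pysamstats-0.4.4.tar.gz.
EXPECTED_SHA256 = "d2c42cec3330ae6a290deb0e26da4b1f23c3538fc7856d02e0a78adc1fb9eab4"


def sha256_of(path, chunk_size=1 << 16):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest matches `expected`."""
    return sha256_of(path) == expected.lower()
```

For example, `verify("pysamstats-0.4.4.tar.gz")` should return `True` for an intact download saved in the current directory. Reading in chunks keeps memory use constant regardless of file size.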
