Pandas for phylogenetics
Bringing the Pandas DataFrame to phylogenetics.
PhyloPandas provides a Pandas-like interface for reading sequence and phylogenetic tree data into pandas DataFrames. This enables easy manipulation of phylogenetic data using familiar Python/Pandas functions. Finally, phylogenetics for humans!
How does it work?
Don't worry, we didn't reinvent the wheel. PhyloPandas is simply a DataFrame (great for human-accessible data storage) interface on top of Biopython (great for parsing/writing sequence data) and DendroPy (great for reading tree data).
PhyloPandas does two things:
- It offers new read functions to read sequence/tree data directly into a DataFrame.
- It attaches a new phylo accessor to the Pandas DataFrame. This accessor provides writing methods for sequence/tree data (powered by Biopython and DendroPy).
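The accessor idea can be illustrated without any dependencies. pandas offers `pandas.api.extensions.register_dataframe_accessor` for this purpose; the sketch below only imitates the mechanism with a plain Python descriptor so that it runs on its own. The names `Table`, `PhyloMethods`, `register_accessor`, and `to_fasta_text` are hypothetical stand-ins, not part of PhyloPandas or pandas.

```python
class Accessor:
    """Descriptor that exposes a namespace object on instances of a class."""

    def __init__(self, accessor_cls):
        self.accessor_cls = accessor_cls

    def __get__(self, obj, owner=None):
        if obj is None:
            # Accessed on the class itself: return the accessor class.
            return self.accessor_cls
        # Accessed on an instance: wrap that instance.
        return self.accessor_cls(obj)


def register_accessor(target_cls, name):
    """Return a decorator that attaches a class to target_cls under `name`."""
    def decorator(accessor_cls):
        setattr(target_cls, name, Accessor(accessor_cls))
        return accessor_cls
    return decorator


class Table:
    """Hypothetical stand-in for a DataFrame: a list of row dicts."""

    def __init__(self, rows):
        self.rows = list(rows)


@register_accessor(Table, "phylo")
class PhyloMethods:
    """Writing methods grouped under the `phylo` namespace."""

    def __init__(self, table):
        self._table = table

    def to_fasta_text(self):
        # One ">id" header line followed by one sequence line per row.
        return "".join(
            f">{row['id']}\n{row['sequence']}\n" for row in self._table.rows
        )


table = Table([
    {"id": "seq1", "sequence": "ACGT"},
    {"id": "seq2", "sequence": "TTGA"},
])
fasta_text = table.phylo.to_fasta_text()
print(fasta_text, end="")
```

Grouping the writers under a single namespace keeps them from colliding with the host object's own methods, which is presumably why an accessor is used here instead of subclassing the DataFrame.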
Basic Usage
Sequence data:
Read in a sequence file.
import phylopandas as ph
df1 = ph.read_fasta('sequences.fasta')
df2 = ph.read_phylip('sequences.phy')
Write to various sequence file formats.
df1.phylo.to_clustal('sequences.clustal')
Convert between formats.
# Read a format.
df = ph.read_fasta('sequences.fasta')
# Write to a different format.
df.phylo.to_phylip('sequences.phy')
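Conceptually, a conversion like the one above is a parse into tabular rows followed by a write in another format. The following is a minimal standard-library sketch of that round trip, not PhyloPandas' implementation (which delegates parsing and writing to Biopython). The function names and the `id`/`description`/`sequence` keys are assumptions chosen for illustration, and the writer emits the classic sequential PHYLIP layout with names padded to 10 characters.

```python
def read_fasta_text(text):
    """Parse FASTA-formatted text into a list of row dicts."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            # The id is the first whitespace-delimited token of the header.
            seq_id = header.split(None, 1)[0] if header.strip() else ""
            rows.append({"id": seq_id, "description": header, "sequence": ""})
        elif rows:
            # Sequence lines may wrap; join them onto the current record.
            rows[-1]["sequence"] += line
    return rows


def to_phylip_text(rows):
    """Render rows as sequential PHYLIP text (sequences must be aligned)."""
    lengths = {len(row["sequence"]) for row in rows}
    if len(lengths) != 1:
        raise ValueError("PHYLIP requires aligned sequences of equal length")
    lines = [f" {len(rows)} {lengths.pop()}"]
    for row in rows:
        # Classic PHYLIP truncates or pads names to exactly 10 characters.
        lines.append(f"{row['id'][:10]:<10}{row['sequence']}")
    return "\n".join(lines) + "\n"


example_fasta = ">seq1 first\nACGT\nAC\n>seq2\nACGTTG\n"
rows = read_fasta_text(example_fasta)
print(to_phylip_text(rows), end="")
```

A list of dicts like `rows` can be handed to `pandas.DataFrame(rows)` to get one row per sequence, which is the shape the rest of this README works with.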
Tree data:
Read Newick tree data.
df = ph.read_newick('tree.newick')
Visualize the phylogenetic data (powered by phylovega).
df.phylo.display(
    height=500,
)
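To make the idea of a tree stored in a table more concrete, here is a small standard-library sketch that flattens a Newick string into one row per node, with a pointer to the parent row. It is a simplified stand-in for what DendroPy does for PhyloPandas: it ignores quoted labels and comments, and the `node`, `parent`, `label`, `length`, and `type` keys are assumptions for illustration, not the library's guaranteed column names.

```python
def parse_newick(text):
    """Parse a simple Newick string into a list of row dicts (one per node)."""
    text = text.strip().rstrip(";")
    rows = []
    pos = 0

    def parse_node(parent):
        nonlocal pos
        index = len(rows)
        row = {"node": index, "parent": parent, "label": "",
               "length": None, "type": "leaf"}
        rows.append(row)

        # A "(" opens a comma-separated list of child subtrees.
        if pos < len(text) and text[pos] == "(":
            row["type"] = "internal"
            pos += 1
            while True:
                parse_node(index)
                if pos >= len(text):
                    raise ValueError("unbalanced parentheses")
                if text[pos] == ",":
                    pos += 1
                elif text[pos] == ")":
                    pos += 1
                    break
                else:
                    raise ValueError(f"unexpected character at {pos}")

        # Optional label, then optional ":length".
        start = pos
        while pos < len(text) and text[pos] not in ",():":
            pos += 1
        row["label"] = text[start:pos].strip()
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
            row["length"] = float(text[start:pos])

    parse_node(None)
    rows[0]["type"] = "root"
    return rows


tree_rows = parse_newick("(A:0.1,B:0.2,(C:0.3,D:0.4)E:0.5)F;")
for row in tree_rows:
    print(row)
```

With parent pointers in a column, ordinary table operations such as filtering leaves or grouping by parent apply directly to the tree.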
Contributing
If you have ideas for the project, please share them on the project's Gitter chat.
It's easy to create new read/write functions and methods for PhyloPandas. If you have a format you'd like to add, please submit PRs! There are many more formats in Biopython that I haven't had the time to add myself, so please don't be afraid to add them! I thank you ahead of time!
Testing
PhyloPandas includes a small pytest suite. Run the tests from the base directory of the repository.
$ cd phylopandas
$ pytest
Install
Install from PyPI:
pip install phylopandas
Install from source:
git clone https://github.com/Zsailer/phylopandas
cd phylopandas
pip install -e .
Dependencies
- Biopython: Library for managing and manipulating biological data.
- DendroPy: Library for phylogenetic scripting, simulation, data processing, and manipulation.
- Pandas: Flexible and powerful data analysis / manipulation library for Python.
- pandas_flavor: Flavor pandas objects with new accessors using pandas' new register API (with backwards compatibility).