Generates consistent PSSM and/or PDB files for protein-protein complexes

PSSMGen

PSSMGen: Generates Consistent PSSM and/or PDB Files for Protein-Protein Complexes

Install

  1. Make sure BLAST is installed and its database is available on your machine. Otherwise, install BLAST and download its databases by following the BLAST guide. To calculate PSSMs, the recommended database is the non-redundant protein sequence database nr (i.e. the nr.*.tar.gz files from the FTP site).
  2. Install PSSMGen with pip: pip install PSSMGen.
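
Before running anything, it can help to confirm that a psiblast executable is reachable. The following is a minimal sketch using only the Python standard library; the helper name find_psiblast is hypothetical and not part of PSSMGen, and it only checks PATH, not whether the nr database is present.

```python
import shutil


def find_psiblast(candidates=("psiblast",)):
    """Return the full path of the first candidate executable found on PATH, or None.

    `find_psiblast` is an illustrative helper, not a PSSMGen function.
    """
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


psiblast_path = find_psiblast()
print(psiblast_path or "psiblast not found on PATH; install BLAST first")
```

If a path is returned, it is the kind of value you would pass as blast_exe when configuring PSSMGen.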

Requirements for file structures and names

PSSMGen is geared toward computing the PSSM files for all models of a particular protein-protein complex.

File structures

This tool assumes your files have the following structure:

 workdir
 |_ pdb
 |_ fasta
 |_ pssm_raw
 |_ pssm
 |_ pdb_nonmatch
  • workdir is your working directory for one specific protein-protein complex.
  • pdb folder contains the PDB files (consistent PDB files).
  • fasta folder contains the protein sequence FASTA files. The code can generate the FASTA files by extracting sequences from a PDB file, or you can manually create this folder and put customised FASTA files there.
  • pssm_raw folder stores the raw PSSM files. The code can automatically generate them, or you can manually create this folder and put customised PSSM files there.
  • pssm folder stores consistent PSSM files, whose sequences are aligned with those of PDB files. This folder and its files are created automatically.
  • pdb_nonmatch folder stores the inconsistent PDB files, while the related consistent PDB files are in the pdb folder. This folder and its files are created automatically.
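
To see which of these folders already exist in a working directory, a quick inspection helps. This is a small sketch with the standard library only; describe_workdir and EXPECTED_DIRS are hypothetical names introduced here, not part of PSSMGen.

```python
from pathlib import Path

# Folder names taken from the layout described above.
EXPECTED_DIRS = ("pdb", "fasta", "pssm_raw", "pssm", "pdb_nonmatch")


def describe_workdir(workdir):
    """Map each expected subfolder to its number of files, or None if it is absent."""
    root = Path(workdir)
    report = {}
    for name in EXPECTED_DIRS:
        folder = root / name
        if folder.is_dir():
            report[name] = sum(1 for p in folder.iterdir() if p.is_file())
        else:
            report[name] = None
    return report


# Example: inspect the working directory of one complex.
print(describe_workdir("7CEI"))
```

A value of None for pssm or pdb_nonmatch is expected before the mapping step, since those folders are created automatically.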

File names

The code assumes you follow the naming rules for different file types:

  • PDB files: caseID_*.chainID.pdb
  • FASTA files: caseID.chainID.fasta
  • PSSM files: caseID.chainID.pssm, caseID_*.chainID.pdb.pssm
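
The naming rules can be expressed as regular expressions. The sketch below is illustrative only and is not PSSMGen's own parsing code; it assumes that caseID contains no underscore or dot and that chainID contains no dot, which the rules above do not state explicitly.

```python
import re

# Order matters: the mapped PSSM pattern is tried before the raw PSSM pattern.
PATTERNS = [
    ("pdb", re.compile(r"^(?P<case>[^_.]+)_(?P<model>.+)\.(?P<chain>[^.]+)\.pdb$")),
    ("fasta", re.compile(r"^(?P<case>[^_.]+)\.(?P<chain>[^.]+)\.fasta$")),
    ("mapped_pssm", re.compile(r"^(?P<case>[^_.]+)_(?P<model>.+)\.(?P<chain>[^.]+)\.pdb\.pssm$")),
    ("raw_pssm", re.compile(r"^(?P<case>[^_.]+)\.(?P<chain>[^.]+)\.pssm$")),
]


def classify(filename):
    """Return the file kind and its name parts, or None if no rule matches."""
    for kind, pattern in PATTERNS:
        match = pattern.match(filename)
        if match:
            return {"kind": kind, **match.groupdict()}
    return None


for name in ("7CEI.A.fasta", "7CEI.B.pssm", "7CEI_1w.A.pdb.pssm"):
    print(name, classify(name))
```

File names that follow none of the four patterns yield None, which makes it easy to spot misnamed inputs before a run.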

Examples

Here are some examples for the complex 7CEI. The file structure and input files should look like

7CEI
├── pdb
│   ├── 7CEI_1w.pdb
│   ├── 7CEI_2w.pdb
│   └── 7CEI_3w.pdb
└── fasta
    ├── 7CEI.A.fasta
    └── 7CEI.B.fasta

Calculate PSSM with given FASTA files

from pssmgen import PSSM

# initiate the PSSM object
gen = PSSM(work_dir='7CEI')

# set the psiblast executable, database and other psiblast parameters
# (the values shown are the defaults)
gen.configure(blast_exe='/home/software/blast/bin/psiblast',
              database='/data/DBs/blast_dbs/nr_v20180204/nr',
              num_threads=4, evalue=0.0001, comp_based_stats='T',
              max_target_seqs=2000, num_iterations=3, outfmt=7,
              save_each_pssm=True, save_pssm_after_last_round=True)

# generate raw PSSM files by running PSI-BLAST on the FASTA files
gen.get_pssm(fasta_dir='fasta', out_dir='pssm_raw', run=True, save_all_psiblast_output=True)

The code will automatically create the pssm_raw folder to store the generated PSSM files.
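
The configuration keys above share their names with psiblast command-line options. How PSSMGen assembles its psiblast call internally is not shown here, so the following is only a sketch of what an equivalent command line could look like. The function build_psiblast_command is hypothetical, and the use of -out_ascii_pssm for the output file is an assumption.

```python
def build_psiblast_command(query_fasta, out_pssm, blast_exe="psiblast", database="nr",
                           num_threads=4, evalue=0.0001, comp_based_stats="T",
                           max_target_seqs=2000, num_iterations=3, outfmt=7):
    """Build an argument list for psiblast (illustrative; not PSSMGen's own code)."""
    return [
        blast_exe,
        "-query", query_fasta,
        "-db", database,
        "-num_threads", str(num_threads),
        "-evalue", str(evalue),
        "-comp_based_stats", comp_based_stats,
        "-max_target_seqs", str(max_target_seqs),
        "-num_iterations", str(num_iterations),
        "-outfmt", str(outfmt),
        "-save_each_pssm",               # flag without a value
        "-save_pssm_after_last_round",   # flag without a value
        "-out_ascii_pssm", out_pssm,     # assumed output option for the text PSSM
    ]


command = build_psiblast_command("fasta/7CEI.A.fasta", "pssm_raw/7CEI.A.pssm")
print(" ".join(command))
```

Printing the command rather than running it is a cheap way to review the parameters before starting a long search against nr.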

Map PSSM files to PDB files to get consistent PSSM and PDB files

After getting the raw PSSMs from the last example, we can map them to the PDB files to get consistent PSSM and PDB files as follows:

# map PSSM and PDB to get consistent/mapped PSSM files
gen.map_pssm(pssm_dir='pssm_raw', pdb_dir='pdb', out_dir='pssm', chain=('A','B'))

# write consistent/mapped PDB files and move inconsistent ones to another folder for backup
gen.get_mapped_pdb(pdbpssm_dir='pssm', pdb_dir='pdb', pdbnonmatch_dir='pdb_nonmatch')

The code will automatically create the pssm and pdb_nonmatch folders and the related files.
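
Conceptually, mapping means aligning the sequence behind a raw PSSM with the sequence found in a PDB model, so that each PDB residue can be paired with a PSSM row. The sketch below illustrates that idea with difflib from the standard library. It is not PSSMGen's alignment method, and map_pssm_rows is a hypothetical name.

```python
from difflib import SequenceMatcher


def map_pssm_rows(pssm_seq, pdb_seq):
    """Return (pssm_index, pdb_index) pairs for residues that align identically.

    Illustration only: uses difflib's matching blocks, not PSSMGen's algorithm.
    """
    matcher = SequenceMatcher(None, pssm_seq, pdb_seq, autojunk=False)
    pairs = []
    for block in matcher.get_matching_blocks():
        for offset in range(block.size):
            pairs.append((block.a + offset, block.b + offset))
    return pairs


# A PDB model that lacks two residues present in the full sequence.
pairs = map_pssm_rows("MKTAYIAK", "MKTIAK")
print(pairs)
```

PSSM rows whose index never appears in the pairs correspond to residues missing from the PDB model; a PDB residue missing from the pairs would signal an inconsistency between the two files.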

Extract FASTA files from PDB file

If the FASTA files are not provided, you can also generate them from the PDB file.

The file structure and input files should look like

7CEI
└── pdb
    ├── 7CEI_1w.pdb
    ├── 7CEI_2w.pdb
    └── 7CEI_3w.pdb

from pssmgen import PSSM

# initiate the PSSM object
gen = PSSM('7CEI')

# extract FASTA file from the reference pdb file.
# if `pdbref` is not set, the code will randomly select one pdb as reference.
gen.get_fasta(pdb_dir='pdb', pdbref='7CEI_1w.pdb', chain=('A','B'), out_dir='fasta')

The code will automatically create the fasta and pssm_raw folders for FASTA files and raw PSSM files, respectively.
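
To make the extraction step concrete, here is a minimal sketch of reading one chain's sequence from the ATOM records of a PDB file, using the fixed column layout of the PDB format. It is not PSSMGen's implementation; sequence_from_pdb_lines is a hypothetical helper, and non-standard residues are simply written as X.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def sequence_from_pdb_lines(lines, chain_id):
    """Return the one-letter sequence of `chain_id` from PDB ATOM records."""
    sequence = []
    last_residue = None
    for line in lines:
        if not line.startswith("ATOM") or len(line) < 27:
            continue
        if line[21] != chain_id:          # column 22: chain identifier
            continue
        residue_key = line[22:27]         # columns 23-27: residue number + insertion code
        if residue_key == last_residue:   # skip further atoms of the same residue
            continue
        last_residue = residue_key
        sequence.append(THREE_TO_ONE.get(line[17:20], "X"))  # columns 18-20: residue name
    return "".join(sequence)


def to_fasta(header, sequence):
    """Format a sequence as a single FASTA record."""
    return ">{}\n{}\n".format(header, sequence)
```

With a file at hand, `sequence_from_pdb_lines(open(path), 'A')` followed by to_fasta would produce the content of a caseID.chainID.fasta file. Residues absent from the ATOM records are absent from the sequence as well.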

Use existing PSSM files to get consistent PSSM and PDB files

You can provide raw PSSM files instead of calculating them.

The file structure and input files should look like

7CEI
├── pdb
│   ├── 7CEI_1w.pdb
│   ├── 7CEI_2w.pdb
│   └── 7CEI_3w.pdb
└── pssm_raw
    ├── 7CEI.A.pssm
    └── 7CEI.B.pssm

from pssmgen import PSSM

# initiate the PSSM object
gen = PSSM('7CEI')

# map PSSM and PDB to get consistent/mapped PSSM files
gen.map_pssm()

# write consistent/mapped PDB files and move inconsistent ones to another folder for backup
gen.get_mapped_pdb()
