A tool for ranking potential targets for a given disease
Project description
This is a tool for therapeutic target prioritization using network representation learning.
Installation
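To install GuiltyTargets together with its dependencies DeepWalk and GAT2VEC, run the following commands from the directory where you want the repositories to be cloned: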
$ git clone https://github.com/phanein/deepwalk.git
$ cd deepwalk
$ pip install .
$ cd ..
$ # Install GAT2VEC, which depends on DeepWalk
$ git clone https://github.com/ozlemmuslu/GAT2VEC.git gat2vec
$ cd gat2vec
$ pip install .
$ cd ..
$ # Install GuiltyTargets itself (in editable mode)
$ git clone https://github.com/guiltytargets/guiltytargets.git
$ cd guiltytargets
$ pip install -e .
Usage
After installation, GuiltyTargets can be used as a Python library:
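import guiltytargets

# Hypothetical placeholder values: replace them with your own paths,
# column names and cutoffs.
input_directory = "data"
targets_path = "data/targets.txt"
ppi_graph_path = "data/ppi_network.tsv"
dge_path = "data/differential_expression.tsv"
auc_output_path = "auc.tsv"
probs_output_path = "probabilities.tsv"
max_padj = 0.05
lfc_cutoff = 1.0
entrez_id_name = "entrez_id"
log_fold_change_name = "log2FoldChange"
adjusted_p_value_name = "padj"
base_mean_name = "baseMean"
split_char = "///"
confidence_cutoff = 0.63

guiltytargets.run(
    input_directory,
    targets_path,
    ppi_graph_path,
    dge_path,
    auc_output_path,
    probs_output_path,
    max_adj_p=max_padj,
    max_log2_fold_change=-lfc_cutoff,  # negative cutoff
    min_log2_fold_change=lfc_cutoff,   # positive cutoff
    entrez_id_header=entrez_id_name,
    log2_fold_change_header=log_fold_change_name,
    adj_p_header=adjusted_p_value_name,
    base_mean_header=base_mean_name,
    entrez_delimiter=split_char,
    ppi_edge_min_confidence=confidence_cutoff,
)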
This creates two files: auc_output_path, which contains the AUC values from cross-validation, and probs_output_path, which contains the predicted targets.
The parameters are explained below. A complete use case can be found at https://github.com/GuiltyTargets/reproduction
INPUT FILES
Three input files are required to run GuiltyTargets. All of them should be located in input_directory.
ppi_graph_path: Path to a file containing a protein-protein interaction (PPI) network, with one edge per row in the following format:
source_entrez_id | target_entrez_id | confidence
---|---|---
216 | 216 | 0.76
3679 | 1134 | 0.73
55607 | 71 | 0.65
5552 | 960 | 0.63
2886 | 2064 | 0.90
5058 | 2064 | 0.73
1742 | 2064 | 0.87
An example of such a network can be downloaded from http://cbdm-01.zdv.uni-mainz.de/~mschaefer/hippie/download.php
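The sketch below illustrates the three-column layout above and what ppi_edge_min_confidence does conceptually. read_ppi_edges is a hypothetical helper, not part of the GuiltyTargets API: it assumes whitespace-separated columns with an optional header row, and keeps edges whose confidence is at least the cutoff (whether the library treats the boundary as inclusive is an assumption).

```python
from typing import Iterable, List, Tuple

# Hypothetical helper for illustration only; not part of the GuiltyTargets API.
Edge = Tuple[str, str, float]


def read_ppi_edges(lines: Iterable[str], min_confidence: float) -> List[Edge]:
    """Parse 'source target confidence' rows, keeping edges with confidence >= min_confidence.

    Assumes whitespace-separated columns and an optional header row.
    """
    edges: List[Edge] = []
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        source, target, confidence = fields
        try:
            score = float(confidence)
        except ValueError:
            continue  # header row such as "source_entrez_id target_entrez_id confidence"
        if score >= min_confidence:  # inclusive cutoff is an assumption
            edges.append((source, target, score))
    return edges


rows = [
    "source_entrez_id\ttarget_entrez_id\tconfidence",
    "216\t216\t0.76",
    "3679\t1134\t0.73",
    "55607\t71\t0.65",
    "5552\t960\t0.63",
]
print(read_ppi_edges(rows, min_confidence=0.7))
# [('216', '216', 0.76), ('3679', '1134', 0.73)]
```

Raising the cutoff shrinks the network to higher-confidence interactions; with a cutoff of 0.7, only the first two sample edges survive.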
dge_path: Path to a differential gene expression file in TSV format. Each row is one entry, and the columns include the following properties:
Base mean
Log2 fold change
Adjusted p-value
Entrez ID
The file may contain other columns as well, but the names of the four columns above must be passed to run() through the entrez_id_header, log2_fold_change_header, adj_p_header and base_mean_header options.
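As an illustration of how the four required columns are picked out by name while extra columns are ignored, here is a minimal sketch using Python's csv module. The function read_dge, the column names in the sample (gene, baseMean, log2FoldChange, padj, entrez) and the /// delimiter are all hypothetical; the sketch assumes numeric values without NA entries and splits rows that list several Entrez IDs.

```python
import csv
import io
from typing import Dict, List, TextIO


def read_dge(
    handle: TextIO,
    entrez_id_header: str,
    log2_fold_change_header: str,
    adj_p_header: str,
    base_mean_header: str,
    entrez_delimiter: str,
) -> List[Dict[str, object]]:
    """Hypothetical reader: extract the four required columns from a TSV file by name."""
    records: List[Dict[str, object]] = []
    for row in csv.DictReader(handle, delimiter="\t"):
        entrez_field = row[entrez_id_header].strip()
        if not entrez_field:
            continue  # this sketch skips rows without an Entrez ID
        # One output record per Entrez ID when a row lists several of them.
        for entrez_id in entrez_field.split(entrez_delimiter):
            records.append({
                "entrez_id": entrez_id.strip(),
                "log2_fold_change": float(row[log2_fold_change_header]),
                "adj_p": float(row[adj_p_header]),
                "base_mean": float(row[base_mean_header]),
            })
    return records


# Hypothetical sample; the "gene" column is an extra column that is ignored.
sample = (
    "gene\tbaseMean\tlog2FoldChange\tpadj\tentrez\n"
    "A\t120.5\t1.8\t0.001\t216\n"
    "B\t30.0\t-0.2\t0.6\t3679///1134\n"
)
records = read_dge(io.StringIO(sample), "entrez", "log2FoldChange", "padj", "baseMean", "///")
print([r["entrez_id"] for r in records])  # ['216', '3679', '1134']
```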
targets_path: Path to a file listing the Entrez IDs of known targets, for example:
1742 3996 150 152 151
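A minimal sketch of reading such a targets file, assuming only that IDs are separated by whitespace; splitting on any whitespace accepts both the single-line layout shown above and one ID per line. read_targets is a hypothetical helper, not a GuiltyTargets function.

```python
import tempfile
from pathlib import Path
from typing import List


def read_targets(path: Path) -> List[str]:
    """Hypothetical helper: return the Entrez IDs of known targets.

    str.split() with no argument splits on spaces, tabs and newlines alike.
    """
    return path.read_text().split()


with tempfile.TemporaryDirectory() as tmp:
    targets_file = Path(tmp) / "targets.txt"
    targets_file.write_text("1742 3996 150 152 151\n")
    targets = read_targets(targets_file)

print(targets)  # ['1742', '3996', '150', '152', '151']
```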
OPTIONS
The following options must be set:
max_adj_p: Maximum adjusted p-value for a gene to be considered differentially expressed.
max_log2_fold_change: Maximum log2 fold change for a gene to be considered differentially expressed (down-regulated); in the usage example above it is set to the negative cutoff, -lfc_cutoff.
min_log2_fold_change: Minimum log2 fold change for a gene to be considered differentially expressed (up-regulated); in the usage example above it is set to the positive cutoff, lfc_cutoff.
ppi_edge_min_confidence: Minimum confidence score for an edge to be kept in the PPI network.
entrez_id_header: Name of the column containing the Entrez ID in the differential expression file.
log2_fold_change_header: Name of the column containing the log2 fold change in the differential expression file.
adj_p_header: Name of the column containing the adjusted p-value in the differential expression file.
base_mean_header: Name of the column containing the base mean in the differential expression file.
entrez_delimiter: Separator between Entrez IDs when a row of the differential expression file contains more than one.
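The sketch below shows one plausible reading of how max_adj_p, max_log2_fold_change and min_log2_fold_change combine, assuming that max_log2_fold_change bounds down-regulation and min_log2_fold_change bounds up-regulation, as the usage example's choice of -lfc_cutoff and lfc_cutoff suggests. The classify function is hypothetical, and the inclusive comparisons are an assumption rather than the library's documented behavior.

```python
from typing import Optional


def classify(
    log2_fold_change: float,
    adj_p: float,
    max_adj_p: float,
    max_log2_fold_change: float,
    min_log2_fold_change: float,
) -> Optional[str]:
    """Hypothetical reading of the thresholds.

    Returns "up", "down", or None when the gene is not differentially expressed.
    Inclusive comparisons are an assumption.
    """
    if adj_p > max_adj_p:
        return None  # not statistically significant
    if log2_fold_change >= min_log2_fold_change:
        return "up"
    if log2_fold_change <= max_log2_fold_change:
        return "down"
    return None  # significant, but the fold change is too small


lfc_cutoff = 1.0  # hypothetical cutoff, mirroring the usage example
for lfc, padj in [(1.8, 0.001), (-2.3, 0.01), (0.4, 0.001), (2.5, 0.2)]:
    label = classify(lfc, padj, max_adj_p=0.05,
                     max_log2_fold_change=-lfc_cutoff,
                     min_log2_fold_change=lfc_cutoff)
    print(lfc, padj, label)
```

Under this reading, a symmetric cutoff marks genes at both tails of the fold-change distribution, provided their adjusted p-value is small enough.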
Hashes for guiltytargets-0.0.2-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 4bd769a48588161ffe7dbf5075bded980046c47574bb9e5accd0a757e2cf2485
MD5 | 0022b04bb83a73297e3e7d8a6a41d485
BLAKE2b-256 | d78b4aab20b154d8acbe3f5e3568b7dde5db3b19edd9e27c6cda180b4a88c69a