A Python module for data fusion built on top of factorized models.
Project description
scikit-fusion is a Python module for data fusion based on recent collective latent factor models.
Dependencies
scikit-fusion is tested to work under Python 3.
The required dependencies to build the software are Numpy >= 1.7, SciPy >= 0.12, PyGraphviz >= 1.3 (needed only for drawing data fusion graphs) and Joblib >= 0.8.4.
Install
This package uses distutils, which is the default way of installing python modules. To install in your home directory, use:
python setup.py install --user
To install for all users on Unix/Linux:
python setup.py build sudo python setup.py install
For development mode use:
python setup.py develop
Usage
Let’s generate three random data matrices describing three different object types:
>>> import numpy as np >>> R12 = np.random.rand(50, 100) >>> R13 = np.random.rand(50, 40) >>> R23 = np.random.rand(100, 40)
Next, we define our data fusion graph:
>>> from skfusion import fusion >>> t1 = fusion.ObjectType('Type 1', 10) >>> t2 = fusion.ObjectType('Type 2', 20) >>> t3 = fusion.ObjectType('Type 3', 30) >>> relations = [fusion.Relation(R12, t1, t2), fusion.Relation(R13, t1, t3), fusion.Relation(R23, t2, t3)] >>> fusion_graph = fusion.FusionGraph() >>> fusion_graph.add_relations_from(relations)
and then collectively infer the latent data model:
>>> fuser = fusion.Dfmf() >>> fuser.fuse(fusion_graph) >>> print(fuser.factor(t1).shape) (50, 10)
Afterwards new data might arrive:
>>> new_R12 = np.random.rand(10, 100) >>> new_R13 = np.random.rand(10, 40)
for which we define the fusion graph:
>>> new_relations = [fusion.Relation(new_R12, t1, t2), fusion.Relation(new_R13, t1, t3)] >>> new_graph = fusion.FusionGraph(new_relations)
and transform new objects to the latent space induced by the fuser:
>>> transformer = fusion.DfmfTransform() >>> transformer.transform(t1, new_graph, fuser) >>> print(transformer.factor(t1).shape) (10, 10)
scikit-fusion is distributed with a few working data fusion scenarios:
>>> from skfusion import datasets >>> dicty = datasets.load_dicty() >>> print(dicty) FusionGraph(Object types: 3, Relations: 3) >>> print(dicty.object_types) {ObjectType(GO term), ObjectType(Experimental condition), ObjectType(Gene)} >>> print(dicty.relations) {Relation(ObjectType(Gene), ObjectType(GO term)), Relation(ObjectType(Gene), ObjectType(Gene)), Relation(ObjectType(Gene), ObjectType(Experimental condition))}
Relevant links
Official source code repo: https://github.com/marinkaz/scikit-fusion
HTML documentation: TBA
Download releases: https://github.com/marinkaz/scikit-fusion/releases
Issue tracker: https://github.com/marinkaz/scikit-fusion/issues
Data fusion by matrix factorization: http://dx.doi.org/10.1109/TPAMI.2014.2343973
Discovering disease-disease associations by fusing systems-level molecular data: http://www.nature.com/srep/2013/131115/srep03202/full/srep03202.html
Matrix factorization-based data fusion for gene function prediction in baker’s yeast and slime mold: http://www.worldscientific.com/doi/pdf/10.1142/9789814583220_0038
Matrix factorization-based data fusion for drug-induced liver injury prediction: http://www.tandfonline.com/doi/abs/10.4161/sysb.29072
Survival regression by data fusion: http://www.tandfonline.com/doi/abs/10.1080/21628130.2015.1016702
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
File details
Details for the file scikit-fusion-0.2.tar.gz
.
File metadata
- Download URL: scikit-fusion-0.2.tar.gz
- Upload date:
- Size: 6.8 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 877b25907bd4d485228e7ecac5039bdb2a719c0f098edea28b8cbb16fa487e28 |
|
MD5 | e69e66038f8c9170b7210ae11fd1c6c6 |
|
BLAKE2b-256 | 67b94e6793d2aed1215a51f76531a3f29bfbb05aff58d9f7da5571103fde2a75 |