A Python module for data fusion built on top of factorized models.

scikit-fusion is a Python module for data fusion based on recent collective latent factor models.

Dependencies

scikit-fusion is tested to work under Python 3.

Building the software requires NumPy >= 1.7, SciPy >= 0.12, Joblib >= 0.8.4 and PyGraphviz >= 1.3 (needed only for drawing data fusion graphs).
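
To compare an existing environment against these minimum versions before building, a small check along the following lines may help. It is only a sketch: the `REQUIRED` table and the `version_tuple` and `check_dependencies` helpers are hypothetical names introduced here, not part of scikit-fusion, and the version comparison is deliberately simplistic (leading digits of each dotted component only).

```python
from importlib import metadata
from itertools import takewhile

# Minimum versions as listed above; distribution names as published on PyPI.
REQUIRED = {"numpy": "1.7", "scipy": "0.12", "joblib": "0.8.4", "pygraphviz": "1.3"}


def version_tuple(text):
    """Turn '1.26.4' into (1, 26, 4), ignoring suffixes such as 'rc1'."""
    parts = []
    for piece in text.split("."):
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check_dependencies(required=REQUIRED):
    """Return a {package: status} report for the current environment."""
    report = {}
    for name, minimum in required.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        ok = version_tuple(installed) >= version_tuple(minimum)
        report[name] = "ok" if ok else f"too old ({installed} < {minimum})"
    return report


if __name__ == "__main__":
    for package, status in check_dependencies().items():
        print(f"{package}: {status}")
```

A "missing" entry for pygraphviz is harmless unless you intend to draw data fusion graphs.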

Install

This package uses distutils, which is the default way of installing Python modules. To install in your home directory, use:

python setup.py install --user

To install for all users on Unix/Linux:

python setup.py build
sudo python setup.py install

For development mode use:

python setup.py develop

Usage

Let’s generate three random data matrices, each relating two of three different object types:

>>> import numpy as np
>>> R12 = np.random.rand(50, 100)
>>> R13 = np.random.rand(50, 40)
>>> R23 = np.random.rand(100, 40)

Next, we define our data fusion graph:

>>> from skfusion import fusion
>>> t1 = fusion.ObjectType('Type 1', 10)
>>> t2 = fusion.ObjectType('Type 2', 20)
>>> t3 = fusion.ObjectType('Type 3', 30)
>>> relations = [fusion.Relation(R12, t1, t2),
...              fusion.Relation(R13, t1, t3),
...              fusion.Relation(R23, t2, t3)]
>>> fusion_graph = fusion.FusionGraph()
>>> fusion_graph.add_relations_from(relations)
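
Each relation matrix has one object type on its rows and another on its columns, so every matrix that mentions a given type must agree on how many objects that type has. The NumPy-only sketch below checks this for the matrices above. The dictionary layout and the `object_counts` helper are hypothetical illustrations, not scikit-fusion API.

```python
import numpy as np

R12 = np.random.rand(50, 100)
R13 = np.random.rand(50, 40)
R23 = np.random.rand(100, 40)

# (row object type, column object type) -> relation matrix
relations = {
    ("Type 1", "Type 2"): R12,
    ("Type 1", "Type 3"): R13,
    ("Type 2", "Type 3"): R23,
}


def object_counts(relations):
    """Collect the number of objects per type, failing on any mismatch."""
    counts = {}
    for (row_type, col_type), matrix in relations.items():
        for name, size in ((row_type, matrix.shape[0]), (col_type, matrix.shape[1])):
            # setdefault records the first size seen and returns it afterwards
            if counts.setdefault(name, size) != size:
                raise ValueError(
                    f"{name} has {counts[name]} objects in one matrix and {size} in another"
                )
    return counts


print(object_counts(relations))
```

Here Type 1 has 50 objects, Type 2 has 100 and Type 3 has 40, which is why `R12`, `R13` and `R23` have the shapes they do.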

and then collectively infer the latent data model:

>>> fuser = fusion.Dfmf()
>>> fuser.fuse(fusion_graph)
>>> print(fuser.factor(t1).shape)
(50, 10)
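
The shape `(50, 10)` follows from the number of Type 1 objects (50) and the rank given to `t1` (10). The sketch below illustrates the shapes involved, assuming the matrix tri-factorization form R12 ≈ G1 S12 G2ᵀ usually associated with collective latent factor models of this kind. The factors here are random stand-ins rather than anything computed by `fuser`, and the names `G1`, `G2` and `S12` are chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

n1, n2 = 50, 100   # number of Type 1 and Type 2 objects
k1, k2 = 10, 20    # ranks passed to ObjectType for Type 1 and Type 2

G1 = rng.random((n1, k1))    # latent factor for Type 1: one row per object
G2 = rng.random((n2, k2))    # latent factor for Type 2
S12 = rng.random((k1, k2))   # small matrix linking the two latent spaces

# The product has the same shape as the relation matrix R12.
R12_hat = G1 @ S12 @ G2.T

print(G1.shape)       # (50, 10)
print(R12_hat.shape)  # (50, 100)
```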

Afterwards, new data might arrive:

>>> new_R12 = np.random.rand(10, 100)
>>> new_R13 = np.random.rand(10, 40)

for which we define the fusion graph:

>>> new_relations = [fusion.Relation(new_R12, t1, t2),
...                  fusion.Relation(new_R13, t1, t3)]
>>> new_graph = fusion.FusionGraph(new_relations)

and transform new objects to the latent space induced by the fuser:

>>> transformer = fusion.DfmfTransform()
>>> transformer.transform(t1, new_graph, fuser)
>>> print(transformer.factor(t1).shape)
(10, 10)
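
To build intuition for why ten new Type 1 objects end up as a `(10, 10)` factor, the sketch below projects new rows into a fixed latent space with ordinary least squares. This is a swapped-in technique for illustration, not the algorithm `DfmfTransform` implements: it assumes the tri-factorization form new_R12 ≈ G_new S12 G2ᵀ, uses random stand-ins for the fitted `G2` and `S12`, and does not enforce any constraints (such as non-negativity) that the library may apply.

```python
import numpy as np

rng = np.random.default_rng(0)

G2 = rng.random((100, 20))       # stand-in for the fitted Type 2 factor
S12 = rng.random((10, 20))       # stand-in for the fitted Type 1 / Type 2 link
new_R12 = rng.random((10, 100))  # ten new Type 1 objects against 100 Type 2 objects

# With G2 and S12 held fixed, new_R12 ≈ G_new @ B where B = S12 @ G2.T.
B = S12 @ G2.T                   # shape (10, 100)

# lstsq solves A @ X ≈ Y for X, so transpose the problem: B.T @ G_new.T ≈ new_R12.T
G_new_T, *_ = np.linalg.lstsq(B.T, new_R12.T, rcond=None)
G_new = G_new_T.T

print(G_new.shape)  # (10, 10)
```

Each row of `G_new` is the latent representation of one new object, expressed in the same 10-dimensional space as the original Type 1 factor.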

scikit-fusion is distributed with a few working data fusion scenarios:

>>> from skfusion import datasets
>>> dicty = datasets.load_dicty()
>>> print(dicty)
FusionGraph(Object types: 3, Relations: 3)
>>> print(dicty.object_types)
{ObjectType(GO term), ObjectType(Experimental condition), ObjectType(Gene)}
>>> print(dicty.relations)
{Relation(ObjectType(Gene), ObjectType(GO term)),
 Relation(ObjectType(Gene), ObjectType(Gene)),
 Relation(ObjectType(Gene), ObjectType(Experimental condition))}

