PipelineProfiler: a tool for exploring D3M pipelines in Jupyter Notebooks.


PipelineProfiler

An AutoML pipeline exploration tool compatible with Jupyter Notebooks.


Paper: https://arxiv.org/abs/2005.00160

Demo

To try PipelineProfiler, first install the Python library (see Install below). Then run the following in a Jupyter Notebook:

import PipelineProfiler
data = PipelineProfiler.get_heartstatlog_data()
PipelineProfiler.plot_pipeline_matrix(data)

Install

Option 1: install via pip:

pip install pipelineprofiler

Option 2: build and run the Docker image (run these commands from the directory containing the project's Dockerfile):

docker build -t pipelineprofiler .
docker run -p 9999:8888 pipelineprofiler

Then copy the access token printed by Jupyter and use it to log in at this URL in your browser:

http://localhost:9999

Data preprocessing

PipelineProfiler reads data from the D3M Metalearning database. You can download this data from: https://metalearning.datadrivendiscovery.org/dumps/2020/03/04/metalearningdb_dump_20200304.tar.gz
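The dump is a gzipped tar archive. As a minimal sketch, it can be fetched and unpacked with the Python standard library. The local file and directory names and the helper functions below are hypothetical, and since the archive's internal layout is not described here, the sketch simply searches the extracted tree for the two JSON files used in the merge step.

```python
import os
import tarfile
import urllib.request

DUMP_URL = ("https://metalearning.datadrivendiscovery.org/dumps/2020/03/04/"
            "metalearningdb_dump_20200304.tar.gz")


def download_dump(url=DUMP_URL, archive_path="metalearningdb_dump.tar.gz"):
    """Download the archive unless it already exists locally (hypothetical helper)."""
    if not os.path.exists(archive_path):
        urllib.request.urlretrieve(url, archive_path)
    return archive_path


def extract_dump(archive_path, dest_dir="metalearningdb_dump"):
    """Extract the .tar.gz archive into dest_dir (hypothetical helper)."""
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest_dir)
    return dest_dir


def find_input_files(root, names=("pipelines.json", "pipeline_runs.json")):
    """Walk the extracted tree and map each wanted file name to its first match."""
    found = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in names:
            if name in filenames and name not in found:
                found[name] = os.path.join(dirpath, name)
    return found
```

For example, `find_input_files(extract_dump(download_dump()))` returns a dictionary with the paths to pass to the merge command, assuming both files are present in the archive under those names.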

To explore the pipelines, you first need to merge two files from the dump, pipelines.json and pipeline_runs.json. To do so, run:

python -m PipelineProfiler.pipeline_merge [-n NUMBER_PIPELINES] pipeline_runs_file pipelines_file output_file
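The same merge can also be launched from Python, for example from within the notebook. The sketch below only assembles the command shown above and runs it with `subprocess`; the helper names are hypothetical, and `sys.executable` is used so that the module runs under the same interpreter as the notebook.

```python
import subprocess
import sys


def build_merge_command(pipeline_runs_file, pipelines_file, output_file,
                        number_pipelines=None):
    """Build the argument list for PipelineProfiler.pipeline_merge (hypothetical helper)."""
    cmd = [sys.executable, "-m", "PipelineProfiler.pipeline_merge"]
    if number_pipelines is not None:
        # Optional -n NUMBER_PIPELINES flag from the usage line above
        cmd += ["-n", str(number_pipelines)]
    # Positional arguments, in the order given by the usage line
    cmd += [pipeline_runs_file, pipelines_file, output_file]
    return cmd


def run_merge(*args, **kwargs):
    """Run the merge; raises CalledProcessError if it exits with a non-zero status."""
    subprocess.run(build_merge_command(*args, **kwargs), check=True)
```

For example, `run_merge("pipeline_runs.json", "pipelines.json", "output_file.json")`, where the three paths are placeholders for your own files.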

Pipeline exploration

In a Jupyter notebook, load the merged file (output_file.json below stands for whatever name you passed as output_file):

import json
import PipelineProfiler

with open("output_file.json", "r") as f:
    pipelines = json.load(f)

Then plot a subset of the pipelines, here the first 10:

PipelineProfiler.plot_pipeline_matrix(pipelines[:10])
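`pipelines[:10]` takes the first ten entries in file order, which may span several problems. A hedged alternative is to restrict the plot to one problem and its highest-scoring pipelines. The sketch below relies on the `problem`/`id` and `scores`/`normalized` fields used in the Data postprocessing code; the helper names are hypothetical, and treating a higher normalized score as better is an assumption.

```python
from collections import Counter


def most_common_problem(pipelines):
    """Return the problem ID with the most pipelines (hypothetical helper)."""
    counts = Counter(p['problem']['id'] for p in pipelines)
    return counts.most_common(1)[0][0]


def top_pipelines_for_problem(pipelines, problem_id, n=10):
    """Return the n best pipelines for one problem (hypothetical helper)."""
    same_problem = [p for p in pipelines if p['problem']['id'] == problem_id]
    # Assumes a higher normalized score means a better pipeline
    ranked = sorted(same_problem, key=lambda p: p['scores'][0]['normalized'], reverse=True)
    return ranked[:n]
```

For example, `PipelineProfiler.plot_pipeline_matrix(top_pipelines_for_problem(pipelines, most_common_problem(pipelines)))` would plot the ten best pipelines for the problem that appears most often in the file.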

Data postprocessing

You might want to group pipelines by problem and keep only the top k pipelines from each team (pipeline source). To do so, use the following code, which assumes pipelines has been loaded as shown above:

from collections import defaultdict

def get_top_k_pipelines_team(pipelines, k):
    """Keep the k highest-scoring pipelines (by normalized score) from each team."""
    team_pipelines = defaultdict(list)
    for pipeline in pipelines:
        source = pipeline['pipeline_source']['name']  # team that produced the pipeline
        team_pipelines[source].append(pipeline)
    new_pipelines = []
    for team, team_list in team_pipelines.items():
        ranked = sorted(team_list, key=lambda x: x['scores'][0]['normalized'], reverse=True)
        new_pipelines.extend(ranked[:k])
    return new_pipelines

def sort_pipeline_scores(pipelines):
    """Sort pipelines from best to worst, using the same normalized score as above."""
    return sorted(pipelines, key=lambda x: x['scores'][0]['normalized'], reverse=True)

# Group pipelines by problem ID
pipelines_problem = defaultdict(list)
for pipeline in pipelines:
    pipelines_problem[pipeline['problem']['id']].append(pipeline)

# Within each problem, keep the top 100 pipelines per team, then rank them all by score
for problem in pipelines_problem:
    pipelines_problem[problem] = sort_pipeline_scores(
        get_top_k_pipelines_team(pipelines_problem[problem], k=100))
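To check the result of the grouping, a small sketch can count how many pipelines each team contributes to each problem; after the filtering above, no count should exceed k (100 here). The helper `summarize_by_team` is hypothetical and relies only on the `pipeline_source`/`name` field already used above.

```python
from collections import Counter


def summarize_by_team(pipelines_problem):
    """Map each problem ID to a Counter of pipelines per team (hypothetical helper)."""
    return {
        problem_id: Counter(p['pipeline_source']['name'] for p in problem_pipelines)
        for problem_id, problem_pipelines in pipelines_problem.items()
    }
```

For example, `summarize_by_team(pipelines_problem)` returns a dictionary of the form {problem_id: Counter({team_name: count, ...}), ...}.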


Release history

0.1.18: May 4, 2022
0.1.17: Feb 15, 2021
0.1.16: Jan 30, 2021
0.1.15: Jul 10, 2020
0.1.14: Jul 8, 2020
0.1.13: Jul 1, 2020
0.1.12: Jun 2, 2020
0.1.11: May 27, 2020
0.1.10: May 20, 2020
0.1.9: May 19, 2020
0.1.8: May 19, 2020
0.1.7: May 19, 2020 (this version)
0.1.6: May 19, 2020
0.1.5: May 18, 2020
0.1.4: May 14, 2020
0.1.3: May 12, 2020
0.1.2: May 4, 2020
0.1.1: May 4, 2020
0.1.0: May 4, 2020

Download files

Source distribution: pipelineprofiler-0.1.7.tar.gz (866.9 kB), uploaded May 19, 2020

Built distribution: pipelineprofiler-0.1.7-py3-none-any.whl (878.2 kB), uploaded May 19, 2020, Python 3

File details

Details for the file pipelineprofiler-0.1.7.tar.gz.

File metadata

Download URL: pipelineprofiler-0.1.7.tar.gz
Upload date: May 19, 2020
Size: 866.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.6.0.post20191030 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.6.9

File hashes

Hashes for pipelineprofiler-0.1.7.tar.gz
Algorithm	Hash digest
SHA256	`ad1416703ed5bd9895f9b5bb45711cbfb795a480603e24165a6907d640ae1b2c`
MD5	`c9b03bfc70357bb1910c9e663cfcf2c9`
BLAKE2b-256	`bf6d1cd0b42e8bf341dd4f40f254bcd0393632213b88c1d929e3afa6f0d65fe6`


File details

Details for the file pipelineprofiler-0.1.7-py3-none-any.whl.

File metadata

Download URL: pipelineprofiler-0.1.7-py3-none-any.whl
Upload date: May 19, 2020
Size: 878.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.6.0.post20191030 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.6.9

File hashes

Hashes for pipelineprofiler-0.1.7-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5b8a820ed73714fbc30a3dcc09d0f566918f21890ea7e8676127264025dbdda1`
MD5	`eb34d4b88fa9d7a31f8f3511f822e5fc`
BLAKE2b-256	`a3403714480848cd73216591f417b08fdb7be22c7f0c7b94f5a1f6220026e14e`
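The SHA256 digests above can be used to check a downloaded file. A minimal sketch using Python's `hashlib` follows; the helper names are hypothetical, and it assumes the downloaded file keeps its original name so that the expected digest can be looked up.

```python
import hashlib
import os

# SHA256 digests listed above for the 0.1.7 release files
EXPECTED_SHA256 = {
    "pipelineprofiler-0.1.7.tar.gz":
        "ad1416703ed5bd9895f9b5bb45711cbfb795a480603e24165a6907d640ae1b2c",
    "pipelineprofiler-0.1.7-py3-none-any.whl":
        "5b8a820ed73714fbc30a3dcc09d0f566918f21890ea7e8676127264025dbdda1",
}


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path):
    """Return True if the file matches the published SHA256 for its file name."""
    expected = EXPECTED_SHA256[os.path.basename(path)]
    return sha256_of_file(path) == expected
```

Reading in chunks keeps memory use constant regardless of file size.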