Pipeline Profiler tool. Enables the exploration of D3M pipelines in Jupyter Notebooks

PipelineProfiler

AutoML Pipeline exploration tool compatible with Jupyter Notebooks.

System screen

Paper: https://arxiv.org/abs/2005.00160

Demo

To use PipelineProfiler, first install the Python library (see the Install instructions below). Then open and run the notebook "Demo.ipynb".

Install

Option 1: Build and install via pip:

cd PipelineProfiler
npm install
npm run build
cd ..
pip install .

Option 2: Run the docker image:

docker build -t pipelineprofiler .
docker run -p 9999:8888 pipelineprofiler

Then copy the access token and use it to log in to Jupyter in your browser at:

localhost:9999

Data preprocessing

PipelineProfiler reads data from the D3M Metalearning database. You can download this data from: https://metalearning.datadrivendiscovery.org/dumps/2020/03/04/metalearningdb_dump_20200304.tar.gz

You need to merge two files in order to explore the pipelines: pipelines.json and pipeline_runs.json. To do so, run

python -m PipelineProfiler.pipeline_merge [-n NUMBER_PIPELINES] pipeline_runs_file pipelines_file output_file
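
If you prefer to drive the merge step from Python, the command above can be assembled as an argument list. The sketch below is only an illustration: the helper name build_merge_command and the file names are assumptions, and the argument order simply mirrors the usage line above.

```python
import sys

def build_merge_command(runs_file, pipelines_file, output_file, n=None):
    """Build the argv list for the pipeline_merge module (hypothetical helper)."""
    cmd = [sys.executable, "-m", "PipelineProfiler.pipeline_merge"]
    if n is not None:
        # Optional -n NUMBER_PIPELINES flag from the usage line
        cmd += ["-n", str(n)]
    # Positional arguments, in the order given by the usage line
    cmd += [runs_file, pipelines_file, output_file]
    return cmd

# Placeholder file names; adjust to where the dump was extracted.
cmd = build_merge_command("pipeline_runs.json", "pipelines.json", "output_file.json", n=100)
print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) once PipelineProfiler is installed.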

Pipeline exploration

import PipelineProfiler
import json

In a Jupyter notebook, load the output_file:

with open("output_file.json", "r") as f:
    pipelines = json.load(f)

and then plot it using:

PipelineProfiler.plot_pipeline_matrix(pipelines[:10])
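
Before plotting, it can help to check what the loaded list contains. The following is a minimal sketch, assuming each record carries the 'problem' and 'pipeline_source' fields used by the postprocessing code in this README. The helper name summarize_pipelines and the toy records are made up for illustration.

```python
from collections import Counter

def summarize_pipelines(pipelines):
    """Count pipelines per problem id and per source (team) name."""
    per_problem = Counter(p["problem"]["id"] for p in pipelines)
    per_source = Counter(p["pipeline_source"]["name"] for p in pipelines)
    return per_problem, per_source

# Toy records standing in for the merged output file (not real data).
toy_pipelines = [
    {"problem": {"id": "problem_a"}, "pipeline_source": {"name": "team_1"}},
    {"problem": {"id": "problem_a"}, "pipeline_source": {"name": "team_2"}},
    {"problem": {"id": "problem_b"}, "pipeline_source": {"name": "team_1"}},
]

per_problem, per_source = summarize_pipelines(toy_pipelines)
print(dict(per_problem))
print(dict(per_source))
```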

Data postprocessing

You might want to group pipelines by problem and, within each problem, keep only the top k pipelines from each team. To do so, use the following code:

from collections import defaultdict

def get_top_k_pipelines_team(pipelines, k):
    # Group pipelines by the team (source) that produced them
    team_pipelines = defaultdict(list)
    for pipeline in pipelines:
        source = pipeline['pipeline_source']['name']
        team_pipelines[source].append(pipeline)
    # Keep the k best pipelines of each team, ranked by normalized score
    for team in team_pipelines.keys():
        team_pipelines[team] = sorted(team_pipelines[team], key=lambda x: x['scores'][0]['normalized'], reverse=True)
        team_pipelines[team] = team_pipelines[team][:k]
    # Flatten the per-team lists back into a single list
    new_pipelines = []
    for team in team_pipelines.keys():
        new_pipelines.extend(team_pipelines[team])
    return new_pipelines

def sort_pipeline_scores(pipelines):
    # Sort by the first score's value, best first
    return sorted(pipelines, key=lambda x: x['scores'][0]['value'], reverse=True)

# Group pipelines by problem id
pipelines_problem = {}
for pipeline in pipelines:
    problem_id = pipeline['problem']['id']
    if problem_id not in pipelines_problem:
        pipelines_problem[problem_id] = []
    pipelines_problem[problem_id].append(pipeline)
# For each problem, keep the top k pipelines per team and sort them by score
for problem in pipelines_problem.keys():
    pipelines_problem[problem] = sort_pipeline_scores(get_top_k_pipelines_team(pipelines_problem[problem], k=100))
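
To see the effect of the top-k selection, here is a small self-contained sketch on toy data. It follows the same logic as the code above (rank within each team by 'normalized', then sort the survivors by 'value'). The helper names top_k_per_team and make_record and the toy scores are assumptions for illustration only.

```python
from collections import defaultdict

def top_k_per_team(pipelines, k):
    """Keep the k pipelines with the highest normalized score from each team."""
    by_team = defaultdict(list)
    for p in pipelines:
        by_team[p["pipeline_source"]["name"]].append(p)
    selected = []
    for team_pipelines in by_team.values():
        ranked = sorted(team_pipelines,
                        key=lambda p: p["scores"][0]["normalized"],
                        reverse=True)
        selected.extend(ranked[:k])
    return selected

def make_record(team, score):
    """Build a toy pipeline record with only the fields used here."""
    return {
        "problem": {"id": "toy_problem"},
        "pipeline_source": {"name": team},
        "scores": [{"value": score, "normalized": score}],
    }

toy = [
    make_record("team_1", 0.9),
    make_record("team_1", 0.5),
    make_record("team_1", 0.7),
    make_record("team_2", 0.8),
    make_record("team_2", 0.6),
]

top = sorted(top_k_per_team(toy, k=2),
             key=lambda p: p["scores"][0]["value"],
             reverse=True)
values = [p["scores"][0]["value"] for p in top]
print(values)  # [0.9, 0.8, 0.7, 0.6]
```

With real data, the list stored under a given problem id in pipelines_problem can be passed to PipelineProfiler.plot_pipeline_matrix in the same way as pipelines[:10] in the exploration step.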
