Pipeline Profiler tool. Enables the exploration of D3M pipelines in Jupyter Notebooks
PipelineProfiler
AutoML Pipeline exploration tool compatible with Jupyter Notebooks. Supports auto-sklearn and D3M pipeline format.
(Shift click to select multiple pipelines)
Paper: https://arxiv.org/abs/2005.00160
Video: https://youtu.be/2WSYoaxLLJ8
Blog: Medium post
Demo
Live demo (Google Colab):
In Jupyter Notebook:
import PipelineProfiler
data = PipelineProfiler.get_heartstatlog_data()
PipelineProfiler.plot_pipeline_matrix(data)
Install
Option 1: install via pip:
pip install pipelineprofiler
Option 2: Build and run the Docker image:
docker build -t pipelineprofiler .
docker run -p 9999:8888 pipelineprofiler
Then copy the access token and log in to Jupyter in the browser at:
localhost:9999
Data preprocessing
PipelineProfiler reads data from the D3M Metalearning database. You can download this data from: https://metalearning.datadrivendiscovery.org/dumps/2020/03/04/metalearningdb_dump_20200304.tar.gz
You need to merge two files in order to explore the pipelines: pipelines.json and pipeline_runs.json. To do so, run:
python -m PipelineProfiler.pipeline_merge [-n NUMBER_PIPELINES] pipeline_runs_file pipelines_file output_file
Pipeline exploration
import PipelineProfiler
import json
In a Jupyter notebook, load the output file:
with open("output_file.json", "r") as f:
    pipelines = json.load(f)
and then plot it using:
PipelineProfiler.plot_pipeline_matrix(pipelines[:10])
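Merged dumps can be large, so it may help to check the file's shape before plotting. The sketch below assumes the merged output is a JSON list of pipeline records, as the slice `pipelines[:10]` suggests. `load_pipelines` and its `limit` parameter are hypothetical helpers for illustration, not part of PipelineProfiler.

```python
import json


def load_pipelines(path, limit=None):
    """Load merged pipelines from a JSON file, optionally keeping the first `limit`."""
    with open(path, "r") as f:
        pipelines = json.load(f)
    # Assumption: the merged file holds a JSON list of pipeline records.
    if not isinstance(pipelines, list):
        raise ValueError("expected a JSON list of pipelines")
    return pipelines if limit is None else pipelines[:limit]
```

In a notebook, the result could then be passed on with `PipelineProfiler.plot_pipeline_matrix(load_pipelines("output_file.json", limit=10))`.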
Data postprocessing
You might want to group pipelines by problem and select the top k pipelines from each team. To do so, use the following code:
from collections import defaultdict

def get_top_k_pipelines_team(pipelines, k):
    # Group pipelines by the team (source) that produced them
    team_pipelines = defaultdict(list)
    for pipeline in pipelines:
        source = pipeline['pipeline_source']['name']
        team_pipelines[source].append(pipeline)
    # Keep the k pipelines with the highest normalized score per team
    for team in team_pipelines.keys():
        team_pipelines[team] = sorted(team_pipelines[team], key=lambda x: x['scores'][0]['normalized'], reverse=True)
        team_pipelines[team] = team_pipelines[team][:k]
    new_pipelines = []
    for team in team_pipelines.keys():
        new_pipelines.extend(team_pipelines[team])
    return new_pipelines

def sort_pipeline_scores(pipelines):
    return sorted(pipelines, key=lambda x: x['scores'][0]['value'], reverse=True)

# Group pipelines by problem id
pipelines_problem = {}
for pipeline in pipelines:
    problem_id = pipeline['problem']['id']
    if problem_id not in pipelines_problem:
        pipelines_problem[problem_id] = []
    pipelines_problem[problem_id].append(pipeline)
# For each problem, keep the top k pipelines per team, sorted by score
for problem in pipelines_problem.keys():
    pipelines_problem[problem] = sort_pipeline_scores(get_top_k_pipelines_team(pipelines_problem[problem], k=100))
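To see what the grouping and top-k selection produce, here is a small self-contained sketch on synthetic records. The records contain only the fields that the code above reads (`pipeline_source.name`, `problem.id`, and `scores[0]` with `value` and `normalized`). The team names, problem ids, and scores are made up, and `make_pipeline` and `top_k_per_team` are hypothetical helpers, the latter a condensed variant of `get_top_k_pipelines_team`.

```python
from collections import defaultdict


def make_pipeline(team, problem_id, score):
    # Synthetic record holding only the fields used in this section.
    return {
        "pipeline_source": {"name": team},
        "problem": {"id": problem_id},
        "scores": [{"value": score, "normalized": score}],
    }


def top_k_per_team(pipelines, k):
    # Keep the k best pipelines (by normalized score) from each team.
    by_team = defaultdict(list)
    for p in pipelines:
        by_team[p["pipeline_source"]["name"]].append(p)
    selected = []
    for team_pipelines in by_team.values():
        ranked = sorted(
            team_pipelines,
            key=lambda p: p["scores"][0]["normalized"],
            reverse=True,
        )
        selected.extend(ranked[:k])
    return selected


pipelines = [
    make_pipeline("team_a", "problem_1", 0.90),
    make_pipeline("team_a", "problem_1", 0.70),
    make_pipeline("team_b", "problem_1", 0.80),
    make_pipeline("team_b", "problem_2", 0.60),
]

# Group by problem id.
by_problem = defaultdict(list)
for p in pipelines:
    by_problem[p["problem"]["id"]].append(p)

# Within each problem, keep the best pipeline per team and sort by score.
top = {
    problem_id: sorted(
        top_k_per_team(group, k=1),
        key=lambda p: p["scores"][0]["value"],
        reverse=True,
    )
    for problem_id, group in by_problem.items()
}

problem_1_scores = [p["scores"][0]["value"] for p in top["problem_1"]]
```

With real merged data, each value of `top` is a list that could be passed to `PipelineProfiler.plot_pipeline_matrix`.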