PipelineProfiler: a tool for exploring D3M pipelines in Jupyter Notebooks.
Project description
PipelineProfiler
AutoML Pipeline exploration tool compatible with Jupyter Notebooks.
Paper: https://arxiv.org/abs/2005.00160
Demo
To use PipelineProfiler, first install the Python library (see the instructions below), then run "Demo.ipynb".
Install
Option 1: Build and install via pip:

```bash
cd PipelineProfiler
npm install
npm run build
cd ..
pip install .
```
Option 2: Run the Docker image:

```bash
docker build -t pipelineprofiler .
docker run -p 9999:8888 pipelineprofiler
```

Then copy the access token printed in the terminal and open Jupyter in your browser at:

localhost:9999
Data preprocessing
PipelineProfiler reads data from the D3M Metalearning database. You can download this data from: https://metalearning.datadrivendiscovery.org/dumps/2020/03/04/metalearningdb_dump_20200304.tar.gz
You need to merge two files in order to explore the pipelines: pipelines.json and pipeline_runs.json. To do so, run:

```bash
python -m PipelineProfiler.pipeline_merge [-n NUMBER_PIPELINES] pipeline_runs_file pipelines_file output_file
```
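The shipped `pipeline_merge` script is the authoritative implementation; conceptually, the merge indexes pipeline documents by id and attaches each run to its pipeline. A rough sketch under that assumption (the field names `id` and `run['pipeline']['id']`, and the one-JSON-document-per-line dump layout, are assumptions about the dump schema, not guaranteed by this README):

```python
import json

def merge_runs_with_pipelines(runs_path, pipelines_path, output_path, n=None):
    # Index pipeline documents by id (assuming one JSON document per line)
    pipelines = {}
    with open(pipelines_path) as f:
        for line in f:
            doc = json.loads(line)
            pipelines[doc['id']] = doc

    # Attach each run to its pipeline; keep at most n merged documents
    merged = []
    with open(runs_path) as f:
        for line in f:
            run = json.loads(line)
            pipeline = pipelines.get(run.get('pipeline', {}).get('id'))
            if pipeline is None:
                continue  # run references a pipeline missing from the dump
            merged.append({**pipeline, 'run': run})
            if n is not None and len(merged) >= n:
                break

    with open(output_path, 'w') as f:
        json.dump(merged, f)
```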
Pipeline exploration
In a Jupyter notebook, load the output_file:

```python
import json

import PipelineProfiler

with open("output_file.json", "r") as f:
    pipelines = json.load(f)
```

and then plot it using:

```python
PipelineProfiler.plot_pipeline_matrix(pipelines[:10])
```
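Since `plot_pipeline_matrix` takes a list of pipeline documents, you can slice or filter that list before plotting, e.g. to restrict the view to a single problem via the `problem.id` field (the documents here are synthetic stand-ins for those loaded from output_file.json, and the problem ids are invented):

```python
# Synthetic stand-ins for pipeline documents loaded from output_file.json
pipelines = [
    {'problem': {'id': 'problem_a'}},
    {'problem': {'id': 'problem_b'}},
    {'problem': {'id': 'problem_a'}},
]

# Keep only the pipelines evaluated on one problem
problem_id = 'problem_a'
subset = [p for p in pipelines if p['problem']['id'] == problem_id]

# With PipelineProfiler installed and real data loaded:
# PipelineProfiler.plot_pipeline_matrix(subset[:10])
```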
Data postprocessing
You might want to group pipelines by problem type, and select the top k pipelines from each team. To do so, use the code:
```python
from collections import defaultdict

def get_top_k_pipelines_team(pipelines, k):
    # Group pipelines by the team that submitted them
    team_pipelines = defaultdict(list)
    for pipeline in pipelines:
        source = pipeline['pipeline_source']['name']
        team_pipelines[source].append(pipeline)
    # Keep each team's k best pipelines by normalized score
    for team in team_pipelines.keys():
        team_pipelines[team] = sorted(team_pipelines[team], key=lambda x: x['scores'][0]['normalized'], reverse=True)
        team_pipelines[team] = team_pipelines[team][:k]
    new_pipelines = []
    for team in team_pipelines.keys():
        new_pipelines.extend(team_pipelines[team])
    return new_pipelines

def sort_pipeline_scores(pipelines):
    return sorted(pipelines, key=lambda x: x['scores'][0]['value'], reverse=True)

# Group pipelines by problem, then keep the top 100 per team for each problem
pipelines_problem = {}
for pipeline in pipelines:
    problem_id = pipeline['problem']['id']
    if problem_id not in pipelines_problem:
        pipelines_problem[problem_id] = []
    pipelines_problem[problem_id].append(pipeline)
for problem in pipelines_problem.keys():
    pipelines_problem[problem] = sort_pipeline_scores(get_top_k_pipelines_team(pipelines_problem[problem], k=100))
```
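To sanity-check this post-processing without the full dump, here is a condensed, self-contained version of the per-team top-k selection, run on hand-built pipeline documents (field names follow the snippet above; team names and scores are made up):

```python
from collections import defaultdict

def top_k_per_team(pipelines, k):
    # Group by submitting team, then keep each team's k best by normalized score
    by_team = defaultdict(list)
    for p in pipelines:
        by_team[p['pipeline_source']['name']].append(p)
    selected = []
    for team_list in by_team.values():
        team_list.sort(key=lambda p: p['scores'][0]['normalized'], reverse=True)
        selected.extend(team_list[:k])
    return selected

# Invented pipeline documents: two from team_a, one from team_b
pipelines = [
    {'pipeline_source': {'name': 'team_a'}, 'scores': [{'normalized': 0.7, 'value': 0.7}]},
    {'pipeline_source': {'name': 'team_a'}, 'scores': [{'normalized': 0.9, 'value': 0.9}]},
    {'pipeline_source': {'name': 'team_b'}, 'scores': [{'normalized': 0.8, 'value': 0.8}]},
]

# With k=1, we expect team_a's 0.9 pipeline and team_b's 0.8 pipeline
top = top_k_per_team(pipelines, k=1)
```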
File details
Details for the file pipelineprofiler-0.1.1.tar.gz.
File metadata
- Download URL: pipelineprofiler-0.1.1.tar.gz
- Upload date:
- Size: 858.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/46.1.3 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.6.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | 63cd3a0cf5e8294088abe5a5f12ac2452584e73ffe4a8ceafb70d5a17bbba0cb
MD5 | 707de4a528cf0e8c70a8d35b77981f07
BLAKE2b-256 | 20dc94bdf2e9bb09487e3152a5edd8648b332f8a21256d0bc6056df9dfa9e7d6
File details
Details for the file pipelineprofiler-0.1.1-py3-none-any.whl.
File metadata
- Download URL: pipelineprofiler-0.1.1-py3-none-any.whl
- Upload date:
- Size: 869.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/46.1.3 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.6.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | a039d8acfcc42f7bceaea0a444581444c496a1a0405d7fb9a1b13c86d89d0083
MD5 | ecc276699ac26e4ac40333ee7e4f0903
BLAKE2b-256 | 65b45a6ef3ec22dc996a7667a9d47bc83a32421aebe255c761df3152f9596266