AlphaD3M: NYU's AutoML System

AlphaD3M is an AutoML system that automatically searches for models and derives end-to-end pipelines that read and pre-process the data and train the model. AlphaD3M leverages recent advances in deep reinforcement learning and can adapt to different application domains and problems through incremental learning.

AlphaD3M provides data scientists and data engineers the flexibility to address complex problems by leveraging the Python ecosystem, including open-source libraries and tools, support for collaboration, and infrastructure that enables transparency and reproducibility.

This repository is part of New York University's implementation of the Data Driven Discovery project (D3M).

Support for Many ML Problems

AlphaD3M uses a comprehensive collection of primitives developed under the D3M program, as well as primitives provided in open-source libraries such as scikit-learn, to derive pipelines for a wide range of machine learning tasks. These pipelines can be applied to different data types and evaluated with standard performance metrics.

  • Learning Tasks: classification (semi-supervised, binary, multiclass, and multi-label), regression (univariate and multivariate), time series (forecasting, hierarchical forecasting, and classification), image-based problems (object detection, remote sensing, and image recognition), graph-based problems (collaborative filtering, community detection, graph matching, link prediction, and vertex classification), multi-instance learning, and clustering.

  • Data Types: tabular, time series, hierarchical (grouped, multi-index) time series, geospatial, images, multi-spectral imagery, relational, text, graph, audio, video.

  • Data Formats: D3M, raw CSV, raw text files, OpenML, and scikit-learn datasets.

  • Metrics: accuracy, F1, macro F1, micro F1, mean squared error, mean absolute error, root mean squared error, object detection AP, hamming loss, ROC-AUC, ROC-AUC macro, ROC-AUC micro, jaccard similarity score, normalized mutual information, hit at K, R2, recall, mean reciprocal rank, precision, and precision at top K.

Installation

You can use AlphaD3M through d3m-interface, a Python library for working with D3M AutoML systems. There are two ways to use AlphaD3M: 1) via Docker/Singularity containers (full version), and 2) via a PyPI installation (lightweight version).

Docker/Singularity containers (full version)

In this version, AlphaD3M and the D3M Core are deployed as a container. It works with Python 3.6 through 3.8 and supports all the ML tasks and data types mentioned above. You need to have Docker or Singularity installed on your operating system.

For this version, you just need to install d3m-interface:

$ pip install d3m-interface

The first time d3m-interface is used, it automatically downloads a Docker image containing AlphaD3M and the D3M Core.

PyPI (lightweight version)

Currently, this version supports classification, regression, and forecasting tasks (using a limited set of primitives). It supports tabular, text, and image data types. This package works with Python 3.8 on Linux and macOS.

To install, run these commands:

$ pip install d3m-interface
$ pip install alphad3m
$ pip install d3m-common-primitives d3m-sklearn-wrap dsbox-corex dsbox-primitives sri-d3m distil-primitives d3m-esrnn d3m-nbeats

The last command installs the primitives available on PyPI.
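
The version constraints stated for the two installation routes can be checked before installing. The following is a minimal sketch, not part of AlphaD3M or d3m-interface: the function names are hypothetical, and it assumes the container runtimes are exposed on the PATH under the executable names `docker` and `singularity`.

```python
import platform
import shutil
import sys


def lightweight_supported(version=sys.version_info, system=None):
    """Check the stated PyPI requirements: Python 3.8 on Linux or Mac."""
    system = platform.system() if system is None else system
    # platform.system() reports macOS as "Darwin".
    return tuple(version[:2]) == (3, 8) and system in ("Linux", "Darwin")


def container_supported(version=sys.version_info, runtime_lookup=shutil.which):
    """Check the stated container requirements: Python 3.6 through 3.8
    plus a Docker or Singularity executable (names are an assumption)."""
    python_ok = (3, 6) <= tuple(version[:2]) <= (3, 8)
    has_runtime = any(runtime_lookup(name) for name in ("docker", "singularity"))
    return python_ok and has_runtime
```

Calling both functions with no arguments inspects the running interpreter and the current machine. Passing an explicit version tuple, system name, or lookup function makes the checks easy to exercise without changing the environment.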

Documentation

The documentation of our system can be found here. To help users get started with AlphaD3M, we provide Jupyter Notebooks in our public repository that show examples of how the library can be used. We also have documentation for the API.

Usability, Model Exploration and Explanation

AlphaD3M greatly simplifies the process of creating predictive models. Users can interact with the system from a Jupyter notebook and derive models using a few lines of Python code.

Users can leverage Python-based libraries and tools to clean, transform, and visualize data, as well as standard methods to explain machine learning models. These libraries and tools can also be combined to build customized solutions for specific problems that can be deployed to end users.

The AlphaD3M environment includes tools that we developed to enable users to explore the pipelines and their predictions:

  • PipelineProfiler, an interactive visual analytics tool that lets data scientists explore the pipelines derived by AlphaD3M within a Jupyter notebook, gain insights to improve them, and make informed decisions when selecting models for a given application.

  • Visual Text Explorer, a tool that helps users understand models for text classification by letting them explore model predictions and their association with words and entities present in the classified documents.

How AlphaD3M Works

Inspired by AlphaZero, AlphaD3M frames pipeline synthesis for model discovery as a single-player game in which the player iteratively builds a pipeline by selecting actions (insertion, deletion, and replacement of pipeline components). We solve the meta-learning problem using a deep neural network and Monte Carlo tree search (MCTS). The neural network receives as input an entire pipeline, data meta-features, and the problem, and outputs action probabilities and an estimate of the pipeline's performance. MCTS uses the network probabilities to run simulations that terminate at actual pipeline evaluations. To reduce the search space, we define a pipeline grammar whose rules constitute the actions. The number of grammar rules grows linearly with the number of primitives, which addresses the issue of scalability. Finally, AlphaD3M performs hyperparameter optimization of the best pipelines using SMAC.
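
The ideas above can be illustrated with a toy sketch. This is not AlphaD3M's implementation. The slots, the primitive names, and the scores in `toy_evaluate` are made up. Uniform priors stand in for the neural network, the search is a flat one-level loop rather than a full tree, and the selection rule is a PUCT-style formula in the spirit of AlphaZero that is assumed here for illustration.

```python
import math

# Hypothetical toy grammar: a pipeline has three slots, and each primitive
# contributes exactly one rule of the form "SLOT -> primitive".
PRIMITIVES = {
    "imputer": ["mean_imputer", "median_imputer"],
    "scaler": ["standard_scaler", "minmax_scaler"],
    "estimator": ["random_forest", "logistic_regression", "svm"],
}
OPTIONAL_SLOTS = {"imputer", "scaler"}  # the estimator slot cannot be deleted


def count_rules(primitives):
    """One structural rule plus one rule per primitive: linear growth."""
    return 1 + sum(len(names) for names in primitives.values())


def legal_actions(pipeline):
    """Enumerate the insert/delete/replace edits the toy grammar allows."""
    actions = []
    for slot, names in PRIMITIVES.items():
        current = pipeline.get(slot)
        for name in names:
            if current is None:
                actions.append(("insert", slot, name))
            elif name != current:
                actions.append(("replace", slot, name))
        if current is not None and slot in OPTIONAL_SLOTS:
            actions.append(("delete", slot, None))
    return actions


def apply_action(pipeline, action):
    """Return a new pipeline dict with the edit applied."""
    kind, slot, name = action
    result = dict(pipeline)
    if kind == "delete":
        result.pop(slot)
    else:
        result[slot] = name
    return result


def puct_select(actions, priors, visits, values, c_puct=1.0):
    """Pick the action maximizing Q + c * P * sqrt(1 + N_total) / (1 + N_a)."""
    total = sum(visits[a] for a in actions)

    def score(a):
        q = values[a] / visits[a] if visits[a] else 0.0
        return q + c_puct * priors[a] * math.sqrt(1 + total) / (1 + visits[a])

    return max(actions, key=score)


def toy_evaluate(pipeline):
    """Stand-in for a real pipeline evaluation; the scores are made up."""
    score = 0.5 if "estimator" in pipeline else 0.0
    score += 0.2 if pipeline.get("estimator") == "random_forest" else 0.0
    score += 0.1 if "imputer" in pipeline else 0.0
    return score


def search_step(pipeline, simulations=50):
    """Run flat PUCT simulations from one state; return the most visited edit."""
    actions = legal_actions(pipeline)
    priors = {a: 1.0 / len(actions) for a in actions}  # stub for the network
    visits = {a: 0 for a in actions}
    values = {a: 0.0 for a in actions}
    for _ in range(simulations):
        a = puct_select(actions, priors, visits, values)
        visits[a] += 1
        values[a] += toy_evaluate(apply_action(pipeline, a))
    return max(actions, key=lambda a: visits[a])
```

Calling `search_step({})` starts from an empty pipeline and returns the edit that received the most simulations. Adding a primitive to any slot in `PRIMITIVES` raises `count_rules(PRIMITIVES)` by exactly one, which is the linear growth the grammar is meant to provide.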

For more information about how AlphaD3M works, see our papers:


