jubakit

Jubatus Toolkit

jubakit: Jubatus Toolkit

jubakit is a Python module that provides easy access to Jubatus features. It can be used together with scikit-learn, which gives you access to features such as cross validation and model evaluation.

jubakit currently supports the Classifier, Anomaly and Weight engines.

Install

pip install jubakit

Requirements

Python 2.6, 2.7, 3.3, 3.4 or 3.5.
Jubatus needs to be installed.
scikit-learn is optional, but it is required for some features such as K-fold cross validation.
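
Because scikit-learn is optional, a script may want to check for it before using the features that depend on it. The following is a minimal sketch of such a check; `has_module` and `HAS_SKLEARN` are hypothetical names introduced here and are not part of jubakit.

```python
def has_module(name):
    """Return True if the named module can be imported."""
    try:
        __import__(name)
    except ImportError:
        return False
    return True

# Hypothetical flag (not provided by jubakit): guard optional features with it.
HAS_SKLEARN = has_module("sklearn")

if HAS_SKLEARN:
    print("scikit-learn found: K-fold cross validation examples can be run.")
else:
    print("scikit-learn not found: skip the examples that require it.")
```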

Quick Start

The following example shows how to train a classifier and classify records using a CSV dataset.

from jubakit.classifier import Classifier, Schema, Dataset, Config
from jubakit.loader.csv import CSVLoader

# Load a CSV file.
loader = CSVLoader('iris.csv')

# Define types for each column in the CSV file.
schema = Schema({
  'Species': Schema.LABEL,
}, Schema.NUMBER)

# Get the shuffled dataset.
dataset = Dataset(loader, schema).shuffle()

# Run the classifier service (`jubaclassifier` process).
classifier = Classifier.run(Config())

# Train the classifier.
for _ in classifier.train(dataset): pass

# Classify using the trained classifier.
for (idx, label, result) in classifier.classify(dataset):
  print("true label: {0}, estimated label: {1}".format(label, result[0][0]))
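
The classify loop above yields `(idx, label, result)` tuples, where `result[0][0]` is the estimated label. As a rough sketch, tuples of that shape can be reduced to an accuracy figure. The sample tuples below are made up, and treating `result` as a list of `(label, score)` pairs with the best estimate first is an assumption based only on the `result[0][0]` access above. The `accuracy` helper is hypothetical and not part of jubakit.

```python
def accuracy(results):
    """Fraction of records whose top-ranked estimate equals the true label."""
    total = 0
    correct = 0
    for (idx, label, result) in results:
        total += 1
        # result[0][0] is the estimated label, as in the Quick Start loop.
        if result and result[0][0] == label:
            correct += 1
    return correct / float(total) if total else 0.0

# Made-up tuples standing in for the output of classifier.classify(dataset).
fake_results = [
    (0, "setosa", [("setosa", 0.9), ("virginica", 0.1)]),
    (1, "versicolor", [("virginica", 0.6), ("versicolor", 0.4)]),
    (2, "virginica", [("virginica", 0.8), ("setosa", 0.2)]),
    (3, "setosa", [("setosa", 0.7), ("versicolor", 0.3)]),
]

print("accuracy: {0:.2f}".format(accuracy(fake_results)))
```

With a running classifier, the same helper could in principle be fed the iterator returned by `classifier.classify(dataset)` instead of the made-up list.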

Examples by Topics

See the example directory for working examples.

Example	Topics	Requires scikit-learn
classifier_csv.py	Handling CSV file and numeric features
classifier_shogun.py	Handling CSV file and string features
classifier_digits.py	Handling toy dataset (digits)	✓
classifier_libsvm.py	Handling LIBSVM file	✓
classifier_kfold.py	K-fold cross validation and metrics	✓
classifier_parameter.py	Finding best hyper parameter	✓
classifier_bulk.py	Bulk Train-Test Classifier
classifier_twitter.py	Handling Twitter Streams
anomaly_auc.py	Anomaly detection and metrics
weight_shogun.py	Tracing fv_converter behavior using Weight

Concepts

Loader fetches data from various data sources (e.g., CSV file, RDBMS, MQ, Twitter stream) in key-value format.
Schema defines the data type (string feature, numeric feature, ground truth (label), etc.) for each key of the data loaded by Loader.
Dataset is an abstract representation of a sequence of data that binds a Loader and a Schema.
Service receives a Dataset and makes update/analyze RPC calls to Jubatus servers.
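
To make the division of roles concrete, here is a plain-Python sketch of the Loader, Schema and Dataset ideas. The functions and constants below are hypothetical stand-ins written for illustration; they are not jubakit classes and do not reflect how jubakit is implemented. The Service step is left out because it needs a running Jubatus server.

```python
import csv
import io

# Hypothetical type markers; NOT jubakit's Schema.LABEL / Schema.NUMBER.
LABEL, NUMBER = "label", "number"

def load_rows(text):
    """Loader role: yield each CSV record as a key-value dict."""
    for row in csv.DictReader(io.StringIO(text)):
        yield dict(row)

def apply_schema(row, schema, default):
    """Schema role: split one record into (label, numeric features)."""
    label, features = None, {}
    for key, value in row.items():
        kind = schema.get(key, default)
        if kind == LABEL:
            label = value
        elif kind == NUMBER:
            features[key] = float(value)
    return label, features

def make_dataset(text, schema, default=NUMBER):
    """Dataset role: bind a loader and a schema into typed records."""
    return [apply_schema(row, schema, default) for row in load_rows(text)]

# Two made-up records in the style of an iris CSV file.
SAMPLE_CSV = (
    "Sepal.Length,Sepal.Width,Species\n"
    "5.1,3.5,setosa\n"
    "7.0,3.2,versicolor\n"
)

dataset = make_dataset(SAMPLE_CSV, {"Species": LABEL})
for label, features in dataset:
    print(label, sorted(features.items()))
```

As in the Quick Start schema, only the label column is named explicitly and every other column falls back to the numeric default.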

License

MIT License

Release history

0.6.2

Jan 28, 2019

0.6.1

Oct 29, 2018

0.6.0

Aug 27, 2018

0.5.5

Apr 23, 2018

0.5.4

Feb 26, 2018

0.5.3

Dec 18, 2017

0.5.2

Oct 30, 2017

0.5.1

Aug 28, 2017

0.5.0

Apr 24, 2017

0.4.2

Feb 27, 2017

0.4.1

Dec 26, 2016

0.4.0

Oct 31, 2016

0.3.0

Aug 29, 2016

0.2.2

Jul 25, 2016

0.2.1 (this version)

Jun 27, 2016

0.2.0

May 30, 2016

0.1.0

Apr 25, 2016

Download files

Source Distribution

jubakit-0.2.1.tar.gz (19.5 kB)

Uploaded Jun 27, 2016 Source

File details

Details for the file jubakit-0.2.1.tar.gz.

File metadata

Download URL: jubakit-0.2.1.tar.gz
Upload date: Jun 27, 2016
Size: 19.5 kB
Tags: Source
Uploaded using Trusted Publishing? No

File hashes

Hashes for jubakit-0.2.1.tar.gz
Algorithm	Hash digest
SHA256	`cdf6b29e25e4e1a19acd4e81dd2b516d125a929b32e553ad723eb38bc5ff270a`
MD5	`48830a620c72c309fe6d007a3b74ba90`
BLAKE2b-256	`fc1b6eb673741636ce8129e1f99894a782805e1e50fd84beac3ce395ed877160`
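
The digests above can be used to check a downloaded archive. The following is a minimal sketch using Python's standard `hashlib` module. The local file name and the helper names `sha256_of` and `verify` are assumptions made for illustration.

```python
import hashlib
import os

# SHA256 digest listed above for jubakit-0.2.1.tar.gz.
EXPECTED_SHA256 = "cdf6b29e25e4e1a19acd4e81dd2b516d125a929b32e553ad723eb38bc5ff270a"

def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(path, expected=EXPECTED_SHA256):
    """Return True if the file's digest matches the expected value."""
    return sha256_of(path) == expected.lower()

# The local path is an assumption; adjust it to where the archive was saved.
archive = "jubakit-0.2.1.tar.gz"
if os.path.exists(archive):
    print("hash OK" if verify(archive) else "hash MISMATCH")
else:
    print("archive not found: {0}".format(archive))
```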