jubakit: Jubatus Toolkit

jubakit is a Python module that makes Jubatus features easy to access. It can be used together with scikit-learn, which gives you features such as cross validation and model evaluation. See the Jubakit Documentation for a detailed description.

Currently jubakit supports Classifier, Regression, Anomaly, Recommender, NearestNeighbor, Clustering and Weight engines.

Install

pip install jubakit

Requirements

  • Python 2.7, 3.3, 3.4 or 3.5.

  • Jubatus needs to be installed.

  • scikit-learn is optional, but some features, such as K-fold cross validation, require it.
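
The requirements above can be checked from Python before running anything. The following is only a minimal sketch: `has_module` and `find_command` are hypothetical helper names, not part of jubakit, and looking for `jubaclassifier` on the PATH is an assumption about how a Jubatus installation is exposed on your system.

```python
import importlib
import shutil
import sys


def has_module(name):
    """Return True if the module `name` can be imported."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


def find_command(name):
    """Return the full path of `name` on the PATH, or None if not found."""
    # shutil.which exists on Python 3.3+; on older versions this returns None.
    which = getattr(shutil, "which", None)
    return which(name) if which else None


print("Python version: {0}.{1}".format(*sys.version_info[:2]))
# Assumption: a working Jubatus install puts `jubaclassifier` on the PATH.
print("jubaclassifier found at: {0}".format(find_command("jubaclassifier")))
# scikit-learn is imported under the module name `sklearn`.
print("scikit-learn available: {0}".format(has_module("sklearn")))
```

If scikit-learn is reported as unavailable, the basic train/classify workflow should still work; only the features that depend on it are affected.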

Quick Start

The following example shows how to train a classifier and classify records using a CSV dataset.

from jubakit.classifier import Classifier, Schema, Dataset, Config
from jubakit.loader.csv import CSVLoader

# Load a CSV file.
loader = CSVLoader('iris.csv')

# Define types for each column in the CSV file.
schema = Schema({
  'Species': Schema.LABEL,
}, Schema.NUMBER)

# Get the shuffled dataset.
dataset = Dataset(loader, schema).shuffle()

# Run the classifier service (`jubaclassifier` process).
classifier = Classifier.run(Config())

# Train the classifier.
for _ in classifier.train(dataset): pass

# Classify using the trained classifier.
for (idx, label, result) in classifier.classify(dataset):
  print("true label: {0}, estimated label: {1}".format(label, result[0][0]))
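
The classify loop above prints one line per record. To summarize the results as a single number, the same tuples can be reduced to an accuracy figure. This is a sketch under stated assumptions: `accuracy` is a hypothetical helper, not a jubakit function, and it assumes `result` is a list of (label, score) pairs with the best estimate first, as the `result[0][0]` access above suggests. The sample records are made-up stand-ins so the snippet runs without a Jubatus server.

```python
def accuracy(classified):
    """Fraction of records whose top-ranked estimated label equals the true label.

    `classified` is any iterable of (idx, label, result) tuples, where
    `result` is assumed to be a list of (label, score) pairs, best first.
    """
    total = 0
    correct = 0
    for _idx, label, result in classified:
        total += 1
        # An empty result means no estimate was returned for this record.
        if result and result[0][0] == label:
            correct += 1
    # float() keeps the division exact on Python 2.7 as well.
    return float(correct) / total if total else 0.0


# Stand-in records shaped like the tuples in the loop above (made-up values).
sample = [
    (0, 'setosa', [('setosa', 0.9), ('virginica', 0.1)]),
    (1, 'virginica', [('setosa', 0.6), ('virginica', 0.4)]),
]
print("accuracy: {0:.2f}".format(accuracy(sample)))
```

With the quick start objects, the call would be `accuracy(classifier.classify(dataset))`. Because the quick start classifies the same dataset it trained on, such a figure describes how well the model fits the training data rather than how well it generalizes.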

Examples by Topics

See the example directory for working examples.

Example                              Topics

classifier_csv.py                    Handling CSV file and numeric features
classifier_shogun.py                 Handling CSV file and string features
classifier_digits.py                 Handling toy dataset (digits)
classifier_libsvm.py                 Handling LIBSVM file
classifier_kfold.py                  K-fold cross validation and metrics
classifier_parameter.py              Finding best hyperparameter
classifier_hyperopt_tuning.py        Finding best hyperparameter using hyperopt
classifier_bulk.py                   Bulk Train-Test Classifier
classifier_twitter.py                Handling Twitter Streams
classifier_model_extract.py          Extract contents of Classifier model file
classifier_sklearn_wrapper.py        Classification using scikit-learn wrapper
classifier_sklearn_grid_search.py    Grid Search example using scikit-learn wrapper
regression_boston.py                 Regression with toy dataset (boston)
regression_csv.py                    Regression with CSV file
regression_sklearn_wrapper.py        Regression using scikit-learn wrapper
anomaly_auc.py                       Anomaly detection and metrics
recommender_npb.py                   Recommend similar items
nearest_neighbor_aaai.py             Search neighbor items
clustering_2d.py                     Clustering 2-dimensional dataset
weight_shogun.py                     Tracing fv_converter behavior using Weight
weight_model_extract.py              Extract contents of Weight model file

License

MIT License

