thunder-python

Large-scale neural data analysis in Spark

Project description

Thunder

Large-scale neural data analysis with Spark

About

Thunder is a library for analyzing large-scale spatial and temporal neural data. It’s fast to run, easy to extend, and designed for interactivity. It is built on Spark, a new framework for cluster computing.

Thunder includes utilities for loading and saving different formats, classes for working with distributed spatial and temporal data, and modular functions for time series analysis, factorization, and model fitting. Analyses can easily be scripted or combined. It is written against Spark’s Python API (PySpark), making use of scipy, numpy, and scikit-learn.

Documentation

This README contains basic information for installation and usage. See the documentation for more details, example usage, and API references. If you have a problem, question, or idea, post to the mailing list. If you find a bug, submit an issue. When posting an issue, please describe your environment (e.g. local usage or EC2, operating system) and give instructions for reproducing the error.

Quick start

Thunder is designed to run on a cluster, but local testing is a great way to learn and develop. On most machines, installation takes just a few simple steps. If you aren’t currently using Python for scientific computing, Anaconda is highly recommended.

Download the latest “pre-built for Hadoop 1.x” version of Spark, and set one environment variable

export SPARK_HOME=/your/path/to/spark

Install Thunder

pip install thunder-python

Start Thunder from the terminal

thunder
>>> from thunder import ICA
>>> data = tsc.makeExample("ica")
>>> model = ICA(c=2).fit(data)
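
If the shell does not start, one thing worth checking is that SPARK_HOME is set and points at an existing directory. The following is a minimal sketch of that check in plain Python. The helper name check_spark_home is hypothetical and is not part of Thunder or Spark.

```python
import os


def check_spark_home(environ=None):
    """Return the directory named by SPARK_HOME, or raise ValueError.

    `environ` defaults to the real process environment; passing a plain
    dict makes the function easy to try out without changing your shell.
    This helper is illustrative only and is not part of Thunder.
    """
    environ = os.environ if environ is None else environ
    spark_home = environ.get("SPARK_HOME")
    if not spark_home:
        raise ValueError("SPARK_HOME is not set")
    if not os.path.isdir(spark_home):
        raise ValueError("SPARK_HOME is not a directory: %s" % spark_home)
    return spark_home


if __name__ == "__main__":
    try:
        print("Using Spark at", check_spark_home())
    except ValueError as err:
        print("Problem:", err)
```

Run it in the same terminal session where you exported SPARK_HOME, since the variable is only visible to processes started from that session.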

To run in IPython, just set this environment variable before starting:

export IPYTHON=1

To run analyses as standalone jobs, use the submit script

thunder-submit <package/analysis> <datadirectory> <outputdirectory> <opts>

We also include a script for launching an Amazon EC2 cluster with Thunder preinstalled

thunder-ec2 -k mykey -i mykey.pem -s <number-of-nodes> launch <cluster-name>

Analyses

Thunder currently includes two primary data types for distributed spatial and temporal data, and four main analysis packages: classification (decoding), clustering, factorization, and regression. It also provides an entry point for loading and converting a variety of raw data formats, and utilities for exporting or inspecting results. Scripts can be used to run standalone analyses, but the underlying classes and functions can be used from within the PySpark shell for easy interactive analysis.

Input and output

The primary data types in Thunder – Images and Series – can each be loaded from a variety of raw input formats, including text or flat binary files (for Series) and tif or png files (for Images). Files can be stored locally, on a networked file system, on Amazon’s S3, or in HDFS. Where needed, metadata (e.g. model parameters) can be provided as numpy arrays or loaded from MAT files. Results can be visualized directly from the Python shell or in an IPython notebook, or saved to external formats.
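
To give a feel for what a flat binary file of time series can look like, the sketch below writes and reads fixed-length records with Python's standard struct module. The record layout used here (a fixed number of little-endian 16-bit integers per record, with no header) and the helper names write_records and read_records are assumptions made for this example. The layout Thunder actually expects is described in its documentation.

```python
import struct


def write_records(path, records):
    """Write equal-length integer records back to back as little-endian int16.

    Illustrative layout only: no header, so the record length must be
    known by whoever reads the file.
    """
    n_values = len(records[0])
    packer = struct.Struct("<%dh" % n_values)
    with open(path, "wb") as f:
        for record in records:
            if len(record) != n_values:
                raise ValueError("all records must have the same length")
            f.write(packer.pack(*record))


def read_records(path, n_values):
    """Read records of `n_values` int16 each from a file written as above."""
    packer = struct.Struct("<%dh" % n_values)
    records = []
    with open(path, "rb") as f:
        while True:
            chunk = f.read(packer.size)
            if not chunk:
                break
            if len(chunk) != packer.size:
                raise ValueError("file size is not a multiple of the record size")
            records.append(list(packer.unpack(chunk)))
    return records
```

Because a headerless file carries no description of itself, the reader has to be told the record length and value type separately, which is why read_records takes n_values as an argument.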

Contributions

If you have other ideas or want to contribute, submit an issue or pull request, or reach out to us on the mailing list.

Project details

Release history

Version	Release date
1.4.2	Aug 5, 2016
1.4.1	Aug 5, 2016
1.4.0	Aug 5, 2016
1.3.0	Aug 3, 2016
1.2.0	Jun 17, 2016
1.1.1	Jun 15, 2016
1.1.0	May 27, 2016
1.0.0	Apr 8, 2016
0.6.0	Jan 8, 2016
0.5.1	Jul 1, 2015
0.5.0	Apr 2, 2015
0.4.1	Nov 4, 2014 (this version)
0.4.0	Oct 16, 2014
0.3.2	Sep 11, 2014
0.3.1	Sep 4, 2014
0.3.0	Aug 23, 2014
0.2.0	Jul 27, 2014
0.1.0	Jul 19, 2014

Download files

Source Distribution

thunder-python-0.4.1.tar.gz (1.2 MB)

Uploaded Nov 4, 2014 Source

File details

Details for the file thunder-python-0.4.1.tar.gz.

File metadata

Download URL: thunder-python-0.4.1.tar.gz
Upload date: Nov 4, 2014
Size: 1.2 MB
Tags: Source
Uploaded using Trusted Publishing? No

File hashes

Hashes for thunder-python-0.4.1.tar.gz
Algorithm	Hash digest
SHA256	`847f3b83865e6ac7db8aa22af9aead571e0d44e9273b3f5e8b266fa3049b522e`
MD5	`f9855971ef6ed056d87b9c23f17cdadd`
BLAKE2b-256	`6387c3ae94f4bf5f9c251df53530a0f996f8a7fd27b3817d9f66f22429f601c3`
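
The SHA256 digest listed above can be used to confirm that a downloaded copy of the archive is intact. The following is a minimal sketch using Python's standard hashlib module. The helper names sha256_of_file and matches_expected are hypothetical, and the file path in the usage note is an assumption about where you saved the download.

```python
import hashlib

# SHA256 digest listed above for thunder-python-0.4.1.tar.gz
EXPECTED_SHA256 = "847f3b83865e6ac7db8aa22af9aead571e0d44e9273b3f5e8b266fa3049b522e"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b""
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of_file(path) == expected.lower()
```

For example, matches_expected("thunder-python-0.4.1.tar.gz") should return True for an unmodified copy of the archive saved in the current directory.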