Enable a Pandas-like API on PySpark

<img align="right" src="docs/img/logo.jpg">

[![buildstatus](https://travis-ci.org/sparklingpandas/sparklingpandas.svg?branch=master)](https://travis-ci.org/sparklingpandas/sparklingpandas)

SparklingPandas
===============
SparklingPandas aims to make it easy to use the distributed computing power
of PySpark to scale your data analysis with Pandas. SparklingPandas builds on
Spark's DataFrame class to give you a polished, pythonic, and Pandas-like API.
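
The idea of a Pandas-like API over distributed data can be illustrated without Spark at all. The toy sketch below is pure Python. It computes a column mean in a map-and-combine style: each partition returns a partial `(sum, count)` pair, and the pairs are merged before a single final division. It is a hypothetical illustration only. `partial_aggregate`, `combine`, `distributed_mean`, and the list-of-lists partition layout are not SparklingPandas or Spark APIs.

```python
from functools import reduce


def partial_aggregate(partition):
    """Compute a (sum, count) pair for one partition of a numeric column."""
    return (sum(partition), len(partition))


def combine(left, right):
    """Merge two (sum, count) pairs; the merge is associative and commutative,
    so partial results can be combined in any order."""
    return (left[0] + right[0], left[1] + right[1])


def distributed_mean(partitions):
    """Hypothetical helper: mean of a column split across partitions.

    Averaging per-partition means is wrong when partitions differ in size,
    so sums and counts are carried along and divided only once at the end.
    """
    partials = [partial_aggregate(p) for p in partitions if p]
    if not partials:
        raise ValueError("mean of empty data")
    total, count = reduce(combine, partials)
    return float(total) / count


# Three partitions of unequal size, as a cluster might hold them.
partitions = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]
print(distributed_mean(partitions))  # → 3.5
```

For these partitions, averaging the per-partition means would give (1.5 + 3.0 + 5.0) / 3 ≈ 3.17 instead of 3.5. That is why the sketch combines sums and counts rather than means.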

Documentation
=============

See [SparklingPandas.com](http://sparklingpandas.com/).

Videos
======

An early version of SparklingPandas was discussed in the talk
[Sparkling Pandas - using Apache Spark to scale Pandas](https://www.youtube.com/watch?v=AcyI_V8FeIU)
by Holden Karau and Juliet Hougland.

Requirements
============

SparklingPandas requires a recent version of Spark (v1.4 at the time of writing),
available from <http://spark.apache.org>, and Python 2.7.
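
You may want to confirm that your environment matches these requirements before installing. The sketch below is a hypothetical helper and is not part of SparklingPandas. It makes two assumptions:

* v1.4 is treated as the minimum Spark version, based on the text above.
* The Spark version is passed in as a string, such as the value of `sc.version` on a running PySpark `SparkContext`.

```python
import sys

# Assumed requirements, taken from the Requirements section above.
MIN_SPARK = (1, 4)
REQUIRED_PYTHON = (2, 7)


def parse_version(version):
    """Turn a string such as '1.4.1' into a tuple of ints, e.g. (1, 4, 1).

    Non-digit suffixes such as '-SNAPSHOT' end the parse of a component.
    """
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_requirements(spark_version, python_version=None):
    """Hypothetical check: Spark at least 1.4 and Python 2.7."""
    if python_version is None:
        python_version = tuple(sys.version_info[:2])
    spark_ok = parse_version(spark_version)[:2] >= MIN_SPARK
    python_ok = tuple(python_version[:2]) == REQUIRED_PYTHON
    return spark_ok and python_ok


print(meets_requirements("1.4.1", (2, 7)))  # → True
```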

Using
=====

Make sure the `SPARK_HOME` environment variable is set correctly, as
SparklingPandas uses it to include the PySpark libraries. Other than that,
you can install SparklingPandas with pip and import it.
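
To illustrate what `SPARK_HOME` is needed for, the hypothetical helper below adds PySpark's libraries to Python's module search path. SparklingPandas' own mechanism may differ. The helper assumes the standard Spark 1.x binary layout:

* The `pyspark` package lives in `$SPARK_HOME/python`.
* Py4J ships as a `py4j-*-src.zip` archive in `$SPARK_HOME/python/lib`.

```python
import glob
import os
import sys


def pyspark_paths(spark_home):
    """Return the directory and zip files that must be on sys.path for
    `import pyspark` to work.

    Assumed layout (standard Spark binary distribution):
      $SPARK_HOME/python                      -> the pyspark package
      $SPARK_HOME/python/lib/py4j-*-src.zip   -> the Py4J bridge
    """
    python_dir = os.path.join(spark_home, "python")
    if not os.path.isdir(python_dir):
        raise EnvironmentError("no 'python' directory under %s" % spark_home)
    pattern = os.path.join(python_dir, "lib", "py4j-*-src.zip")
    return [python_dir] + sorted(glob.glob(pattern))


def add_pyspark_to_path(environ=None):
    """Hypothetical helper: prepend PySpark's libraries to sys.path,
    locating them through the SPARK_HOME environment variable."""
    environ = os.environ if environ is None else environ
    spark_home = environ.get("SPARK_HOME")
    if not spark_home:
        raise EnvironmentError("SPARK_HOME is not set")
    # Insert in reverse so the final order matches pyspark_paths().
    for path in reversed(pyspark_paths(spark_home)):
        if path not in sys.path:
            sys.path.insert(0, path)
    return spark_home
```

With `SPARK_HOME` exported, calling `add_pyspark_to_path()` before `import pyspark` should make PySpark importable in a plain Python session.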

State
=====

This project is in early development. Feedback is taken seriously and is seriously appreciated.
As you can tell, we SparklingPandas are a pretty serious bunch.

Support
=======

Check out our Google group at <https://groups.google.com/forum/#!forum/sparklingpandas>.