Spark Structured Streaming multithread in IPython Notebooks
Project description
Spark multithread in IPython Notebooks.
It’s now simple to execute Spark Structured Streaming in Jupyter Notebooks
Install
pip install nbthread_spark --process-dependency-links
Usage
Show Spark Buttons for stop and UI:
from nbthread_spark.spark import SparkRunner
spark = SparkRunner.builder.getOrCreate() # same as original SparkSession
## you will see buttons ;)
Given a Socket Stream:
TCP_IP = "localhost"
TCP_PORT = 9005
from pyspark.sql.functions import from_json
from pyspark.sql import SparkSession
from pyspark.sql.types import StructField, StructType, IntegerType
schema = StructType([
StructField("bip", IntegerType(), True),
StructField("is_on", IntegerType(), True)
])
spark = SparkSession \
.builder \
.appName("IOTStreamApp") \
.getOrCreate()
iot_stream = spark \
.readStream \
.format("socket") \
.option("host", TCP_IP) \
.option("port", TCP_PORT) \
.load()
iot_expanded = iot_stream.withColumn('value_json',
from_json('value', schema)
).drop('value').select('value_json.*')
query = iot_expanded \
.writeStream \
.outputMode("update") \
.format("memory") \
.queryName("iot_table") \
.start()
You can run queries using this:
from nbthread_spark.stream import StreamRunner
runner = StreamRunner(query)
runner.controls()
## you will see buttons ;)
runner.start() # start without controls
runner.status() # show stream status
runner.stop() # stop streaming and thread
For Stream Manager you can control lot of streams in a easy way:
from nbthread_spark.manager import StreamManager
sm = StreamManager()
sm.append(runner)
sm.append(runner1)
sm.append(runner2)
sm.all_controls()
## you will see all buttons from streams ;)
sm.start_all() # start all streams
sm.stop_all() # stop all streams
Special Thanks
Here the list of students that contribute with this module.
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
nbthread_spark-0.0.6.tar.gz
(3.0 kB
view details)
File details
Details for the file nbthread_spark-0.0.6.tar.gz
.
File metadata
- Download URL: nbthread_spark-0.0.6.tar.gz
- Upload date:
- Size: 3.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | ce72ffb1492317162ebbaaeecc0a2140f111b108a33d9c000109e57a252d4e16 |
|
MD5 | d459619eadea51877c05ab2efd1b9fda |
|
BLAKE2b-256 | f97f14eb10642c82974337987ae0fbb93d1e56d6b160b83605102a1208c6a23a |