Simple Apache Drill alternative using PySpark
A simple Apache Drill alternative using PySpark, inspired by PyDAL.
Setup
Run the following command in a terminal: pip install microdrill
Dependencies
PySpark (microdrill was tested with Spark 1.6)
Usage
Defining a Parquet Table to Query
ParquetTable(table_name, schema_index_file=file_name)
table_name: name used to reference the table.
file_name: name of the file used to look up the table schema.
Using Parquet DAL
ParquetDAL(file_uri, sc)
file_uri: location of the data; a local path to the files, an hdfs:// URI, or any other location.
sc: a SparkContext (https://spark.apache.org/docs/1.6.0/api/python/pyspark.html#pyspark.SparkContext)
Connecting to Tables
Queries
Returning Table Object
parquet_conn(table_name)
Returning Field Object
parquet_conn(table_name)(field_name)
Basic Query
Grouping By
parquet_conn.groupby(field_object1, [field_object2, ...])
Ordering By
Limiting
parquet_conn.limit(number)
Executing
df = parquet_conn.execute()
execute() returns a PySpark DataFrame.
Returning Field Names From Schema
parquet_conn(table_name).schema()
Developers
Install the latest JDK, then run the following command in a terminal: make setup