Treasure Data extension for pyspark
td_pyspark
td_pyspark is the Treasure Data extension for PySpark.
Usage
import td_pyspark
from pyspark.sql import SparkSession
spark = SparkSession\
    .builder\
    .appName("td-pyspark-app")\
    .getOrCreate()
td = td_pyspark.TDSparkContext(spark)
# Read yesterday's rows (the -1d time range) from the table as a DataFrame
df = td.table("sample_datasets.www_access")\
    .within("-1d")\
    .df()
df.show()
# Submit a Presto query
q = td.presto("select 1")
q.show()
For Developers
Running pyspark with td_pyspark:
$ ./bin/spark-submit --master "local[4]" --driver-class-path td-spark-assembly.jar --properties-file=td-spark.conf --py-files ~/work/git/td-spark/td_pyspark/td_pyspark/td_pyspark.py ~/work/git/td-spark/td_pyspark/td_pyspark/tests/test_pyspark.py
How to publish
Prerequisites
Twine is a utility for publishing Python packages securely, and it is commonly used to upload them to PyPI. Install it before publishing:
$ pip install twine
A ~/.pypirc configuration file that stores your PyPI credentials can be useful:
$ cat << 'EOF' > ~/.pypirc
[distutils]
index-servers =
  pypi
  pypitest
[pypi]
repository=https://upload.pypi.org/legacy/
username=<your_username>
password=<your_password>
[pypitest]
repository=https://test.pypi.org/legacy/
username=<your_username>
password=<your_password>
EOF
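Before uploading, it can help to confirm that the file parses and that every server named under index-servers has its own section with a repository URL. The following is a minimal sketch using only the standard library's configparser. The helper name check_pypirc and the SAMPLE text are hypothetical, introduced here for illustration, and are not part of Twine or td_pyspark.

```python
import configparser


def check_pypirc(text):
    """Return the listed index servers that lack a usable section.

    Hypothetical helper for illustration; not part of Twine or td_pyspark.
    """
    parser = configparser.RawConfigParser()
    parser.read_string(text)
    # index-servers is a multi-line value: one server name per indented line.
    servers = parser.get("distutils", "index-servers", fallback="").split()
    return [
        name for name in servers
        if not parser.has_section(name)
        or not parser.has_option(name, "repository")
    ]


# Sample content mirroring the layout above (credentials omitted).
SAMPLE = """\
[distutils]
index-servers =
    pypi
    pypitest

[pypi]
repository=https://upload.pypi.org/legacy/

[pypitest]
repository=https://test.pypi.org/legacy/
"""

if __name__ == "__main__":
    # An empty list means every listed server has a matching section.
    print(check_pypirc(SAMPLE))
```

To check a real file, the same function could be called with the contents of ~/.pypirc read as text. Note that the server names under index-servers must be indented, otherwise the parser does not treat them as part of the value.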
Build Package
Build the package as both a source distribution (sdist) and a wheel.
$ python setup.py sdist bdist_wheel
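The build writes its output to the dist/ directory. As a rough sanity check before uploading, one could confirm that both a source archive and a wheel are present. The sketch below assumes the default dist/ output directory and the usual .tar.gz and .whl suffixes; the function name built_artifacts is hypothetical.

```python
from pathlib import Path


def built_artifacts(dist_dir):
    """Group the files in dist_dir into source archives and wheels.

    Hypothetical helper for illustration; it only inspects file names.
    """
    names = sorted(p.name for p in Path(dist_dir).iterdir() if p.is_file())
    return {
        "sdist": [n for n in names if n.endswith(".tar.gz")],
        "wheel": [n for n in names if n.endswith(".whl")],
    }


if __name__ == "__main__":
    found = built_artifacts("dist")
    # Both lists should be non-empty after `sdist bdist_wheel`.
    print(found)
```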
Publish Package
Upload the package to the test repository first.
$ twine upload \
    --repository pypitest \
    dist/*
If the package looks correct in the test repository, publish it to PyPI.
$ twine upload \
    --repository pypi \
    dist/*
File details
Details for the file td_pyspark-19.5.0.tar.gz.
File metadata
- Download URL: td_pyspark-19.5.0.tar.gz
- Upload date:
- Size: 3.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.15.1 setuptools/41.0.1 requests-toolbelt/0.8.0 tqdm/4.32.1 CPython/3.6.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 85c7b71746f600b21b62905625ff2808fe1b2dabf1c89fda29a7953c0f79e8b8
MD5 | 1adec893a40645892f47687901c195f3
BLAKE2b-256 | 1eec566cec7a476faa73812d592af32c3e72c9a272ae2fc5492bd4717876eeca
File details
Details for the file td_pyspark-19.5.0-py3-none-any.whl.
File metadata
- Download URL: td_pyspark-19.5.0-py3-none-any.whl
- Upload date:
- Size: 3.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.15.1 setuptools/41.0.1 requests-toolbelt/0.8.0 tqdm/4.32.1 CPython/3.6.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 71ea79f6a84ae7cc32ba5923694366416254a87789b97e5c63cbe6b49aea0847
MD5 | 3dada29e9e97c137c5b15a0adfba0d6d
BLAKE2b-256 | 44f67781921289bbb439c9784acd5af7098f1699b99b6a5a885e8357c2b405a1
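A downloaded file can be checked against the digests listed above. The following is a minimal sketch using the standard library's hashlib; the function name file_digests and the chunk size are illustrative choices, and BLAKE2b-256 is assumed to mean BLAKE2b with a 32-byte digest.

```python
import hashlib


def file_digests(path, chunk_size=65536):
    """Compute SHA256, MD5, and BLAKE2b-256 hex digests for one file.

    Hypothetical helper for illustration; reads the file in chunks so
    large downloads do not need to fit in memory.
    """
    hashers = {
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(),
        # Assumption: "BLAKE2b-256" is BLAKE2b with a 32-byte digest.
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),
    }
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


if __name__ == "__main__":
    # Compare the printed values with the table for the matching file.
    print(file_digests("td_pyspark-19.5.0.tar.gz"))
```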