
Treasure Data extension for pyspark


td_pyspark

Treasure Data extension for PySpark.

Usage

import td_pyspark
from pyspark.sql import SparkSession

spark = SparkSession\
    .builder\
    .appName("td-pyspark-app")\
    .getOrCreate()

td = td_pyspark.TDSparkContext(spark)

# Read yesterday's data (the "-1d" range) from the table as a DataFrame
df = td.table("sample_datasets.www_access")\
    .within("-1d")\
    .df()
df.show()

# Submit a Presto query
q = td.presto("select 1")
q.show()

For Developers

Running pyspark with td_pyspark:

$ ./bin/spark-submit --master "local[4]"  --driver-class-path td-spark-assemblyd.jar  --properties-file=td-spark.conf --py-files ~/work/git/td-spark/td_pyspark/td_pyspark/td_pyspark.py ~/work/git/td-spark/td_pyspark/td_pyspark/tests/test_pyspark.py

How to publish

Prerequisites

Twine is a utility for securely publishing Python packages, and it is commonly used to upload them to PyPI. Install it before you publish.

$ pip install twine

A configuration file that holds your PyPI credentials saves you from entering them on every upload.

$ cat << 'EOF' > ~/.pypirc 
[distutils]
index-servers =
  pypi
  pypitest

[pypi]
repository=https://upload.pypi.org/legacy/
username=<your_username>
password=<your_password>

[pypitest]
repository=https://test.pypi.org/legacy/
username=<your_username>
password=<your_password>
EOF
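Before uploading, it can help to confirm that every server named under index-servers has its own section and a repository URL. The following is a minimal sketch using only the Python standard library; check_pypirc and SAMPLE are hypothetical names introduced for illustration and are not part of twine or td_pyspark.

```python
import configparser

# Same layout as the ~/.pypirc shown above, with placeholder credentials.
SAMPLE = """\
[distutils]
index-servers =
  pypi
  pypitest

[pypi]
repository=https://upload.pypi.org/legacy/
username=<your_username>
password=<your_password>

[pypitest]
repository=https://test.pypi.org/legacy/
username=<your_username>
password=<your_password>
"""


def check_pypirc(text):
    """Return (servers, problems) for the given .pypirc text.

    servers  -- names listed under [distutils] index-servers
    problems -- listed names that lack a section or a repository URL
    """
    # RawConfigParser avoids '%' interpolation, which passwords may contain.
    parser = configparser.RawConfigParser()
    parser.read_string(text)
    servers = parser.get("distutils", "index-servers", fallback="").split()
    problems = [
        name
        for name in servers
        if not parser.has_section(name)
        or not parser.get(name, "repository", fallback="")
    ]
    return servers, problems


if __name__ == "__main__":
    servers, problems = check_pypirc(SAMPLE)
    print("index servers:", servers)
    print("incomplete entries:", problems)
```

To check a real file, pass its contents instead of SAMPLE, for example check_pypirc(open(os.path.expanduser("~/.pypirc")).read()) after importing os.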

Build Package

Build the package as a source distribution (sdist) and as a wheel. Both files are written to the dist/ directory.

$ python setup.py sdist bdist_wheel
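After the build, a quick check that dist/ holds both artifact types can catch a partial build before upload. This is a sketch under stated assumptions: it treats any .tar.gz file as the sdist and any .whl file as the wheel, and classify_artifacts and ready_to_upload are hypothetical helpers, not part of setuptools or twine.

```python
from pathlib import Path


def classify_artifacts(filenames):
    """Split file names into (sdists, wheels) by extension.

    Assumption: sdists end in .tar.gz and wheels end in .whl.
    """
    sdists = [name for name in filenames if name.endswith(".tar.gz")]
    wheels = [name for name in filenames if name.endswith(".whl")]
    return sdists, wheels


def ready_to_upload(dist_dir="dist"):
    """Return True when dist_dir contains at least one sdist and one wheel."""
    directory = Path(dist_dir)
    if not directory.is_dir():
        return False
    names = sorted(p.name for p in directory.iterdir() if p.is_file())
    sdists, wheels = classify_artifacts(names)
    return bool(sdists) and bool(wheels)


if __name__ == "__main__":
    print("dist/ ready for upload:", ready_to_upload())
```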

Publish Package

Upload the package to the test repository first.

$ twine upload \
  --repository pypitest \
  dist/*

If the package looks correct in the test repository, publish it to PyPI.

$ twine upload \
  --repository pypi \
  dist/*


Source Distribution

td_pyspark-19.5.0.tar.gz (3.2 kB)

Uploaded Source

Built Distribution

td_pyspark-19.5.0-py3-none-any.whl (3.5 kB)

Uploaded Python 3

File details

Details for the file td_pyspark-19.5.0.tar.gz.

File metadata

  • Download URL: td_pyspark-19.5.0.tar.gz
  • Upload date:
  • Size: 3.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.15.1 setuptools/41.0.1 requests-toolbelt/0.8.0 tqdm/4.32.1 CPython/3.6.4

File hashes

Hashes for td_pyspark-19.5.0.tar.gz
Algorithm Hash digest
SHA256 85c7b71746f600b21b62905625ff2808fe1b2dabf1c89fda29a7953c0f79e8b8
MD5 1adec893a40645892f47687901c195f3
BLAKE2b-256 1eec566cec7a476faa73812d592af32c3e72c9a272ae2fc5492bd4717876eeca


File details

Details for the file td_pyspark-19.5.0-py3-none-any.whl.

File metadata

  • Download URL: td_pyspark-19.5.0-py3-none-any.whl
  • Upload date:
  • Size: 3.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.15.1 setuptools/41.0.1 requests-toolbelt/0.8.0 tqdm/4.32.1 CPython/3.6.4

File hashes

Hashes for td_pyspark-19.5.0-py3-none-any.whl
Algorithm Hash digest
SHA256 71ea79f6a84ae7cc32ba5923694366416254a87789b97e5c63cbe6b49aea0847
MD5 3dada29e9e97c137c5b15a0adfba0d6d
BLAKE2b-256 44f67781921289bbb439c9784acd5af7098f1699b99b6a5a885e8357c2b405a1
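The SHA256 digests listed above can be compared against a downloaded file. The sketch below uses only hashlib from the standard library; sha256_hex, sha256_of_file, and matches are hypothetical helper names, and the file path in the usage note assumes the archive was saved to the current directory.

```python
import hashlib


def sha256_hex(chunks):
    """Return the SHA256 hex digest of an iterable of bytes chunks."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def sha256_of_file(path, chunk_size=65536):
    """Hash a file in fixed-size chunks so large files need little memory."""
    with open(path, "rb") as fh:
        return sha256_hex(iter(lambda: fh.read(chunk_size), b""))


def matches(path, expected_hex):
    """Return True when the file's SHA256 equals the published digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, matches("td_pyspark-19.5.0.tar.gz", "85c7b71746f600b21b62905625ff2808fe1b2dabf1c89fda29a7953c0f79e8b8") should return True for an intact copy of the source distribution.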

