pytd · PyPI

Treasure Data Driver for Python

These details have been verified by PyPI

Maintainers

chezou takuti treasure_data yyuu

These details have not been verified by PyPI

Project links

Homepage

Development Status
- 5 - Production/Stable
Intended Audience
- Developers
License
- OSI Approved :: Apache Software License
Programming Language
Topic
- Database

Project description

pytd provides user-friendly interfaces to Treasure Data’s REST APIs, Presto query engine, and Plazma primary storage.

The seamless connection allows your Python code to efficiently read and write large volumes of data from and to Treasure Data, which makes day-to-day data analytics work more productive.

Installation

pip install pytd

Usage

Set your API key and endpoint in the environment variables TD_API_KEY and TD_API_SERVER, respectively, then create a client instance:

import pytd

client = pytd.Client(database='sample_datasets')
# or, hard-code your API key, endpoint, and/or query engine:
# >>> pytd.Client(apikey='1/XXX', endpoint='https://api.treasuredata.com/', database='sample_datasets', default_engine='presto')
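When the credentials come from the environment, it can help to fail early with a clear message if a variable is missing. The following is a minimal sketch, not part of pytd: the helper name client_kwargs_from_env and its environ parameter are hypothetical, and the check is stricter than whatever pytd itself may enforce. Only the keyword names (apikey, endpoint, database, default_engine) are taken from the example above.

```python
import os


def client_kwargs_from_env(database, default_engine="presto", environ=None):
    """Build keyword arguments for pytd.Client from environment variables.

    Hypothetical helper for illustration; `environ` can be any mapping,
    which makes the function easy to exercise without real credentials.
    """
    env = os.environ if environ is None else environ
    # Collect every missing or empty variable so the error names all of them.
    missing = [name for name in ("TD_API_KEY", "TD_API_SERVER") if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variable(s): " + ", ".join(missing))
    return {
        "apikey": env["TD_API_KEY"],
        "endpoint": env["TD_API_SERVER"],
        "database": database,
        "default_engine": default_engine,
    }


# Usage (requires pytd and valid credentials):
# import pytd
# client = pytd.Client(**client_kwargs_from_env("sample_datasets"))
```

Passing the values explicitly like this is equivalent to the hard-coded form shown above, except that the secrets stay out of the source code.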

Query in Treasure Data

Issue a Presto query and retrieve the result:

client.query('select symbol, count(1) as cnt from nasdaq group by 1 order by 1')
# {'columns': ['symbol', 'cnt'], 'data': [['AAIT', 590], ['AAL', 82], ['AAME', 9252], ..., ['ZUMZ', 2364]]}
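The result shown above is a plain dictionary with a 'columns' list and a 'data' list of rows. As a small sketch of how that shape can be consumed, the hypothetical helper below (result_to_records is not a pytd function) pairs each row with the column names. The sample values are copied from the output above, so no connection is needed to run it.

```python
def result_to_records(result):
    """Turn {'columns': [...], 'data': [[...], ...]} into a list of dicts."""
    columns = result["columns"]
    # zip pairs each column name with the value at the same position in the row.
    return [dict(zip(columns, row)) for row in result["data"]]


# A truncated copy of the query result shown above.
sample = {"columns": ["symbol", "cnt"], "data": [["AAIT", 590], ["AAL", 82]]}
records = result_to_records(sample)
print(records[0])  # {'symbol': 'AAIT', 'cnt': 590}
```

If pandas is available, pd.DataFrame(result['data'], columns=result['columns']) builds a DataFrame from the same dictionary.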

To run a query on Hive instead, pass engine='hive':

client.query('select hivemall_version()', engine='hive')
# {'columns': ['_c0'], 'data': [['0.6.0-SNAPSHOT-201901-r01']]} (as of Feb, 2019)

It is also possible to initialize pytd.Client with Hive as the default engine:

client_hive = pytd.Client(database='sample_datasets', default_engine='hive')
client_hive.query('select hivemall_version()')

Write data to Treasure Data

Data represented as pandas.DataFrame can be written to Treasure Data as follows:

import pandas as pd

df = pd.DataFrame(data={'col1': [1, 2], 'col2': [3, 10]})
client.load_table_from_dataframe(df, 'takuti.foo', writer='bulk_import', if_exists='overwrite')

For the writer option, pytd supports three different ways to ingest data into Treasure Data:

Bulk Import API: bulk_import (default)
- Convert the data into a CSV file and upload it in batch fashion.
Presto INSERT INTO query: insert_into
- Insert every row of the DataFrame by issuing an INSERT INTO query through the Presto query engine.
- Recommended only for a small volume of data.
td-spark: spark
- A local, customized Spark instance writes the DataFrame directly to Treasure Data’s primary storage system.

Characteristics of each of these methods can be summarized as follows:

Characteristic	bulk_import	insert_into	spark
Scalable against data volume	✓	-	✓
Write performance for larger data	-	-	✓
Memory efficient	✓	✓	-
Disk efficient	-	✓	-
Minimal package dependency	✓	✓	-
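The trade-offs in this table can be turned into a rough selection rule. The sketch below is only an illustration under stated assumptions: choose_writer is a hypothetical helper, and the 1000-row cutoff for "a small volume of data" is an arbitrary placeholder, not a figure from pytd. The returned strings are the writer names listed above.

```python
def choose_writer(n_rows, spark_enabled=False, small_threshold=1000):
    """Pick a writer name for load_table_from_dataframe.

    small_threshold is an assumed cutoff; tune it for your own workload.
    """
    if n_rows <= small_threshold:
        # insert_into needs no extra packages or temporary files,
        # but is recommended only for small volumes.
        return "insert_into"
    if spark_enabled:
        # spark offers the best write performance for larger data,
        # but requires pytd[spark] and an activated account.
        return "spark"
    # bulk_import is the default and scales with data volume.
    return "bulk_import"


print(choose_writer(10))                           # insert_into
print(choose_writer(1_000_000))                    # bulk_import
print(choose_writer(1_000_000, spark_enabled=True))  # spark
```

The returned name can be passed as the writer argument, for example client.load_table_from_dataframe(df, 'mydb.bar', writer=choose_writer(len(df))).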

Enabling Spark Writer

Because td-spark gives special access to the main storage system via PySpark, complete the following steps before using the spark writer:

- Contact support@treasuredata.com to enable the permission for your Treasure Data account.
- Install pytd with the [spark] option: pip install pytd[spark]

If you want to use an existing td-spark JAR file, create a SparkWriter with the td_spark_path option and pass it as the writer:

from pytd.writer import SparkWriter

writer = SparkWriter(apikey='1/XXX', endpoint='https://api.treasuredata.com/', td_spark_path='/path/to/td-spark-assembly.jar')
client.load_table_from_dataframe(df, 'mydb.bar', writer=writer, if_exists='overwrite')

How to replace pandas-td

pytd offers pandas-td-compatible functions that provide the same functionality more efficiently. If you are still using pandas-td, we recommend switching to pytd as follows.

First, install the package from PyPI:

pip install pytd
# or, `pip install pytd[spark]` if you wish to use `to_td`

Next, modify the import statements as follows.

Before:

import pandas_td as td

In [1]: %load_ext pandas_td.ipython

After:

import pytd.pandas_td as td

In [1]: %load_ext pytd.pandas_td.ipython

After this change, all pandas_td code should keep running correctly with pytd. If you notice any incompatible behavior, please report an issue.
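For a codebase with many files, the two substitutions above can be applied mechanically. The following is a minimal sketch, assuming the imports are written exactly in the `import pandas_td as <alias>` form shown above; migrate_source is a hypothetical helper, and other import styles (for example `from pandas_td import ...`) are deliberately left untouched.

```python
import re

# Each pair is (pattern, replacement). Both rewrites are idempotent:
# already-migrated text no longer matches the patterns.
_REWRITES = [
    (
        re.compile(r"^([ \t]*)import pandas_td as (\w+)", re.MULTILINE),
        r"\1import pytd.pandas_td as \2",
    ),
    (
        re.compile(r"(load_ext\s+)pandas_td\.ipython"),
        r"\1pytd.pandas_td.ipython",
    ),
]


def migrate_source(source):
    """Rewrite pandas_td imports and the IPython extension name in `source`."""
    for pattern, replacement in _REWRITES:
        source = pattern.sub(replacement, source)
    return source


before = "import pandas_td as td\n%load_ext pandas_td.ipython\n"
print(migrate_source(before))
```

Review the rewritten files before committing them, since a plain text substitution cannot tell code apart from strings or comments.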


Release history

Version	Release date
1.7.0	Sep 20, 2024
1.6.0	Aug 27, 2024
1.5.2	Jun 5, 2024
1.5.1	Dec 8, 2022
1.4.4	Sep 20, 2022
1.4.3	Feb 10, 2021
1.4.2	Feb 10, 2021
1.4.0	Jan 12, 2021
1.3.0	May 11, 2020
1.2.0	Mar 18, 2020
1.1.0	Mar 7, 2020 (this version)
1.0.0	Nov 11, 2019
0.8.0	Sep 17, 2019
0.7.0	Aug 22, 2019
0.6.2	Jul 30, 2019
0.6.1	Jul 26, 2019
0.6.0	Jul 23, 2019
0.5.0	Jul 3, 2019
0.4.0	Jun 26, 2019
0.3.0	Apr 23, 2019
0.2.5	May 5, 2016
0.2.4	Apr 4, 2016
0.2.3	Nov 23, 2015
0.2.2	Nov 19, 2015
0.2.1	Nov 8, 2015
0.2.0	Nov 3, 2015
0.1.5	Nov 3, 2015
0.1.4	Nov 1, 2015
0.1.3	Oct 29, 2015
0.1.2	Oct 25, 2015
0.1.1	Oct 24, 2015
0.1.0	Oct 24, 2015

Download files

Download the file for your platform.

Source Distribution

pytd-1.1.0.tar.gz (30.1 kB)

Uploaded Mar 7, 2020 Source

Built Distribution

pytd-1.1.0-py3-none-any.whl (31.9 kB)

Uploaded Mar 7, 2020 Python 3

File details

Details for the file pytd-1.1.0.tar.gz.

File metadata

Download URL: pytd-1.1.0.tar.gz
Upload date: Mar 7, 2020
Size: 30.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.1.1 pkginfo/1.4.2 requests/2.23.0 setuptools/40.2.0 requests-toolbelt/0.9.1 tqdm/4.26.0 CPython/3.7.0

File hashes

Hashes for pytd-1.1.0.tar.gz
Algorithm	Hash digest
SHA256	`1859a1373ee63857b7b0efdf91f5ebc0d4c88853c8334cc52b1f8df2dac03eef`
MD5	`368532f26b8d93cbae70688afb28d9d9`
BLAKE2b-256	`dd70985ef2c198876a5160a7946d126f4a71d2a328fdf65748533e734b0dd7b1`
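
A downloaded file can be checked against the SHA256 digest listed above using only the Python standard library. This is a minimal sketch: the helper names sha256_of_file and matches_sha256 are hypothetical, and the file path in the usage comment assumes the archive was saved to the current directory.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sha256(path, expected_hex):
    """Compare the file's digest with the published one, ignoring case."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.strip().lower())


# Usage, with the digest published for pytd-1.1.0.tar.gz:
# matches_sha256(
#     "pytd-1.1.0.tar.gz",
#     "1859a1373ee63857b7b0efdf91f5ebc0d4c88853c8334cc52b1f8df2dac03eef",
# )
```

pip can perform the same check at install time when a requirements file pins hashes and --require-hashes is used.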


File details

Details for the file pytd-1.1.0-py3-none-any.whl.

File metadata

Download URL: pytd-1.1.0-py3-none-any.whl
Upload date: Mar 7, 2020
Size: 31.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.1.1 pkginfo/1.4.2 requests/2.23.0 setuptools/40.2.0 requests-toolbelt/0.9.1 tqdm/4.26.0 CPython/3.7.0

File hashes

Hashes for pytd-1.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`167bba1f78617723f2a5104e5c79741b52936a67b996710c8bc78dc104002927`
MD5	`11492ec714285a88cf2f5feeaa8e7bdb`
BLAKE2b-256	`e5d0c5382d3d98421a8c72a1fea26c00da373f33bc5f5a5ba9cd18731a31004d`