td-client

Treasure Data API library for Python

These details have been verified by PyPI

Maintainers

chezou takuti treasure_data yyuu

These details have not been verified by PyPI

Project links

Homepage

Project description

Treasure Data API library for Python

Requirements

td-client supports the following versions of Python.

Python 3.5+
PyPy

Install

You can install the releases from PyPI.

$ pip install td-client

Installing certifi is recommended so that SSL certificate verification is enabled.

$ pip install certifi

Examples

See also the examples in the Treasure Data Documentation.

The td-client documentation is hosted at https://tdclient.readthedocs.io/, or you can go directly to the API documentation.

For information on the parameters that may be used when reading particular types of data, see File import parameters.

Listing jobs

The Treasure Data API key is read from the environment variable TD_API_KEY unless one is given via the apikey= argument to tdclient.Client.

The Treasure Data API endpoint https://api.treasuredata.com is used by default. You can override it with the environment variable TD_API_SERVER, which in turn can be overridden via the endpoint= argument to tdclient.Client. A list of available Treasure Data sites and their API endpoints can be found here.

import tdclient

with tdclient.Client() as td:
    for job in td.jobs():
        print(job.job_id)
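The lookup order described above for the API key and endpoint (explicit argument first, then environment variable, then the built-in default for the endpoint) can be illustrated with a small stand-alone function. This is a minimal sketch, not td-client's own code: `resolve_settings` and its `environ` parameter are hypothetical names introduced here so the precedence can be checked without touching the real environment.

```python
import os

# Default endpoint as stated in this README.
DEFAULT_ENDPOINT = "https://api.treasuredata.com"


def resolve_settings(apikey=None, endpoint=None, environ=None):
    """Hypothetical helper: explicit argument > environment variable > default.

    `environ` defaults to os.environ; passing a plain dict keeps tests isolated.
    """
    env = os.environ if environ is None else environ
    # API key: argument first, then TD_API_KEY (None if neither is set).
    resolved_key = apikey if apikey is not None else env.get("TD_API_KEY")
    # Endpoint: argument first, then TD_API_SERVER, then the built-in default.
    if endpoint is not None:
        resolved_endpoint = endpoint
    else:
        resolved_endpoint = env.get("TD_API_SERVER", DEFAULT_ENDPOINT)
    return resolved_key, resolved_endpoint


fake_env = {"TD_API_KEY": "env-key", "TD_API_SERVER": "https://env.example"}
from_defaults = resolve_settings(environ={})
from_env = resolve_settings(environ=fake_env)
from_args = resolve_settings(
    apikey="arg-key", endpoint="https://arg.example", environ=fake_env
)
print(from_defaults, from_env, from_args)
```

In real code you would simply pass apikey= and/or endpoint= to tdclient.Client (or leave them out) and let the library apply this order.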

Running jobs

Run a query on Treasure Data, wait for the job to finish, and print each result row.

import tdclient

with tdclient.Client() as td:
    job = td.query("sample_datasets", "SELECT COUNT(1) FROM www_access", type="hive")
    job.wait()
    for row in job.result():
        print(repr(row))

Running jobs via DBAPI2

td-client-python implements PEP 249, the Python Database API Specification v2.0, so you can use it with external libraries that support the Database API, such as pandas.

import pandas
import tdclient

def on_waiting(cursor):
    print(cursor.job_status())

with tdclient.connect(db="sample_datasets", type="presto", wait_callback=on_waiting) as td:
    data = pandas.read_sql("SELECT symbol, COUNT(1) AS c FROM nasdaq GROUP BY symbol", td)
    print(repr(data))
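Because the connection follows PEP 249, helper code written only against the generic cursor interface (cursor(), execute(), description, fetchall(), close()) should not need to know which database it is talking to. The sketch below uses the standard library's sqlite3 module as a stand-in for tdclient.connect so that it runs offline; `count_by_symbol` is a hypothetical helper, the table rows are made up, and SQL dialect differences between SQLite and Presto are ignored.

```python
import sqlite3


def count_by_symbol(conn):
    """Run a GROUP BY through any PEP 249 connection; return column names and rows."""
    cur = conn.cursor()
    try:
        cur.execute(
            "SELECT symbol, COUNT(1) AS c FROM nasdaq GROUP BY symbol ORDER BY symbol"
        )
        # description is a sequence of 7-item tuples; item 0 is the column name.
        columns = [d[0] for d in cur.description]
        return columns, cur.fetchall()
    finally:
        cur.close()


# sqlite3 stands in for tdclient.connect(...) so the sketch runs offline.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nasdaq (symbol TEXT)")
conn.executemany("INSERT INTO nasdaq VALUES (?)", [("AAPL",), ("AAPL",), ("MSFT",)])
columns, rows = count_by_symbol(conn)
conn.close()
print(columns, rows)
```

With td-client, the same helper would in principle receive the connection returned by tdclient.connect(db="sample_datasets", type="presto") instead; check the API documentation for which cursor methods it supports.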

We also offer a separate pandas-oriented package named pytd with some advanced features. You may prefer it for more complex tasks, such as exporting result data to Treasure Data or printing a job’s progress during a long execution.

Importing data

Import data into Treasure Data in a streaming manner, similar to what fluentd does.

import sys
import tdclient

with tdclient.Client() as td:
    for file_name in sys.argv[1:]:  # skip sys.argv[0], the script itself
        td.import_file("mydb", "mytbl", "csv", file_name)

Bulk import

Import data into Treasure Data in a batch manner.

import sys
import tdclient
import uuid
import warnings

if len(sys.argv) <= 1:
    sys.exit(0)

with tdclient.Client() as td:
    session_name = "session-{}".format(uuid.uuid1())
    bulk_import = td.create_bulk_import(session_name, "mydb", "mytbl")
    try:
        for file_name in sys.argv[1:]:
            part_name = "part-{}".format(file_name)
            bulk_import.upload_file(part_name, "json", file_name)
        bulk_import.freeze()
    except:
        bulk_import.delete()
        raise
    bulk_import.perform(wait=True)
    if 0 < bulk_import.error_records:
        warnings.warn("detected {} error records.".format(bulk_import.error_records))
    if 0 < bulk_import.valid_records:
        print("imported {} records.".format(bulk_import.valid_records))
    else:
        raise RuntimeError("no records have been imported: {}".format(bulk_import.name))
    bulk_import.commit(wait=True)
    bulk_import.delete()
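The bulk import steps above can be factored into a function and exercised offline against a mock client, which makes the call order (upload, freeze, perform, commit, delete) easy to check. This is a minimal sketch: `run_bulk_import` is a hypothetical helper that mirrors the example above, and `unittest.mock.MagicMock` stands in for `tdclient.Client`, so no request reaches Treasure Data.

```python
import uuid
import warnings
from unittest import mock


def run_bulk_import(td, database, table, file_names, fmt="json"):
    """Hypothetical helper mirroring the README's bulk import flow."""
    session_name = "session-{}".format(uuid.uuid1())
    bulk_import = td.create_bulk_import(session_name, database, table)
    try:
        for file_name in file_names:
            bulk_import.upload_file("part-{}".format(file_name), fmt, file_name)
        bulk_import.freeze()
    except Exception:
        # Discard the half-built session before propagating the error.
        bulk_import.delete()
        raise
    bulk_import.perform(wait=True)
    if bulk_import.error_records > 0:
        warnings.warn("detected {} error records.".format(bulk_import.error_records))
    if bulk_import.valid_records <= 0:
        raise RuntimeError("no records have been imported: {}".format(bulk_import.name))
    bulk_import.commit(wait=True)
    bulk_import.delete()
    return bulk_import.valid_records


# Offline check: a MagicMock stands in for tdclient.Client.
fake_td = mock.MagicMock()
fake_bulk = fake_td.create_bulk_import.return_value
fake_bulk.error_records = 0
fake_bulk.valid_records = 2

imported = run_bulk_import(fake_td, "mydb", "mytbl", ["a.json", "b.json"])
# method_calls records each method called on the mock bulk import, in order.
steps = [name for name, _args, _kwargs in fake_bulk.method_calls]
print(imported, steps)
```

The helper catches Exception rather than using a bare except, so interrupts such as KeyboardInterrupt propagate without the cleanup call; switch back to a bare except if you also want the session deleted in that case.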

To import data in msgpack format, you can write:

import io
import time
import uuid
import warnings

import tdclient

t1 = int(time.time())
l1 = [{"a": 1, "b": 2, "time": t1}, {"a": 3, "b": 9, "time": t1}]

with tdclient.Client() as td:
    session_name = "session-{}".format(uuid.uuid1())
    bulk_import = td.create_bulk_import(session_name, "mydb", "mytbl")
    try:
        _bytes = tdclient.util.create_msgpack(l1)
        bulk_import.upload_file("part", "msgpack", io.BytesIO(_bytes))
        bulk_import.freeze()
    except:
        bulk_import.delete()
        raise
    bulk_import.perform(wait=True)
    # same as the above example

Changing how CSV and TSV columns are read

The td-client package will generally make sensible choices on how to read the columns in CSV and TSV data, but sometimes the user needs to override the default mechanism. This can be done using the optional file import parameters dtypes and converters.

For instance, consider CSV data that starts with the following records:

time,col1,col2,col3
1575454204,a,0001,a;b;c
1575454204,b,0002,d;e;f

If that data is read using the defaults, it will produce values that look like:

1575454204, "a", 1, "a;b;c"
1575454204, "b", 2, "d;e;f"

that is, an integer, a string, an integer and another string.

If the user wants to keep the leading zeroes in col2, then they can specify the column datatype as string. For instance, using bulk_import.upload_file to read data from input_data:

bulk_import.upload_file(
    "part", "csv", input_data,
    dtypes={"col2": "str"},
)

which would produce:

1575454204, "a", "0001", "a;b;c"
1575454204, "b", "0002", "d;e;f"

If they also wanted to treat col3 as a sequence of strings, separated by semicolons, then they could specify a function to process col3:

bulk_import.upload_file(
    "part", "csv", input_data,
    dtypes={"col2": "str"},
    converters={"col3": lambda x: x.split(";")},
)

which would produce:

1575454204, "a", "0001", ["a", "b", "c"]
1575454204, "b", "0002", ["d", "e", "f"]
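To make the effect of dtypes and converters concrete, the stdlib-only sketch below reproduces the three outputs shown above from the sample CSV. It is an illustration, not td-client's implementation: `read_rows` and `guess` are hypothetical helpers, the int-or-string guessing is a simplification of whatever td-client does by default, dtype names other than "str" are assumptions, and letting a converter take precedence over a dtype for the same column is also an assumption.

```python
import csv
import io

SAMPLE = (
    "time,col1,col2,col3\n"
    "1575454204,a,0001,a;b;c\n"
    "1575454204,b,0002,d;e;f\n"
)


def guess(value):
    """Hypothetical default: use int if the text parses as one, else keep str."""
    try:
        return int(value)
    except ValueError:
        return value


def read_rows(text, dtypes=None, converters=None):
    """Emulate dtypes/converters on CSV text (illustration only, not td-client code)."""
    dtypes = dtypes or {}
    converters = converters or {}
    # Only "str" appears in the README; the other names are assumptions.
    casts = {"str": str, "int": int, "float": float}
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        row = []
        for column, value in record.items():
            if column in converters:  # assumed to win over dtypes
                row.append(converters[column](value))
            elif column in dtypes:
                row.append(casts[dtypes[column]](value))
            else:
                row.append(guess(value))
        rows.append(row)
    return rows


defaults = read_rows(SAMPLE)
keep_zeroes = read_rows(SAMPLE, dtypes={"col2": "str"})
split_col3 = read_rows(
    SAMPLE,
    dtypes={"col2": "str"},
    converters={"col3": lambda x: x.split(";")},
)
for rows in (defaults, keep_zeroes, split_col3):
    print(rows)
```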

Development

Running tests

Run tests.

$ python setup.py test

Running tests (tox)

You can run the tests against all supported Python versions. Installing pyenv is recommended for managing multiple Python versions.

$ pyenv shell system
$ for version in $(cat .python-version); do [ -d "$(pyenv root)/versions/${version}" ] || pyenv install "${version}"; done
$ pyenv shell --unset

Install tox.

$ pip install tox

Then, run tox.

$ tox

Release

Release to PyPI. Make sure twine is installed first.

$ python setup.py bdist_wheel sdist
$ twine upload dist/*

License

Apache Software License, Version 2.0

Release history

1.3.0

Nov 22, 2024

1.2.2.dev0 pre-release

Nov 22, 2024

1.2.1

Jul 7, 2020

1.2.0 (this version)

Dec 5, 2019

1.1.0

Oct 16, 2019

1.0.1

Oct 10, 2019

1.0.0

Sep 27, 2019

0.14.0

Jul 11, 2019

0.13.0

Mar 29, 2019

0.12.1.dev0 pre-release

Jun 29, 2018

0.12.0

May 31, 2018

0.11.2.dev0 pre-release

May 31, 2018

0.11.1

May 21, 2018

0.11.0

May 21, 2018

0.10.0

Nov 1, 2017

0.9.0

Feb 27, 2017

0.8.0

Dec 22, 2016

0.8.0.dev0 pre-release

Dec 20, 2016

0.7.0

Dec 6, 2016

0.7.0.dev0 pre-release

Dec 6, 2016

0.6.0

Sep 27, 2016

0.6.0.dev1 pre-release

Sep 26, 2016

0.6.0.dev0 pre-release

Sep 26, 2016

0.5.0

Jun 10, 2016

0.4.2

Mar 15, 2016

0.4.1

Jan 19, 2016

0.4.1.dev0 pre-release

Jan 15, 2016

0.4.0

Jan 14, 2016

0.3.2

Aug 1, 2015

0.3.2.dev0 pre-release

Jul 31, 2015

0.3.1

Jul 10, 2015

0.3.1.dev1 pre-release

Jul 10, 2015

0.3.1.dev0 pre-release

Jul 9, 2015

0.3.0

Jul 3, 2015

0.3.0.dev0 pre-release

Jul 1, 2015

0.2.1

Jun 20, 2015

0.2.0

May 28, 2015

0.2.0.dev3 pre-release

May 28, 2015

0.2.0.dev2 pre-release

May 27, 2015

0.2.0.dev1 pre-release

May 27, 2015

0.2.0.dev0 pre-release

May 22, 2015

0.1.11

May 17, 2015

0.1.10

Mar 30, 2015

0.1.9

Feb 26, 2015

0.1.9.dev0 pre-release

Feb 26, 2015

0.1.8

Feb 26, 2015

0.1.8.dev0 pre-release

Feb 26, 2015

0.1.7

Feb 26, 2015

0.1.7.dev0 pre-release

Feb 26, 2015

0.1.6

Feb 12, 2015

0.1.6.dev0 pre-release

Feb 11, 2015

0.1.5

Feb 10, 2015

0.1.5.dev1 pre-release

Feb 9, 2015

0.1.5.dev0 pre-release

Feb 9, 2015

0.1.4

Feb 6, 2015

0.1.4.dev2 pre-release

Feb 5, 2015

0.1.4.dev1 pre-release

Feb 5, 2015

0.1.4.dev0 pre-release

Feb 2, 2015

0.1.3

Jan 24, 2015

0.1.2

Jan 21, 2015

0.1.1

Jan 21, 2015

0.1.0

Jan 15, 2015

0.1.0.dev6 pre-release

Jan 14, 2015

0.1.0.dev5 pre-release

Jan 11, 2015

0.1.0.dev4 pre-release

Jan 10, 2015

0.1.0.dev3 pre-release

Jan 8, 2015

0.1.0.dev2 pre-release

Jan 5, 2015

0.1.0.dev1 pre-release

Jan 3, 2015

0.1.0.dev0 pre-release

Dec 29, 2014

Download files

Source Distribution

td-client-1.2.0.tar.gz (62.8 kB)

Uploaded Dec 5, 2019 Source

Built Distribution

td_client-1.2.0-py3-none-any.whl (86.8 kB)

Uploaded Dec 5, 2019 Python 3

File details

Details for the file td-client-1.2.0.tar.gz.

File metadata

Download URL: td-client-1.2.0.tar.gz
Upload date: Dec 5, 2019
Size: 62.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/2.0.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/39.0.1 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.6.6

File hashes

Hashes for td-client-1.2.0.tar.gz
Algorithm	Hash digest
SHA256	`82050774fdfe756af943f052acfda92c3425f45144fc791a3dfde2dd6189d605`
MD5	`40eb64c15b8f6e079252778c8461c00f`
BLAKE2b-256	`76b7e37fed46aba5a37692513641d806801c8820d0dbf7700b8277e8e59a957c`

File details

Details for the file td_client-1.2.0-py3-none-any.whl.

File metadata

Download URL: td_client-1.2.0-py3-none-any.whl
Upload date: Dec 5, 2019
Size: 86.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/2.0.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/39.0.1 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.6.6

File hashes

Hashes for td_client-1.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`8f114b35368b76cddd3ea963715e06da8cd00aa64432e703636e9695a86f4ca7`
MD5	`87cc74612598126c06de4654fc53ca3a`
BLAKE2b-256	`f193ca52fb9b64f64b37635d796a1c610f16850238de02352b4c6b4bf419cd97`