# Treasure Data API library for Python

[![Build Status](https://travis-ci.org/treasure-data/td-client-python.svg)](https://travis-ci.org/treasure-data/td-client-python)
[![Build status](https://ci.appveyor.com/api/projects/status/eol91l1ag50xee9m/branch/master?svg=true)](https://ci.appveyor.com/project/treasure-data/td-client-python/branch/master)
[![Coverage Status](https://coveralls.io/repos/treasure-data/td-client-python/badge.svg)](https://coveralls.io/r/treasure-data/td-client-python)
[![Code Health](https://landscape.io/github/treasure-data/td-client-python/master/landscape.svg?style=flat)](https://landscape.io/github/treasure-data/td-client-python/master)
[![PyPI version](https://badge.fury.io/py/td-client.svg)](http://badge.fury.io/py/td-client)

Treasure Data API library for Python

## Requirements

`td-client` supports the following versions of Python.

* Python 2.7+
* Python 3.3+
* PyPy

## Install

You can install the releases from [PyPI](https://pypi-hypernode.com/).

```sh
$ pip install td-client
```

Installing [certifi](https://pypi-hypernode.com/pypi/certifi) is also recommended, since it enables SSL certificate verification.

```sh
$ pip install certifi
```

## Examples

See also the examples in the [Treasure Data Documentation](http://docs.treasuredata.com/articles/rest-api-python-client).

### Listing jobs

If no API key is passed to `tdclient.Client` via the `apikey=` argument, the key is read from the `TD_API_KEY` environment variable.

The default API endpoint is `https://api.treasuredata.com`. You can override it with the `TD_API_SERVER` environment variable, which in turn is overridden by the `endpoint=` argument passed to `tdclient.Client`. See [Sites and Endpoints](https://support.treasuredata.com/hc/en-us/articles/360001474288-Sites-and-Endpoints) for the list of available Treasure Data sites and their API endpoints.


```python
import tdclient

with tdclient.Client() as td:
    for job in td.jobs():
        print(job.job_id)
```
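The lookup order described above can be sketched in plain Python. This is only an illustration of the documented precedence, not the library's own code; `resolve_settings` and `DEFAULT_ENDPOINT` are hypothetical names introduced for this example.

```python
import os

# Default endpoint as documented above.
DEFAULT_ENDPOINT = "https://api.treasuredata.com"


def resolve_settings(apikey=None, endpoint=None, environ=None):
    """Return (apikey, endpoint) following the documented lookup order.

    Hypothetical helper: it mirrors the precedence described in this
    README and is not part of td-client.
    """
    env = os.environ if environ is None else environ
    # An explicit argument wins; otherwise fall back to TD_API_KEY.
    if apikey is None:
        apikey = env.get("TD_API_KEY")
    # Explicit argument, then TD_API_SERVER, then the default endpoint.
    if endpoint is None:
        endpoint = env.get("TD_API_SERVER", DEFAULT_ENDPOINT)
    if apikey is None:
        raise ValueError("no API key: pass apikey= or set TD_API_KEY")
    return apikey, endpoint


# Usage with the real client (requires td-client to be installed):
#   key, endpoint = resolve_settings()
#   with tdclient.Client(apikey=key, endpoint=endpoint) as td:
#       ...
```

Resolving the values up front like this can make it easier to log which site a script is about to talk to before any request is sent.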

### Running jobs

Submit a query as a job on Treasure Data, wait for it to finish, and iterate over the result rows.

```python
import tdclient

with tdclient.Client() as td:
    job = td.query("sample_datasets", "SELECT COUNT(1) FROM www_access", type="hive")
    job.wait()
    for row in job.result():
        print(repr(row))
```
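When the same submit, wait, and fetch sequence is needed in several places, it can be wrapped in a small function. The sketch below uses only the calls shown in the example above (`query`, `wait`, `result`); `run_query` is a hypothetical helper name, and the `"presto"` default is just an illustrative choice.

```python
def run_query(td, database, sql, engine="presto"):
    """Submit `sql` as a job, block until it finishes, and return all rows.

    Hypothetical helper: `td` is expected to be a `tdclient.Client`
    (or any object exposing the same `query` method).
    """
    job = td.query(database, sql, type=engine)
    # wait() blocks until the job has finished running.
    job.wait()
    # Materialize the rows so the caller can use them after the client closes.
    return list(job.result())


# Usage with the real client (requires td-client and a valid API key):
#   import tdclient
#   with tdclient.Client() as td:
#       rows = run_query(td, "sample_datasets",
#                        "SELECT COUNT(1) FROM www_access", engine="hive")
```

Collecting the rows into a list is convenient for small results; for large results it may be preferable to iterate over `job.result()` directly, as in the example above.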

### Running jobs via DBAPI2

td-client-python implements the Python Database API v2.0 ([PEP 0249](https://www.python.org/dev/peps/pep-0249/)).
You can use td-client-python with external libraries that support the Database API, such as [pandas](http://pandas.pydata.org/).

```python
import pandas
import tdclient

def on_waiting(cursor):
    print(cursor.job_status())

with tdclient.connect(db="sample_datasets", type="presto", wait_callback=on_waiting) as td:
    data = pandas.read_sql("SELECT symbol, COUNT(1) AS c FROM nasdaq GROUP BY symbol", td)
    print(repr(data))
```

We offer another package for pandas, [pandas-td](https://github.com/treasure-data/pandas-td), with some advanced features.
You may prefer it for more complicated tasks, such as exporting result data to Treasure Data or printing a job's
progress during long executions.
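Because the interface follows PEP 249, code written against the generic cursor API should work without pandas as well. The sketch below is a hypothetical helper (`fetch_rows`) that relies only on the standard PEP 249 cursor methods; it is demonstrated with the standard library's `sqlite3` purely so that it runs without a Treasure Data account, and the assumption is that a connection returned by `tdclient.connect` could be passed in the same way.

```python
import sqlite3


def fetch_rows(connection, sql):
    """Run `sql` through a PEP 249 connection; return (column_names, rows)."""
    cursor = connection.cursor()
    try:
        cursor.execute(sql)
        # PEP 249: description is a sequence of 7-item tuples, name first.
        columns = [column[0] for column in cursor.description]
        rows = cursor.fetchall()
    finally:
        cursor.close()
    return columns, rows


# Stand-in database so the sketch is runnable locally.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nasdaq (symbol TEXT)")
conn.executemany("INSERT INTO nasdaq VALUES (?)",
                 [("AAPL",), ("AAPL",), ("MSFT",)])
columns, rows = fetch_rows(
    conn,
    "SELECT symbol, COUNT(1) AS c FROM nasdaq GROUP BY symbol ORDER BY symbol",
)
print(columns, rows)
conn.close()
```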

### Importing data

Import data into Treasure Data in a streaming manner, similar to how [fluentd](http://www.fluentd.org/) does it.

```python
import sys
import tdclient

with tdclient.Client() as td:
    # sys.argv[0] is the script itself; the files to import start at index 1.
    for file_name in sys.argv[1:]:
        td.import_file("mydb", "mytbl", "csv", file_name)
```
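A script that takes file names from the command line may want to skip paths that do not exist instead of failing partway through. The following is a minimal sketch of that idea; `import_files` is a hypothetical helper, and it uses only the `import_file` call shown above.

```python
import os


def import_files(td, database, table, file_format, paths):
    """Import each existing file; return (imported, skipped) path lists.

    Hypothetical helper: `td` is expected to be a `tdclient.Client`.
    """
    imported, skipped = [], []
    for path in paths:
        if os.path.isfile(path):
            td.import_file(database, table, file_format, path)
            imported.append(path)
        else:
            # Not a regular file (missing, or a directory): leave it out.
            skipped.append(path)
    return imported, skipped


# Usage with the real client (requires td-client and a valid API key):
#   import sys, tdclient
#   with tdclient.Client() as td:
#       done, missing = import_files(td, "mydb", "mytbl", "csv", sys.argv[1:])
```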

### Bulk import

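Import data into Treasure Data in batches. The example below creates a bulk import session, uploads each file as a part, freezes the session, performs the import, and commits it once valid records are confirmed.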

```python
from __future__ import print_function
import sys
import tdclient
import time
import warnings

if len(sys.argv) <= 1:
    sys.exit(0)

with tdclient.Client() as td:
    session_name = "session-%d" % (int(time.time()),)
    bulk_import = td.create_bulk_import(session_name, "mydb", "mytbl")
    try:
        for file_name in sys.argv[1:]:
            part_name = "part-%s" % (file_name,)
            bulk_import.upload_file(part_name, "json", file_name)
        bulk_import.freeze()
    except:
        bulk_import.delete()
        raise
    bulk_import.perform(wait=True)
    if 0 < bulk_import.error_records:
        warnings.warn("detected %d error records." % (bulk_import.error_records,))
    if 0 < bulk_import.valid_records:
        print("imported %d records." % (bulk_import.valid_records,))
    else:
        raise(RuntimeError("no records have been imported: %s" % (repr(bulk_import.name),)))
    bulk_import.commit(wait=True)
    bulk_import.delete()
```
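The delete-on-failure pattern in the example above can also be expressed as a context manager. This is a sketch of an alternative structure, not something td-client provides: `bulk_import_session` is a hypothetical name, and it uses only the `create_bulk_import` and `delete` calls shown above. Note one difference from the example: here the session is deleted if anything inside the `with` body raises, not only the upload and freeze steps.

```python
import contextlib


@contextlib.contextmanager
def bulk_import_session(td, session_name, database, table):
    """Yield a bulk import session and delete it if the body raises.

    Hypothetical helper: `td` is expected to be a `tdclient.Client`.
    """
    session = td.create_bulk_import(session_name, database, table)
    try:
        yield session
    except BaseException:
        # Clean up the half-finished session, then let the error propagate.
        session.delete()
        raise


# Usage with the real client (requires td-client and a valid API key):
#   with tdclient.Client() as td:
#       with bulk_import_session(td, session_name, "mydb", "mytbl") as bulk_import:
#           bulk_import.upload_file(part_name, "json", file_name)
#           bulk_import.freeze()
```

On success the session is left in place, so the caller is still responsible for the perform, commit, and final delete steps.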

## Development

### Running tests

Run the test suite.

```sh
$ python setup.py test
```

### Running tests (tox)

You can run the tests against all supported Python versions. Installing [pyenv](https://github.com/yyuu/pyenv) is recommended for managing multiple Python versions.

```sh
$ pyenv shell system
$ for version in $(cat .python-version); do [ -d "$(pyenv root)/versions/${version}" ] || pyenv install "${version}"; done
$ pyenv shell --unset
```

Install [tox](https://pypi-hypernode.com/pypi/tox).

```sh
$ pip install tox
```

Then, run `tox`.

```sh
$ tox
```

### Release

Release to PyPI.

```sh
$ python setup.py bdist_wheel --universal sdist upload
```

## Version History

See [CHANGELOG.md](CHANGELOG.md).

## License

Apache Software License, Version 2.0
