tableschema-elasticsearch

Generate ES indexes, load and extract data based on JSON Table Schema descriptors.


# tableschema-elasticsearch-py

[![Travis](https://img.shields.io/travis/frictionlessdata/tableschema-elasticsearch-py/master.svg)](https://travis-ci.org/frictionlessdata/tableschema-elasticsearch-py)
[![Coveralls](http://img.shields.io/coveralls/frictionlessdata/tableschema-elasticsearch-py/master.svg)](https://coveralls.io/r/frictionlessdata/tableschema-elasticsearch-py?branch=master)
[![PyPi](https://img.shields.io/pypi/v/tableschema-elasticsearch-py.svg)](https://pypi-hypernode.com/pypi/tableschema-elasticsearch-py)
[![SemVer](https://img.shields.io/badge/versions-SemVer-brightgreen.svg)](http://semver.org/)
[![Gitter](https://img.shields.io/gitter/room/frictionlessdata/chat.svg)](https://gitter.im/frictionlessdata/chat)

Generate and load ElasticSearch indexes based on JSON Table Schema descriptors.

## Getting Started

### Installation

```bash
pip install tableschema-elasticsearch
```

### Storage

The package implements the [Tabular Storage](https://github.com/frictionlessdata/jsontableschema-py#storage) interface.

The `elasticsearch` client is used as the database wrapper. A storage instance can be created like this:

```python
from elasticsearch import Elasticsearch
from tableschema_elasticsearch import Storage

# Without arguments, older elasticsearch-py releases connect to localhost:9200
engine = Elasticsearch()
storage = Storage(engine)
```

We can then interact with the storage ('buckets' are ElasticSearch indexes in this context):

```python
# doc_type, descriptor, rows and primary_key are placeholders
storage.buckets  # iterator over bucket names

# reindex: copy documents from an existing index with the same name (not implemented yet)
# mapping_generator_cls: allows customization of the generated mapping
storage.create('bucket', [(doc_type, descriptor)],
               reindex=False, mapping_generator_cls=None)
storage.delete('bucket')
storage.describe('bucket')  # return descriptor (not implemented yet)
storage.iter('bucket', doc_type=doc_type)  # yield rows; doc_type is optional
storage.read('bucket', doc_type=doc_type)  # return rows; doc_type is optional
# primary_key: list of field names used to generate document ids
storage.write('bucket', doc_type, rows, primary_key,
              as_generator=False)
```
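The package's own id and indexing logic is not shown here. The following is a minimal sketch, assuming rows are dicts keyed by field name, of how a `write` call could turn rows into Elasticsearch bulk actions and derive document ids from `primary_key`. The helpers `make_doc_id` and `bulk_actions`, and the `/`-joined id scheme, are hypothetical.

```python
from typing import Any, Dict, Iterable, Iterator, List


def make_doc_id(row: Dict[str, Any], primary_key: List[str]) -> str:
    # Hypothetical scheme: join the primary-key values with '/'
    return '/'.join(str(row[field]) for field in primary_key)


def bulk_actions(bucket: str, doc_type: str,
                 rows: Iterable[Dict[str, Any]],
                 primary_key: List[str]) -> Iterator[Dict[str, Any]]:
    """Yield one bulk 'index' action per row."""
    for row in rows:
        action = {
            '_index': bucket,
            # Mapping types were deprecated in ES 7 and removed in ES 8
            '_type': doc_type,
            '_source': row,
        }
        if primary_key:
            # A deterministic id means re-writing a row overwrites its document
            action['_id'] = make_doc_id(row, primary_key)
        # Without '_id', Elasticsearch generates a random document id
        yield action
```

With a live cluster, the generated actions could be passed to `elasticsearch.helpers.bulk(engine, actions)`. Deriving ids from the primary key keeps writes idempotent: loading the same rows twice updates documents instead of duplicating them.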

When creating an index, the storage always creates it under a semi-random name, along with a matching alias that points to it. Whenever an index is re-created, this lets us decide whether to re-index the existing documents or to discard them.
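The alias approach can be sketched as follows. The naming scheme (`<alias>_<8 hex chars>`) and the helpers `new_index_name` and `alias_swap_body` are hypothetical, not this package's code; the request body follows the documented shape of the Elasticsearch `_aliases` API.

```python
import uuid
from typing import Any, Dict, List, Optional


def new_index_name(alias: str) -> str:
    # Hypothetical naming scheme: the alias plus a random hex suffix
    return '{}_{}'.format(alias, uuid.uuid4().hex[:8])


def alias_swap_body(alias: str, new_index: str,
                    old_index: Optional[str] = None) -> Dict[str, Any]:
    """Build an `_aliases` request body that points `alias` at `new_index`,
    detaching it from `old_index` in the same request."""
    actions: List[Dict[str, Any]] = []
    if old_index is not None:
        actions.append({'remove': {'index': old_index, 'alias': alias}})
    actions.append({'add': {'index': new_index, 'alias': alias}})
    return {'actions': actions}
```

Because all actions in a single `_aliases` request are applied atomically, clients querying the alias never see it missing. With elasticsearch-py 7.x, the body could be sent with `es.indices.update_aliases(body=...)`; afterwards the old index can either be used as a re-index source or deleted.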

### Mappings

When creating indexes, the tableschema types are converted to ES types and a mapping is generated for the index.

Some special properties in the schema provide extra information for generating the mapping:
- `array` fields must also have the `es:itemType` property, which specifies the data type of the array items.
- `object` fields must also have the `es:schema` property, which provides a Table Schema for the inner document contained in that object (alternatively, set `"es:enabled": false` to disable indexing of that field).

Example:
```json
{
  "fields": [
    {
      "name": "my-number",
      "type": "number"
    },
    {
      "name": "my-array-of-dates",
      "type": "array",
      "es:itemType": "date"
    },
    {
      "name": "my-person-object",
      "type": "object",
      "es:schema": {
        "fields": [
          {"name": "name", "type": "string"},
          {"name": "surname", "type": "string"},
          {"name": "age", "type": "integer"},
          {"name": "date-of-birth", "type": "date", "format": "%Y-%m-%d"}
        ]
      }
    },
    {
      "name": "my-library",
      "type": "array",
      "es:itemType": "object",
      "es:schema": {
        "fields": [
          {"name": "title", "type": "string"},
          {"name": "isbn", "type": "string"},
          {"name": "num-of-pages", "type": "integer"}
        ]
      }
    },
    {
      "name": "my-user-provided-object",
      "type": "object",
      "es:enabled": false
    }
  ]
}
```
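A minimal sketch of how a schema like the example above could be turned into an Elasticsearch mapping. The type table `TYPE_MAP` and the functions below are assumptions for illustration, not the package's actual generator; the Table Schema `format` of dates is ignored here, since ES date formats use a different (Java-style) pattern syntax.

```python
from typing import Any, Dict

# Hypothetical Table Schema -> Elasticsearch type table (an assumption,
# not necessarily the table this package uses)
TYPE_MAP = {
    'string': 'text',
    'integer': 'long',
    'number': 'double',
    'boolean': 'boolean',
    'date': 'date',
    'datetime': 'date',
}


def field_mapping(field: Dict[str, Any]) -> Dict[str, Any]:
    ftype = field['type']
    if ftype == 'array':
        # ES has no dedicated array type: any field may hold several values,
        # so an array maps to the mapping of its item type
        return field_mapping(dict(field, type=field['es:itemType']))
    if ftype == 'object':
        if field.get('es:enabled') is False:
            # Keep the object in _source but do not index its contents
            return {'type': 'object', 'enabled': False}
        return {'type': 'object', 'properties': properties(field['es:schema'])}
    return {'type': TYPE_MAP[ftype]}


def properties(schema: Dict[str, Any]) -> Dict[str, Any]:
    return {f['name']: field_mapping(f) for f in schema['fields']}


def generate_mapping(schema: Dict[str, Any]) -> Dict[str, Any]:
    return {'properties': properties(schema)}
```

Mapping an array to its item type's mapping mirrors how Elasticsearch itself treats multi-valued fields; an array of objects simply becomes an `object` field whose properties come from `es:schema`.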

#### Custom mappings
By providing a custom mapping generator class (via `mapping_generator_cls`) that inherits from the `MappingGenerator` class, you should be able to customize the generated mapping.
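The interface of `MappingGenerator` is not described here, so the following is only a hypothetical illustration of the pattern: a class (not an instance) is passed in, and a subclass overrides one hook. `BaseGenerator`, its `field_mapping`/`generate` methods and `build_mapping` are stand-ins, not the package's API.

```python
from typing import Any, Dict, Optional, Type


class BaseGenerator:
    """Hypothetical stand-in for a mapping generator base class."""
    TYPE_MAP = {'string': 'text', 'integer': 'long', 'number': 'double'}

    def field_mapping(self, field: Dict[str, Any]) -> Dict[str, Any]:
        return {'type': self.TYPE_MAP[field['type']]}

    def generate(self, schema: Dict[str, Any]) -> Dict[str, Any]:
        return {'properties': {f['name']: self.field_mapping(f)
                               for f in schema['fields']}}


class KeywordStrings(BaseGenerator):
    """Index strings as exact-match `keyword` fields instead of `text`."""

    def field_mapping(self, field: Dict[str, Any]) -> Dict[str, Any]:
        if field['type'] == 'string':
            return {'type': 'keyword'}
        return super().field_mapping(field)


def build_mapping(schema: Dict[str, Any],
                  mapping_generator_cls: Optional[Type[BaseGenerator]] = None
                  ) -> Dict[str, Any]:
    # Mimics accepting a generator class and instantiating it internally
    cls = mapping_generator_cls or BaseGenerator
    return cls().generate(schema)
```

Passing a class rather than a ready-made mapping lets a subclass change only the cases it cares about while inheriting the default handling for everything else.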

### Drivers

`elasticsearch-py` is used to access the ElasticSearch interface - [docs](https://elasticsearch-py.readthedocs.io/en/master/).

## API Reference

### Snapshot

https://github.com/frictionlessdata/tableschema-elasticsearch-py#snapshot

### Detailed

- [Changelog](https://github.com/frictionlessdata/tableschema-elasticsearch-py/commits/master)

## Contributing

Please read the contribution guideline:

[How to Contribute](CONTRIBUTING.md)

Thanks!


Release history

- 2.1.0: Sep 21, 2021
- 2.0.0: Sep 20, 2021
- 1.1.0: Nov 26, 2019
- 1.0.0: Oct 6, 2019
- 0.5.3: Aug 15, 2019
- 0.4.0: Nov 27, 2018
- 0.3.0: Jul 22, 2018
- 0.2.0: Apr 2, 2018
- 0.1.3: Dec 26, 2017
- 0.1.2: Nov 24, 2017
- 0.1.1: Sep 10, 2017
- 0.0.2: Aug 3, 2017
- 0.0.1: Aug 3, 2017 (this version)

Download files


Source distribution: tableschema-elasticsearch-0.0.1.tar.gz (8.1 kB), uploaded Aug 3, 2017

Hashes for tableschema-elasticsearch-0.0.1.tar.gz:

| Algorithm | Hash digest |
|---|---|
| SHA256 | `b6ff1bad0207c36a6b69253d6cb72512701db0ec9d96566c855b9c04956d7eb7` |
| MD5 | `147532078f9bff78a821ab7fd14ef0d8` |
| BLAKE2b-256 | `0d5d06f3aecc11d29fbd2918a7b2635c65804679bad1895a522b1d8bc3f8b869` |