Python client for HBase Stargate REST server
HBase Stargate (REST API) client wrapper for Python.
Read the official documentation of Stargate (http://wiki.apache.org/hadoop/Hbase/Stargate).
Description
starbase is (at the moment) a client implementation of the Apache HBase REST API (Stargate).
What you have to know
Beware that the REST API is slow (this is not the fault of this library). If you can work with HBase directly, you are better off doing so.
Prerequisites
You need to have Hadoop, HBase, Thrift and Stargate running. For an easy setup, read my instructions on installing Cloudera Manager (free) on Ubuntu 12.04 LTS here (http://barseghyanartur.blogspot.nl/2013/08/installing-cloudera-on-ubuntu-1204.html) or here (https://bitbucket.org/barseghyanartur/simple-cloudera-install).
Once you have everything installed and running (by default Stargate runs on 127.0.0.1:8000), you should be able to run src/starbase/client/test.py without problems (UnitTest).
Features
The project is still in development, so not all features of the API are available yet.
Features implemented
Connect to Stargate.
Show software version.
Show cluster version.
Show cluster status.
List tables.
Retrieve table schema.
Retrieve table metadata.
Get a list of a table's column families.
Create a table.
Delete a table.
Alter table schema.
Insert (PUT) data into a single row (single or multiple columns).
Update (POST) data of a single row (single or multiple columns).
Select (GET) a single row from table, optionally with selected columns only.
Delete (DELETE) a single row by id.
Batch insert (PUT).
Batch update (POST).
Basic HTTP auth is working. You could provide a login and a password to the connection.
Retrieve all rows in a table (table scanning).
Features in-development
Table scanning.
Syntax globbing.
Installation
Install the latest stable version from PyPI
$ pip install starbase
Usage examples
Many useful, commented examples can be found in the starbase.client.tests module. The most common operations are shown below.
Required imports
>>> from starbase import Connection
Create a connection instance
The connection defaults to 127.0.0.1:8000. If your settings are different, specify them when creating the connection instance.
>>> c = Connection()
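For orientation, here is a minimal sketch (Python 3, standard library only) of the kind of HTTP request a Stargate client ends up sending to the address above. It is not starbase's own code: the helper name build_stargate_request and its parameters are hypothetical, and only the default address 127.0.0.1:8000 comes from this README. The request is built but never sent, so the snippet runs without a server.

```python
import base64
import urllib.request


def build_stargate_request(path, host="127.0.0.1", port=8000,
                           user=None, password=None):
    """Build (but do not send) a GET request for a Stargate REST resource.

    Hypothetical helper for illustration only; not part of starbase.
    """
    url = "http://%s:%s/%s" % (host, port, path.lstrip("/"))
    # Ask the REST server for a JSON response.
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    if user is not None and password is not None:
        # Basic HTTP auth: base64("user:password") in the Authorization header.
        token = base64.b64encode(("%s:%s" % (user, password)).encode("utf-8"))
        request.add_header("Authorization", "Basic " + token.decode("ascii"))
    return request


request = build_stargate_request("version/cluster", user="user", password="secret")
print(request.full_url)  # http://127.0.0.1:8000/version/cluster
```

Here version/cluster is the Stargate resource that reports the cluster version; passing a different host or port changes only the URL that is built.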
Show tables
Assuming that we have two tables named table1 and table2, we’ll see the following.
>>> c.tables()
['table1', 'table2']
Create a new table
Create a table instance. Note that no table is created in HBase at this step; the table instance is what you use to work with table data.
>>> t = c.table('table3')
Create a table with columns column1, column2, column3 (this is the point where the table is actually created).
>>> t.create('column1', 'column2', 'column3')
201
Show table columns
>>> t.columns()
['column1', 'column2', 'column3']
Insert data into a single row
>>> t.insert(
...     'my-key-1',
...     {
...         'column1': {'key11': 'value 11', 'key12': 'value 12', 'key13': 'value 13'},
...         'column2': {'key21': 'value 21', 'key22': 'value 22'},
...         'column3': {'key31': 'value 31', 'key32': 'value 32'}
...     }
... )
200
Note that you may also use the native notation for columns and cells (qualifiers).
>>> t.insert(
...     'my-key-1a',
...     {
...         'column1:key11': 'value 11', 'column1:key12': 'value 12', 'column1:key13': 'value 13',
...         'column2:key21': 'value 21', 'column2:key22': 'value 22',
...         'column3:key31': 'value 31', 'column3:key32': 'value 32'
...     }
... )
200
Fetch a single row with all columns
>>> t.fetch('my-key-1')
{
    'column1': {'key11': 'value 11', 'key12': 'value 12', 'key13': 'value 13'},
    'column2': {'key21': 'value 21', 'key22': 'value 22'},
    'column3': {'key31': 'value 31', 'key32': 'value 32'}
}
Fetch a single row with selected columns
>>> t.fetch('my-key-1', ['column1', 'column2'])
{
    'column1': {'key11': 'value 11', 'key12': 'value 12', 'key13': 'value 13'},
    'column2': {'key21': 'value 21', 'key22': 'value 22'}
}
Narrow the result set even more
>>> t.fetch('my-key-1', {'column1': ['key11', 'key13'], 'column3': ['key32']})
{
    'column1': {'key11': 'value 11', 'key13': 'value 13'},
    'column3': {'key32': 'value 32'}
}
Note that you may also use the native notation for columns and cells (qualifiers).
>>> t.fetch('my-key-1', ['column1:key11', 'column1:key13', 'column3:key32'])
{
    'column1': {'key11': 'value 11', 'key13': 'value 13'},
    'column3': {'key32': 'value 32'}
}
If you set the perfect_dict argument to False, you’ll get the native data structure.
>>> t.fetch('my-key-1', ['column1:key11', 'column1:key13', 'column3:key32'], perfect_dict=False)
{
    'column1:key11': 'value 11',
    'column1:key13': 'value 13',
    'column3:key32': 'value 32'
}
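To make the relationship between the two result shapes concrete, here is a small self-contained sketch (Python 3) that converts between the nested form and the native column:qualifier form. The helpers flatten and nest are hypothetical names used for illustration; this is not how starbase implements perfect_dict internally.

```python
def flatten(row):
    """Nested {'family': {'qualifier': value}} -> flat {'family:qualifier': value}."""
    return {
        "%s:%s" % (family, qualifier): value
        for family, cells in row.items()
        for qualifier, value in cells.items()
    }


def nest(flat_row):
    """Flat {'family:qualifier': value} -> nested {'family': {'qualifier': value}}."""
    nested = {}
    for name, value in flat_row.items():
        # Split on the first colon only: family on the left, qualifier on the right.
        family, _, qualifier = name.partition(":")
        nested.setdefault(family, {})[qualifier] = value
    return nested


nested_row = {
    "column1": {"key11": "value 11", "key13": "value 13"},
    "column3": {"key32": "value 32"},
}
flat_row = flatten(nested_row)
print(flat_row["column3:key32"])  # value 32
```

Both shapes carry the same cells; the nested form groups them by column family, while the flat form mirrors the way HBase names columns.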
Add columns to the table
Add the given columns (column4, column5).
>>> t.add_columns('column4', 'column5')
200
Update row data
>>> t.update(
...     'my-key-1',
...     {'column4': {'key41': 'value 41', 'key42': 'value 42'}}
... )
200
Remove row, row column or row cell
Remove row cell (qualifier)
>>> t.remove('my-key-1', 'column4', 'key41')
200
Remove row column (column family)
>>> t.remove('my-key-1', 'column4')
200
Remove entire row
>>> t.remove('my-key-1')
200
Drop columns from table
Drop the given columns (column4, column5).
>>> t.drop_columns('column4', 'column5')
201
Note that if your columns contain data, the data is not immediately gone when the columns are dropped. If you drop a column and then create it again, you will still have all the data originally stored in it.
Batch insert
>>> data = {
...     'column1': {'key11': 'value 11', 'key12': 'value 12', 'key13': 'value 13'},
...     'column2': {'key21': 'value 21', 'key22': 'value 22'},
... }
>>> b = t.batch()
>>> for i in range(0, 5000):
...     b.insert('my-key-%s' % i, data)
...
>>> b.commit(finalize=True)
{'method': 'PUT', 'response': [200], 'url': 'table3/bXkta2V5LTA='}
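The url value in the response above looks opaque, but its second path segment is a Base64 string. The short sketch below (Python 3, standard library only) decodes it; the variable names are illustrative and nothing here calls starbase.

```python
import base64

# URL fragment copied from the commit response shown above.
url = "table3/bXkta2V5LTA="

# Split into the table name and the Base64-encoded part.
table_name, encoded_key = url.split("/", 1)
row_key = base64.b64decode(encoded_key).decode("utf-8")

print(table_name, row_key)  # table3 my-key-0
```

The decoded value is my-key-0, the first row key produced by the loop.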
Batch update
>>> data = {
...     'column3': {'key31': 'value 31', 'key32': 'value 32'},
... }
>>> b = t.batch()
>>> for i in range(0, 5000):
...     b.update('my-key-%s' % i, data)
...
>>> b.commit(finalize=True)
{'method': 'POST', 'response': [200], 'url': 'table3/bXkta2V5LTA='}
Fetch all rows
Table scanning is still in development. At the moment it is only possible to fetch all rows of a given table. The result set is returned as a generator.
>>> t.fetch_all_rows()
<generator object results at 0x28e9190>
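Because the result is a generator, rows are produced one at a time and can only be iterated once. The sketch below (Python 3) shows that consumption pattern with a stand-in generator; fake_rows and the shape of the rows it yields are assumptions made for illustration, not the actual output of fetch_all_rows.

```python
from itertools import islice


def fake_rows(count):
    """Hypothetical stand-in for a row generator; yields rows lazily."""
    for i in range(count):
        yield {"column1": {"key11": "value %s" % i}}


rows = fake_rows(5000)

# Take only the first three rows; nothing beyond them is produced yet.
first_three = list(islice(rows, 3))

# Continuing to iterate resumes where islice stopped.
remaining = sum(1 for _ in rows)

print(len(first_three), remaining)  # 3 4997
```

If every row is needed more than once, collect the generator into a list first, keeping in mind that this holds the whole table in memory.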
Drop entire table
>>> t.drop()
200
More examples
In the examples below, connection is a Connection instance and table is a table instance obtained from it.
Show software version
>>> print connection.version
{u'JVM': u'Sun Microsystems Inc. 1.6.0_43-20.14-b01', u'Jersey': u'1.8', u'OS': u'Linux 3.5.0-30-generic amd64', u'REST': u'0.0.2', u'Server': u'jetty/6.1.26'}
Show cluster version
>>> print connection.cluster_version
u'0.94.7'
Show cluster status
>>> print connection.cluster_status
{u'DeadNodes': [], u'LiveNodes': [{u'Region': [{u'currentCompactedKVs': 0, ... u'regions': 3, u'requests': 0}
Show table schema
>>> print table.schema()
{u'ColumnSchema': [{u'BLOCKCACHE': u'true', u'BLOCKSIZE': u'65536', ... u'IS_ROOT': u'false', u'name': u'messages'}
Print table metadata
>>> print table.regions()
License
GPL 2.0/LGPL 2.1
Support
For any issues contact me at the e-mail given in the Author section.