A CLI for managing hadoop clusters for testing
A dockerized setup for testing code on a hadoop cluster.
Installation
hadoop-test-cluster is available on PyPI:
$ pip install hadoop-test-cluster
You can also install from source on github:
$ pip install git+https://github.com/jcrist/hadoop-test-cluster.git
Overview
For testing purposes, infrastructure for setting up a mini hadoop cluster using docker is provided here. Two base images are provided:
cdh5: provides a CDH5 installation of Hadoop 2.6
cdh6: provides a CDH6 installation of Hadoop 3.0
Both images can be run with 2 different configurations:
simple: uses simple authentication (unix user permissions)
kerberos: uses kerberos for authentication
Each cluster has three containers:
One master node running the hdfs-namenode and yarn-resourcemanager, as well as the kerberos daemons.
One worker node running the hdfs-datanode and yarn-nodemanager
One edge node for interacting with the cluster
One user account has also been created for testing purposes:
Login: testuser
Password: testpass
For the kerberos setup, a keytab for this user is available at /home/testuser/testuser.keytab, so you can kinit without typing a password:
$ kinit -kt /home/testuser/testuser.keytab testuser
An admin kerberos principal has also been created for use with kadmin:
Login: admin/admin
Password: adminpass
Ports
The full address depends on the IP address of your docker-machine driver, which can be found with:
$ docker-machine inspect --format {{.Driver.IPAddress}}
NameNode RPC: 9000
NameNode Webui: 50070
ResourceManager Webui: 8088
Kerberos KDC: 88
Kerberos Kadmin: 749
DataNode Webui: 50075
NodeManager Webui: 8042
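As a rough sketch, the port list above can be turned into full addresses once the docker-machine IP is known. The port numbers are taken from the list; the dictionary keys, the function names, and the `hdfs://` form of the NameNode address are illustrative assumptions and are not part of hadoop-test-cluster.

```python
# Port numbers copied from the list above. The key names are hypothetical.
PORTS = {
    "namenode_rpc": 9000,
    "namenode_webui": 50070,
    "resourcemanager_webui": 8088,
    "kerberos_kdc": 88,
    "kerberos_kadmin": 749,
    "datanode_webui": 50075,
    "nodemanager_webui": 8042,
}


def webui_url(host, service):
    """Return the http URL for one of the web UI services on `host`."""
    if not service.endswith("_webui"):
        raise ValueError("%r is not a web UI service" % service)
    return "http://{}:{}".format(host, PORTS[service])


def namenode_address(host):
    """Return an hdfs:// address for the NameNode RPC port (assumed form)."""
    return "hdfs://{}:{}".format(host, PORTS["namenode_rpc"])
```

Here `host` would be the IP address printed by the docker-machine command above, or master.example.com if you have added it to /etc/hosts.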
The htcluster commandline tool
To work with either cluster, please use the htcluster tool. It is a thin wrapper around docker-compose, with utilities for quickly performing the most common actions.
$ htcluster --help
usage: htcluster [--help] [--version] command ...

Manage hadoop test clusters

positional arguments:
  command
    startup   Start up a hadoop cluster.
    login     Login to a node in the cluster.
    exec      Execute a command on the node as a user
    shutdown  Shutdown the cluster and remove the containers.
    compose   Forward commands to docker-compose
    kerbenv   Output environment variables to setup kerberos locally. Intended
              use is to eval the output in bash: eval $(htcluster kerbenv)

optional arguments:
  --help, -h  Show this help message then exit
  --version   Show version then exit
Starting a cluster
Start a CDH5 cluster with simple authentication:
$ htcluster startup --image cdh5 --config simple
Start a CDH6 cluster with kerberos authentication:
$ htcluster startup --image cdh6 --config kerberos
Start a cluster, mounting the current directory to ~/workdir:
$ htcluster startup --image cdh5 --mount .:workdir
Login to the edge node:
$ htcluster login
Run a command as the user on the edge node:
$ htcluster exec -- myscript.sh some other args
Shutdown the cluster:
$ htcluster shutdown
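When driving these commands from a Python test suite, it can help to wrap startup and shutdown so the cluster is always torn down. The following is a minimal sketch, assuming htcluster is on your PATH. It only uses the subcommands and flags shown above; the helper names (`startup_cmd`, `hadoop_cluster`, `run_on_cluster`) are hypothetical and not part of the package.

```python
import contextlib
import subprocess

IMAGES = ("cdh5", "cdh6")
CONFIGS = ("simple", "kerberos")


def startup_cmd(image, config=None, mount=None):
    """Build the argument list for `htcluster startup`.

    Options left as None are omitted, so htcluster's own defaults apply.
    """
    if image not in IMAGES:
        raise ValueError("image must be one of %s" % (IMAGES,))
    if config is not None and config not in CONFIGS:
        raise ValueError("config must be one of %s" % (CONFIGS,))
    cmd = ["htcluster", "startup", "--image", image]
    if config is not None:
        cmd += ["--config", config]
    if mount is not None:
        cmd += ["--mount", mount]
    return cmd


@contextlib.contextmanager
def hadoop_cluster(image, config=None, mount=None):
    """Start a cluster, yield to the caller, and always shut it down."""
    subprocess.check_call(startup_cmd(image, config, mount))
    try:
        yield
    finally:
        subprocess.check_call(["htcluster", "shutdown"])


def run_on_cluster(args, image="cdh5", mount=".:workdir"):
    """Run a command on the edge node inside a temporary cluster."""
    with hadoop_cluster(image, mount=mount):
        subprocess.check_call(["htcluster", "exec", "--"] + list(args))
```

For example, `run_on_cluster(["myscript.sh", "some", "other", "args"])` would start a CDH5 cluster, run the script on the edge node, and shut the cluster down even if the script fails.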
Authenticating with Kerberos from outside Docker
In the kerberized cluster, the web UIs are secured by kerberos, so they won’t be accessible from your browser unless you configure kerberos properly. This is doable, but takes a few steps:
Kerberos/SPNEGO requires that the requested URL matches the host’s domain. The easiest way to do this is to modify your /etc/hosts and add a line for master.example.com:
# Add a line to /etc/hosts pointing master.example.com to your docker-machine
# driver ip address.
# Note that you probably need to run this command as a super user.
$ echo "$(docker-machine inspect --format {{.Driver.IPAddress}}) master.example.com" >> /etc/hosts
You must have kinit installed locally. You may already have it; otherwise it’s available through most package managers.
You need to tell kerberos where the krb5.conf is for this domain. This is done with an environment variable. To make this easy, htcluster has a command to do this:
$ eval $(htcluster kerbenv)
At this point you should be able to kinit as testuser:
$ kinit testuser@EXAMPLE.COM
To access kerberos secured pages in your browser you’ll need to do a bit of (simple) configuration. See [this documentation from Cloudera](https://www.cloudera.com/documentation/enterprise/5-9-x/topics/cdh_sg_browser_access_kerberos_protected_url.html) for information on what’s needed for your browser.
Since environment variables are only available for processes started in the environment, you have three options here:
Restart your browser from the shell in which you added the environment variables
Manually get a ticket for the HTTP/master.example.com principal. Note that this will delete your other tickets, but works fine if you just want to see the webpage
$ kinit -S HTTP/master.example.com testuser
Use curl to authenticate the first time, at which point you’ll already have the proper tickets in your cache, and the browser authentication will just work. Note that your version of curl must support the GSS-API.
$ curl -V  # Check your version of curl supports GSS-API
curl 7.59.0 (x86_64-apple-darwin17.2.0) libcurl/7.59.0 SecureTransport zlib/1.2.11
Release-Date: 2018-03-14
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp smb smbs smtp smtps telnet tftp
Features: AsynchDNS IPv6 Largefile GSS-API Kerberos SPNEGO NTLM NTLM_WB SSL libz UnixSockets
$ curl --negotiate -u : http://master.example.com:50070  # get a HTTP ticket for master.example.com
After doing one of these, you should be able to access any of the pages from your browser.
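If you go the curl route, the GSS-API check can be scripted instead of read by eye. This is a small sketch that parses the `Features:` line of `curl -V` output, as in the sample shown above. The function names are hypothetical, and the sketch assumes the feature is reported under the name `GSS-API`, as in that sample.

```python
import subprocess


def curl_features(version_output):
    """Return the set of names on the 'Features:' line of `curl -V` output."""
    for line in version_output.splitlines():
        if line.startswith("Features:"):
            return set(line[len("Features:"):].split())
    return set()


def supports_gssapi(version_output):
    """True if the given `curl -V` output lists GSS-API among its features."""
    return "GSS-API" in curl_features(version_output)


def local_curl_supports_gssapi():
    """Run the locally installed curl and check its feature list."""
    output = subprocess.check_output(["curl", "-V"], universal_newlines=True)
    return supports_gssapi(output)
```

Calling `local_curl_supports_gssapi()` before attempting `curl --negotiate` gives an early, explicit answer rather than an authentication failure later on.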