A CLI for managing Hadoop clusters for testing

A dockerized setup for testing code on a Hadoop cluster.

Installation

hadoop-test-cluster is available on PyPI:

$ pip install hadoop-test-cluster

You can also install from source on github:

$ pip install git+https://github.com/jcrist/hadoop-test-cluster.git

Overview

For testing purposes, this project provides infrastructure for setting up a mini Hadoop cluster using Docker. Two setups are provided:

  • base: uses simple authentication (unix user permissions)

  • kerberos: uses kerberos for authentication

Each cluster has three containers:

  • One master node running the hdfs-namenode and yarn-resourcemanager (in the kerberos setup, the kerberos daemons also run here).

  • One worker node running the hdfs-datanode and yarn-nodemanager.

  • One edge node for interacting with the cluster.

One user account has also been created for testing purposes:

  • Login: testuser

  • Password: testpass

For the kerberos setup, a keytab for this user has been placed at /home/testuser/testuser.keytab, so you can kinit easily:

$ kinit -kt /home/testuser/testuser.keytab testuser

An admin kerberos principal has also been created for use with kadmin:

  • Login: admin/admin

  • Password: adminpass
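
For example, to open an administrative shell from a node in the cluster (a sketch; kadmin will prompt for the password above):

$ kadmin -p admin/admin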

Ports

The full address depends on the IP address of your docker-machine driver, which you can find with:

$ docker-machine inspect --format {{.Driver.IPAddress}}

The following ports are exposed (see the example after this list):

  • NameNode RPC: 9000

  • NameNode web UI: 50070

  • ResourceManager web UI: 8088

  • Kerberos KDC: 88

  • Kerberos kadmin: 749

  • DataNode web UI: 50075

  • NodeManager web UI: 8042
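
For example, to fetch the NameNode web UI on the base cluster from your host (a sketch; on the kerberos setup this page requires the SPNEGO configuration described below):

$ curl "http://$(docker-machine inspect --format {{.Driver.IPAddress}}):50070"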

The htcluster command-line tool

To work with either cluster, use the htcluster tool. It is a thin wrapper around docker-compose, with utilities for quickly performing the most common actions.

$ htcluster --help
usage: htcluster [--help] [--version] command ...

Manage hadoop test clusters

positional arguments:
command
    startup   Start up a hadoop cluster.
    login     Login to a node in the cluster.
    exec      Execute a command on the node as a user
    shutdown  Shutdown the cluster and remove the containers.
    compose   Forward commands to docker-compose
    kerbenv   Output environment variables to setup kerberos locally. Intended
              use is to eval the output in bash: eval $(htcluster kerbenv)

optional arguments:
--help, -h  Show this help message then exit
--version   Show version then exit
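
Since compose forwards its arguments to docker-compose, any docker-compose subcommand works against the cluster. For example, to list the running containers (a sketch, assuming arguments are passed through verbatim):

$ htcluster compose ps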

Starting a cluster

$ htcluster startup --image base

Starting a cluster, mounting the current directory to ~/workdir

$ htcluster startup --image base --mount .:workdir
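
To check that the mount worked, you can list the directory from the edge node (a sketch; assumes commands run from testuser's home directory, and example.txt is a placeholder):

$ touch example.txt
$ htcluster exec -- ls workdir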

Login to the edge node

$ htcluster login

Run a command as the user on the edge node

$ htcluster exec -- myscript.sh some other args

Shutdown the cluster

$ htcluster shutdown
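
Putting it together, a typical session on the kerberos cluster might look like this (a sketch; myscript.sh stands in for your own code):

$ htcluster startup --image kerberos --mount .:workdir
$ htcluster login
# ... now on the edge node ...
$ kinit -kt /home/testuser/testuser.keytab testuser
$ workdir/myscript.sh
$ exit
$ htcluster shutdown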

Authenticating with Kerberos from outside Docker

In the kerberized cluster, the web UIs are secured by kerberos, and so won't be accessible from your browser unless you configure kerberos properly. This is doable, but takes a few steps:

  1. Kerberos/SPNEGO requires that the requested URL match the host's domain name. The easiest way to arrange this is to modify your /etc/hosts and add a line for master.example.com:

    # Add a line to /etc/hosts pointing master.example.com to your docker-machine
    # driver ip address.
    # Note that you probably need to run this command as a super user.
    $ echo "$(docker-machine inspect --format {{.Driver.IPAddress}})  master.example.com" >> /etc/hosts
  2. You must have kinit installed locally. You may already have it; otherwise it's available through most package managers.

  3. You need to tell kerberos where the krb5.conf is for this domain. This is done with an environment variable. To make this easy, htcluster has a command to do this:

    $ eval $(htcluster kerbenv)
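
    Under the hood this works via the standard KRB5_CONFIG environment
    variable, which points your local kerberos libraries at a krb5.conf
    describing the cluster's realm. Equivalently you could set it by hand (an
    assumption about htcluster's internals; the path is illustrative):

    $ export KRB5_CONFIG=/path/to/krb5.conf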
  4. At this point you should be able to kinit as testuser:

    $ kinit testuser@EXAMPLE.COM
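
    You can confirm that the ticket was granted with klist:

    $ klist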
  5. To access kerberos-secured pages in your browser you'll need to do a bit of (simple) configuration. See this documentation from Cloudera for what's needed for your browser: https://www.cloudera.com/documentation/enterprise/5-9-x/topics/cdh_sg_browser_access_kerberos_protected_url.html

  6. Since environment variables are only visible to processes started after they are set, you have three options here:

    • Restart your browser from the shell in which you added the environment variables

    • Manually get a ticket for the HTTP/master.example.com principal. Note that this will delete your other tickets, but it works fine if you just want to see the webpage:

      $ kinit -S HTTP/master.example.com testuser
    • Use curl to authenticate the first time; at that point you'll already have the proper tickets in your cache, and browser authentication will just work. Note that your version of curl must support the GSS-API:

      $ curl -V  # Check your version of curl supports GSS-API
      curl 7.59.0 (x86_64-apple-darwin17.2.0) libcurl/7.59.0 SecureTransport zlib/1.2.11
      Release-Date: 2018-03-14
      Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp smb smbs smtp smtps telnet tftp
      Features: AsynchDNS IPv6 Largefile GSS-API Kerberos SPNEGO NTLM NTLM_WB SSL libz UnixSockets
      
      $ curl --negotiate -u : http://master.example.com:50070  # get an HTTP ticket for master.example.com

    After doing one of these, you should be able to access any of the pages from your browser.
