Skip to main content

Python utilities for creating an ElasticSearch database containing information on available CLI modules

Project description

The files in this repository allow you to create an Elasticsearch database containing information on available CLI modules. The idea is that we have a public Kibana dashboard listing CLI modules from multiple sources, so there is a script with two modes:

‘extract’ mode

Extracts JSON descriptions from a set of CLI modules (in one or more common directories).

# ./ctk_cli_indexer.py extract --help
usage: ctk_cli_indexer.py extract [-h] [--json_filename JSON_FILENAME]
                                  base_directory [base_directory ...]

positional arguments:
  base_directory        directories (at least one) in which to search for CLI
                        module executables, or direct paths to executables

optional arguments:
  -h, --help            show this help message and exit
  --json_filename JSON_FILENAME, -o JSON_FILENAME

This is to be run by the administrators of sites that offer CLI modules, and the idea is that the resulting .json files are published on some website.

‘index’ mode

Takes a JSON file (or a list of CLI executables) and updates an Elasticsearch database. An identifier for the source of the CLI modules is passed as first parameter, and the script takes care to delete old documents in the database (CLIs that got removed), and will also maintain timestamps of the last change of each CLI (i.e. not re-upload stuff that did not change, as well as mark each change with the modification time of the CLI executable that introduced the change). Instead of passing a JSON file, you may also pass a list of directories or CLI executables directly.

# ./ctk_cli_indexer.py index --help
usage: ctk_cli_indexer.py index [-h] [--host HOST] [--port PORT]
                                source_name path [path ...]

positional arguments:
  source_name  identifier for the source (e.g. 'slicer' or 'nifty-reg') of
               this set of CLI modules (will be used to remove old documents
               from this source from the Elasticsearch index if they are no
               longer present)
  path         one or more directories in which to search for CLI module
               executables, paths to CLI executables, or (exactly one) JSON
               file as created by `extract` subcommand

optional arguments:
  -h, --help   show this help message and exit
  --host HOST  hostname elasticsearch is listening on (default: localhost)
  --port PORT  port elasticsearch is listening on (default: 9200)

This script should be run by a cron job (i.e. setup by a CTK administrator), from a script that pulls the above-mentioned .json URLs regularly and updates a central database. A Kibana dashboard will then give interested people an overview over the available modules from multiple sites.

System Prerequisites

The following software packages are required to be installed on your system:

  • Python

  • pip (recommended for installation)

  • Git (for developer only)

The following python packages will be automatically installed if not present (see requirements.txt, listed here in case you prefer to install them via your system’s package manager):

Installation for user

Use pip (or easy_install) for installation from pypi:

pip install ctk-cli-indexer

Installation for developer

First download the source:

git clone git://github.com/commontk/ctk-cli-indexer.git

To use the module, you must install some external python package dependencies:

cd ctk-cli-indexer
pip install -r requirements.txt

Elasticsearch Setup

In order to use this code, you must have access to a running Elasticsearch server. This section shall give just the basic instructions for getting started. First, download the latest stable elasticsearch and kibana tarballs (logstash is not necessary / used here).

Elasticsearch is written in Java, so you can basically unpack the tarball and run bin/elasticsearch, and the server should be running on http://localhost:9200/ (yes, you can just try that URL in the browser, and you should get some status JSON). This default location is also built into the indexer script, so you may immediately start indexing. One may use http://localhost:9200/cli/cli/_search?pretty=1 to check whether there is data in the index.

Kibana is a purely browser-based web application (based on client-side HTML and JS), so you can serve the files using any kind of HTTP server, e.g.:

cd kibana-3.1.1/
python -m SimpleHTTPServer

which will serve Kibana on http://localhost:8000/ You may even be able to use Kibana without any HTTP server, just by opening kibana-x.y.z/index.html within your browser. In that case, you may want to edit config.js to point to the server like this:

elasticsearch: "http://localhost:9200",

That’s it! If you see the welcome dashboard in the browser, you’re all set. Note that you can even store dashboards within Kibana; by default, they will be stored within Elasticsearch, so you don’t even have to care about filesystem access.

First Steps with Kibana

I suggest to start with a blank dashboard (link at the bottom of the default dashboard). Start by going to the dashboard settings (cog in the upper right corner) and under “Index”, select ‘cli’ as default index and enable autocompletion under “Preload Fields”.

Next, add rows (“Rows” tab in the dashboard settings), for instance, one with 200px height, one with 300px, and a third with 500px. Don’t forget to press “Create Row” for each row (in particular, also for the last one), then press “Save”.

Within each row, there is an (invisible) 12-column layout, so you want to add “widgets” now that span either 3 or 4 such columns. Start with “Terms” widgets only, try different fields (e.g. “license”), and different view options (in particular, the bar/pie/table styles).

The widgets allow interactive filtering, e.g. click on a specific term to filter by license / author / source / category; active filters will be listed and can be cleared at the top (make sure that line is not collapsed). There is also a search row where you can try entering keywords.

The last row (which we made particularly high) was intended for a “Table” widget (like on the sample dashboard), which can be used to list all matching documents.

Now play around with the various options, and don’t forget to save your dashboard (floppy symbol near the upper right corner). If you enable “Save to > Export” and “Load from > Local file” under “Controls” in the dashboard settings, you can also download/upload the dashboard as JSON. Furthermore, you can make the dashboard your default/home dashboard. Within this repository, you also find an example dashboard in the file cli_dashboard.json.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ctk-cli-indexer-0.6.tar.gz (8.2 kB view details)

Uploaded Source

File details

Details for the file ctk-cli-indexer-0.6.tar.gz.

File metadata

File hashes

Hashes for ctk-cli-indexer-0.6.tar.gz
Algorithm Hash digest
SHA256 529d0e63c6b6a6215a7c407d1c270ccde2eaf9080042b2a5834762603af5c0c4
MD5 be5d78129224757aae96f0368c51f07d
BLAKE2b-256 34faa2635978fabe221ca2b40f15c3763f80f33ff1d744a8ad32182dca71b3c6

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page