CDX Client
CDX Client provides an API library and command-line tools for accessing CDX data. CDX is the Climate Data Exchange, an effort of the Jet Propulsion Laboratory to create a virtual environment for the sharing of climate data.
Installation
This document tells you how to install cdx-client.
Quick Instructions
As a user with administrative privileges, run:
easy_install cdx-client
That’s it.
Full Instructions
cdx-client requires the Python programming language. We recommend version 2.4 or later. As of this writing, 2.6 is the latest stable version. If Python is not yet installed on your system, you can find binary and source distributions on the Python website.
To test if a correct version of Python is available on your system, run:
python -V
You should see output similar to:
Python 2.6
indicating the version of Python installed.
cdx-client also requires Agile OODT. OODT is Object Oriented Data Technology, a framework for metadata and data grids. Agile OODT is a Python version of OODT that supports higher performance and easier integration than the Java version.
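The Python version check described above can also be made from inside Python, which may be convenient in an installation script. The following is a minimal sketch: the names is_supported and MINIMUM_VERSION are illustrative and not part of cdx-client, and the code checks only the lower bound that this document recommends.

```python
import sys

# Lower bound recommended by this document. Treating it as a hard
# requirement is an assumption made for this illustration.
MINIMUM_VERSION = (2, 4)


def is_supported(version_info=None):
    """Return True if the (major, minor, ...) tuple is 2.4 or later."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= MINIMUM_VERSION


if __name__ == "__main__":
    print("Python %d.%d supported: %s"
          % (sys.version_info[0], sys.version_info[1], is_supported()))
```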
By far the easiest, recommended, and encouraged way to install cdx-client is to use EasyInstall. If your Python installation has EasyInstall available to it, then this one command is all you need to run in order to download, build, install, and generate command-line tools all in one go for all users on your system:
easy_install cdx-client
Be sure to run that command as an administrative user. For example, on Mac OS X and other Unix systems, you might need to run:
sudo easy_install cdx-client
That will also download and install all dependencies, including Agile OODT.
Executables
The commands cdxls and cdxget will be generated and placed in your standard installation directory for Python commands. Usually, this is the same location as the python executable itself. For example, on Mac OS X 10.5, the directory is:
/Library/Frameworks/Python.framework/Versions/Current/bin
You may want to add that directory to your shell’s PATH variable and then force your shell to re-scan PATH for new executables.
Installing EasyInstall
If you happen to be on a system where your Python installation lacks EasyInstall, fret not. Upgrading your system to gain EasyInstall’s abilities is quite simple. Follow these instructions:
As an administrative user, run the freshly-downloaded ez_setup.py file using your system’s Python.
EasyInstall and its necessary libraries will be downloaded, built, and installed for you, and the easy_install executable generated. The location of the easy_install executable is as described above.
Installing Without EasyInstall
If EasyInstall is not available on your system, you can still make a proper installation of cdx-client. Follow these instructions:
Download the Agile OODT source distribution from http://oodt.jpl.nasa.gov/dist/agile-oodt/oodt-0.0.1.tar.gz. Substitute version numbers as appropriate.
Download the cdx-client source distribution from http://cdx.jpl.nasa.gov/software/dist/cdx.client-0.0.3.tar.gz. Substitute version numbers as appropriate.
Unpack each archive.
Change the current working directory to each newly-created subdirectory, oodt-0.0.1 and cdx.client-0.0.3, again substituting version numbers as appropriate.
As an administrative user, run: python setup.py install in each subdirectory.
Issues and Questions
To report any problems with or ask for help about cdx-client, visit our contact web page.
Using CDX Client
Installing the CDX Client package makes available three things on your computer:
- cdxls command
The cdxls command lets you list the contents of a CDX server from your terminal prompt or a shell script.
- cdxget command
The cdxget command lets you retrieve data from CDX from either your terminal prompt or a shell script.
- CDX Library
The CDX Library is a Python-based API for using CDX servers.
This document describes how to use the above three items, with special attention to the CDX Library.
Commands
After installing the CDX Client package, two new commands are available on your system: cdxls and cdxget. These commands let you list the data on a CDX server and retrieve selected files from it.
To use these commands from your interactive prompt, you just need to make sure your shell’s PATH environment variable includes the directory where the commands are installed. On most systems, these two commands are installed in:
/usr/local/bin
However, on Mac OS X, the installation location may be:
/Library/Frameworks/Python.framework/Versions/Current/bin
And on Windows, it may be:
c:\Program Files\Python
Note also that some interactive shells create a cache of commands in order to execute your requests more quickly. You may need to force your shell to re-build that cache. The csh and tcsh shells are two such examples; you can make these shells rebuild their caches by running the rehash command.
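To confirm that the commands are reachable through PATH, a short Python check can help. The sketch below is illustrative only: find_on_path is a hypothetical helper, not part of the CDX Library. It looks for an executable file with the exact command name, so it does not handle commands that carry an extension, as may be the case on Windows.

```python
import os


def find_on_path(command, path=None):
    """Return the full path of `command` if it is on `path`, else None.

    `path` defaults to the PATH environment variable.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, command)
        # Accept only regular files that the current user may execute.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    for command in ("cdxls", "cdxget"):
        print("%s: %s" % (command, find_on_path(command) or "not found on PATH"))
```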
Use from Shell Scripts
The cdxls and cdxget commands may be used from shell scripts as well. The only requirement for making these commands available to shell scripts is the same as for interactive sessions: the shell’s PATH environment variable must include the directory that contains the cdxls and cdxget commands.
Here is a sample shell script that retrieves the MLS Aura L2GP data files (and metadata files) for HO2 and HOCl from day 325 in 2008:
#!/bin/sh
PATH=/usr/local/bin:/usr/bin:/bin; export PATH
CDX_SERVER=http://mlscdx.jpl.nasa.gov:8080/cdx/prod; export CDX_SERVER
for kind in HO2 HOCl; do
    for extension in he5 he5.met; do
        cdxget 2008/325/MLS-Aura_L2GP-${kind}_v02-23-c01_2008d325.${extension}
    done
done
The above shell script assumes that cdxget will be found in /usr/local/bin, /usr/bin, or /bin. It also sets the CDX_SERVER environment variable to select which CDX server to contact. It then loops through two kinds of data (HO2 and HOCl) and, for each, through two file extensions (he5 and he5.met). The result is that it retrieves four files to the current working directory, specifically:
2008/325/MLS-Aura_L2GP-HO2_v02-23-c01_2008d325.he5
2008/325/MLS-Aura_L2GP-HO2_v02-23-c01_2008d325.he5.met
2008/325/MLS-Aura_L2GP-HOCl_v02-23-c01_2008d325.he5
2008/325/MLS-Aura_L2GP-HOCl_v02-23-c01_2008d325.he5.met
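The same nested loop can be written in Python, for instance to build the list of paths before handing each one to cdxget. This is a sketch under assumptions: the function names are hypothetical, the version string v02-23-c01 is copied from the example above, and zero-padding the day of year to three digits is a guess at the naming convention.

```python
def mls_file_paths(year, day, kinds, extensions):
    """Build server-relative paths following the pattern in the script."""
    paths = []
    for kind in kinds:
        for extension in extensions:
            paths.append(
                "%04d/%03d/MLS-Aura_L2GP-%s_v02-23-c01_%04dd%03d.%s"
                % (year, day, kind, year, day, extension)
            )
    return paths


def cdxget_command(path):
    """Return the argument list for invoking cdxget on one path."""
    return ["cdxget", path]


if __name__ == "__main__":
    for path in mls_file_paths(2008, 325, ["HO2", "HOCl"], ["he5", "he5.met"]):
        print(" ".join(cdxget_command(path)))
```

The argument list returned by cdxget_command is in the form that Python's subprocess module accepts, should you wish to run the command from a script.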
CDX Library
The CDX Library is a Python-based application programming interface (API) for communicating with CDX servers. In fact, the two commands cdxls and cdxget are implemented using the CDX Library. If shell-script programming is not to your taste, and you know Python, then using the CDX Library may be right for you.
The CDX Library uses an object-oriented approach to model the contents of a CDX server. Objects represent CDX files and directories, and you call methods on those objects to determine file attributes, directory contents, or retrieve a file’s contents.
The remainder of this document describes the modules, classes, and functions that comprise the CDX Library. If you don’t know Python, you may wish to skip the rest.
The cdx Module
The cdx module is a namespace module. It provides no classes or functions. Rather, it contains a single, nested module called client.
The cdx.client Module
The cdx.client module contains nested modules that provide the CDX Library. It also contains implementations of the cdxls and cdxget commands.
The cdx.client.cdxfile Module
The cdx.client.cdxfile module is where all the action is. It contains classes and functions for communicating with and modeling the contents of CDX servers. It contains the following items:
- CDXDirectory
Objects of this class represent directories on a CDX server. You can use Python’s iterator, length, and containment protocols to examine the contents of a directory, and you can sort those contents.
- CDXFile
Objects of this class represent files on a CDX server. While you can instantiate objects of this class directly, you’d typically instantiate a CDXDirectory and examine its contents, which include CDXFile objects for files in the directory and nested CDXDirectory objects for subdirectories. A CDXFile object also provides a method that lets you retrieve its data.
- findFile
The findFile function is a utility function that, given a starting CDXDirectory and a path name, yields the matching CDXDirectory or CDXFile on a CDX server.
CDXDirectory Objects
CDXDirectory objects represent directories on a CDX server. You can create these objects directly or you can use the findFile function in the cdx.client.cdxfile module.
- class CDXDirectory(path, cdxURL = None)
Create a CDXDirectory object with the given path. You can also specify the URL to a CDX server to use by passing in a string for cdxURL.
- sort(cmp = cmp, reverse = False)
Return the contents of the directory, sorted using the comparison function cmp, which defaults to Python’s built-in cmp. If reverse is True, reverse the order of the sort. Comparison with cmp on CDXFile and CDXDirectory objects is by CDX server URL and by name. You can pass in your own cmp that, for example, sorts by file size.
- isFile()
Always returns False.
- path
The path name of the directory.
- name
The name of the directory; this is the last element of the path.
- size
By convention sizes for directories are always zero.
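As an illustration of the custom comparison mentioned for sort, here is a sketch of a cmp-style function that orders entries by size and then by name. It does not use the CDX Library: Entry is a stand-in type that models only name and size, and it is an assumption that real CDXFile objects expose a size attribute in the same way. Because Python 3 removed the cmp argument from its built-in sorting, the sketch adapts the function with functools.cmp_to_key so that it runs on current interpreters.

```python
import functools
from collections import namedtuple

# Stand-in for CDX entries; only `name` and `size` are modeled.
# This is NOT the real CDXFile or CDXDirectory class.
Entry = namedtuple("Entry", ["name", "size"])


def by_size(a, b):
    """cmp-style comparison: order by size, breaking ties by name."""
    if a.size != b.size:
        return -1 if a.size < b.size else 1
    return (a.name > b.name) - (a.name < b.name)


entries = [Entry("b.he5", 300), Entry("a.he5.met", 20), Entry("a.he5", 300)]

# cmp_to_key turns a cmp-style function into a key for sorted().
ordered = sorted(entries, key=functools.cmp_to_key(by_size))
print([entry.name for entry in ordered])
```

With the library itself, a function such as by_size would presumably be passed as the cmp argument of sort, following the signature given above.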
CDXDirectory objects obey Python’s protocols for hashing, comparison, containment testing, iteration, indexing, and length query. Containment testing on directories works with CDXDirectory objects, CDXFile objects, or plain strings:
>>> from cdx.client.cdxfile import CDXDirectory
>>> root = CDXDirectory('/', 'http://localhost:8192/cdx/prod')
>>> len(root)
3
>>> subdir = root['2005']
>>> subdir
CDXDirectory(path=/2005)
>>> subdir in root
True
>>> '2005' in root
True
>>> subdir < root
False
>>> subdir > root
True
>>> for i in root:
...     print i
...
/2008
/2007
/2005
>>> root.sort()
[CDXDirectory(path=/2005), CDXDirectory(path=/2007), CDXDirectory(path=/2008)]
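Because directories are iterable and isFile() distinguishes them from files, a recursive traversal takes only a few lines. The sketch below runs against stand-in classes rather than a live server: FakeFile and FakeDirectory are hypothetical, and it is an assumption that real CDXFile objects provide isFile() returning True and a path attribute.

```python
class FakeFile(object):
    """Stand-in for a CDX file (hypothetical; not the real CDXFile)."""

    def __init__(self, path):
        self.path = path

    def isFile(self):
        return True


class FakeDirectory(object):
    """Stand-in for a CDX directory supporting iteration and len()."""

    def __init__(self, path, children=()):
        self.path = path
        self._children = list(children)

    def isFile(self):
        return False

    def __iter__(self):
        return iter(self._children)

    def __len__(self):
        return len(self._children)


def walk_files(entry):
    """Yield the path of every file at or below `entry`, depth first."""
    if entry.isFile():
        yield entry.path
        return
    for child in entry:
        for path in walk_files(child):
            yield path


tree = FakeDirectory("/", [
    FakeDirectory("/2005", [FakeFile("/2005/a.he5"), FakeFile("/2005/a.he5.met")]),
    FakeDirectory("/2007"),
    FakeFile("/README"),
])
print(list(walk_files(tree)))
```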
CDXFile Objects
TBD.
Changelog
0.0.4 - Bugfix to 0.0.3 release
This is a bugfix release to 0.0.3, which includes some error checking to deal with some data format inconsistencies on the OODT OFSN product server end.
JIRA issues addressed (see http://oodt.jpl.nasa.gov/jira/browse/CDX):
CDX-43 Directory structure shouldn’t be preserved if cdxget is called without the -r parameter
CDX-42 cdxget -r fails to retrieve MLS data
CDX-41 cdxls -R chokes if dir size not provided
0.0.3 - Directory caching
The major feature of this release is the cdx.client.dircache module which enables local-disk caching of a subset of a remote CDX product server’s contents. It also introduces the concept of a cdx: scheme URL. Such a URL has this form:
cdx://hostname[:port]/endpoint/prod/path/to/a/directory
where hostname is the name or IP address of a CDX product server, port is an optional port number on which the server is listening, endpoint is the WebGrid service identifier (typically just the string cdx), prod is the fixed keyword prod, and path/to/a/directory is an absolute path to a directory within that product server.
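To make the parts of a cdx: URL concrete, the following sketch splits one into its components with Python 3's standard urllib.parse module. It is not the parser used by cdx.client.dircache: the names parse_cdx_url and CDXLocation are hypothetical, and the example URL simply combines a host and path that appear elsewhere in this document.

```python
from collections import namedtuple
from urllib.parse import urlsplit

CDXLocation = namedtuple(
    "CDXLocation", ["hostname", "port", "endpoint", "directory"]
)


def parse_cdx_url(url):
    """Split a cdx: URL into hostname, port, endpoint, and directory."""
    parts = urlsplit(url)
    if parts.scheme != "cdx" or not parts.hostname:
        raise ValueError("not a cdx: URL: %r" % (url,))
    # Path layout: /endpoint/prod/path/to/a/directory
    segments = parts.path.lstrip("/").split("/", 2)
    if len(segments) < 2 or not segments[0] or segments[1] != "prod":
        raise ValueError("expected /endpoint/prod/... in %r" % (url,))
    directory = "/" + (segments[2] if len(segments) > 2 else "")
    # parts.port is None when the URL gives no port number.
    return CDXLocation(parts.hostname, parts.port, segments[0], directory)


location = parse_cdx_url("cdx://mlscdx.jpl.nasa.gov:8080/cdx/prod/2008/325")
print(location)
```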
Such caching is intended to support the CCMValDiag software.
0.0.2 - Bug fix for cdxls
This release repairs a bug in cdxls that caused directories with only one item in them to not be listed properly.
0.0.1 - URL specification
This release provides support for a pair of command-line options (-u url, --url=url) that specify the URL to use, falling back to the URL given in the CDX_SERVER environment variable (and, if that’s unset, to http://mlscdx.jpl.nasa.gov:8080/cdx/prod). This supports two ideas suggested in CDX-16 (the first two, not the third with a cdx: style URL).
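The fallback order described in this release note can be sketched as a small function. This illustrates the precedence only and is not the code in cdx-client: resolve_server_url is a hypothetical name, and treating an empty CDX_SERVER value as unset is an assumption.

```python
import os

# Default taken from the release note above.
DEFAULT_CDX_SERVER = "http://mlscdx.jpl.nasa.gov:8080/cdx/prod"


def resolve_server_url(option_url=None, environ=None):
    """Pick the URL: command-line option, then CDX_SERVER, then default."""
    if environ is None:
        environ = os.environ
    if option_url:
        return option_url
    # An empty CDX_SERVER is treated the same as an unset one (assumption).
    return environ.get("CDX_SERVER") or DEFAULT_CDX_SERVER


if __name__ == "__main__":
    print(resolve_server_url())
```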
0.0.0 - Initial
This is an initial release of cdx-client supporting minimal cdxls and cdxget functionality.