CDX Client

CDX Client provides an API library and command-line tools for accessing CDX data. CDX is the Climate Data Exchange, an effort of the Jet Propulsion Laboratory to create a virtual environment for the sharing of climate data.

Installation

This document tells you how to install cdx.client.

Quick Instructions

As a user with administrative privileges, run:

easy_install cdx.client

That’s it.

Full Instructions

cdx.client requires the Python programming language. We recommend version 2.4 or later. As of this writing, 2.6 is the latest stable version. If Python is not yet installed on your system, you can find binary and source distributions on the Python website.

To test if a correct version of Python is available on your system, run:

python -V

You should see output similar to:

Python 2.6

indicating the version of Python installed. cdx.client also requires Agile OODT. OODT is Object Oriented Data Technology, a framework for metadata and data grids. Agile OODT is a Python version of OODT that supports higher performance and easier integration than the Java version.

By far the easiest, recommended, and encouraged way to install cdx.client is to use EasyInstall. If your Python installation has EasyInstall available to it, then this one command is all you need to run in order to download, build, install, and generate command-line tools all in one go for all users on your system:

easy_install cdx.client

Be sure to run that command as an administrative user. For example, on Mac OS X and other Unix systems, you might need to run:

sudo easy_install cdx.client

That will also download and install all dependencies, including Agile OODT.

Executables

The commands cdxls and cdxget will be generated and placed in your standard installation directory for Python commands. Usually, this is the same location as the python executable itself. For example, on Mac OS X 10.5, the directory is:

/Library/Frameworks/Python.framework/Versions/Current/bin

You may want to add that directory to your shell’s PATH variable, as well as forcing your shell to re-scan the PATH variable for new executables.

Installing EasyInstall

If you happen to be on a system where your Python installation lacks EasyInstall, fret not. Upgrading your system to gain EasyInstall’s abilities is quite simple. Follow these instructions:

  1. Download http://peak.telecommunity.com/dist/ez_setup.py

  2. As an administrative user, run the freshly-downloaded ez_setup.py file using your system’s Python.

EasyInstall and its necessary libraries will be downloaded, built, and installed for you, and the easy_install executable generated. The location of the easy_install executable is as described above.

Installing Without EasyInstall

If EasyInstall is not available on your system, you can still make a proper installation of cdx.client. Follow these instructions:

  1. Download the Agile OODT source distribution from http://oodt.jpl.nasa.gov/dist/agile-oodt/oodt-0.0.1.tar.gz. Substitute version numbers as appropriate.

  2. Download the cdx.client source distribution from http://cdx.jpl.nasa.gov/software/dist/cdx.client-0.0.3.tar.gz. Substitute version numbers as appropriate.

  3. Unpack each archive.

  4. Change the current working directory to each newly-created subdirectory, oodt-0.0.1 and cdx.client-0.0.3, again substituting version numbers as appropriate.

  5. As an administrative user, run: python setup.py install in each subdirectory.

Issues and Questions

To report any problems with or ask for help about cdx.client, visit our contact web page.

Using CDX Client

Installing the CDX Client package makes available three things on your computer:

cdxls command

The cdxls command lets you list the contents of a CDX server from your terminal prompt or a shell script.

cdxget command

The cdxget command lets you retrieve data from CDX from either your terminal prompt or a shell script.

CDX Library

The CDX Library is a Python-based API for using CDX servers.

This document describes how to use the above three items, with special attention to the CDX Library.

Commands

After installing the CDX Client package, two new commands are made available on your system, cdxls and cdxget. These commands enable you to list the contents of the data on a CDX server and retrieve selected files from the server.

To use these commands from your interactive prompt, you just need to make sure your shell’s PATH environment variable includes the directory where the commands are installed. On most systems, these two commands are installed in:

/usr/local/bin

However, on Mac OS X, the installation location may be:

/Library/Frameworks/Python.framework/Versions/Current/bin

And on Windows, it may be:

c:\Program Files\Python

Note also that some interactive shells create a cache of commands in order to execute your requests more quickly. You may need to force your shell to re-build that cache. The csh and tcsh shells are two such examples; you can make these shells rebuild their caches by running the rehash command.
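
If you are unsure whether the commands are reachable, a short check can tell you. The following is a minimal sketch, not part of CDX Client: it uses the standard library's shutil.which (so it needs a current Python 3 interpreter rather than the Python 2 this package targets), and the helper name find_commands is our own invention.

```python
import os
import shutil


def find_commands(names, path=None):
    """Map each command name to its full path, or None if it is not found.

    When path is None, shutil.which searches the PATH environment variable.
    """
    return {name: shutil.which(name, path=path) for name in names}


# Report where (or whether) the two CDX commands were found.
for name, location in find_commands(["cdxls", "cdxget"]).items():
    if location is None:
        print("%s: not found on PATH (%s)" % (name, os.environ.get("PATH", "")))
    else:
        print("%s: %s" % (name, location))
```

If a command is reported as not found, add its installation directory to PATH as described above and try again.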

Use from Shell Scripts

The cdxls and cdxget commands may be used from shell scripts as well. The only requirement for making these commands available to shell scripts is the same as for interactive sessions: the shell’s PATH environment variable must include the directory that contains the cdxls and cdxget commands.

Here is a sample shell script that retrieves the MLS Aura L2GP data files (and their metadata files) for HO2 and HOCl from day 325 in 2008:

#!/bin/sh
PATH=/usr/local/bin:/usr/bin:/bin; export PATH
CDX_SERVER=http://mlscdx.jpl.nasa.gov:8080/cdx/prod; export CDX_SERVER

for kind in HO2 HOCl; do
    for extension in he5 he5.met; do
        cdxget 2008/325/MLS-Aura_L2GP-${kind}_v02-23-c01_2008d325.${extension}
    done
done

The above shell script assumes that cdxget will be found in /usr/local/bin, /usr/bin, or /bin. It also sets the CDX_SERVER environment variable to select which CDX server to talk to. It then loops through two kinds of data (HO2 and HOCl) and, for each, two file extensions (he5 and he5.met). The result is that it retrieves four files to the current working directory, specifically:

  • 2008/325/MLS-Aura_L2GP-HO2_v02-23-c01_2008d325.he5

  • 2008/325/MLS-Aura_L2GP-HO2_v02-23-c01_2008d325.he5.met

  • 2008/325/MLS-Aura_L2GP-HOCl_v02-23-c01_2008d325.he5

  • 2008/325/MLS-Aura_L2GP-HOCl_v02-23-c01_2008d325.he5.met
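
If you prefer Python to shell, the same four paths can be generated with a nested loop. This is only a sketch: the function name mls_paths is hypothetical, and the file name pattern is copied from the script above rather than from any CDX naming specification. It builds the paths but does not contact a server.

```python
def mls_paths(year, day, kinds, extensions):
    """Build the paths that the shell loop above passes to cdxget."""
    paths = []
    for kind in kinds:
        for extension in extensions:
            paths.append(
                "%d/%03d/MLS-Aura_L2GP-%s_v02-23-c01_%dd%03d.%s"
                % (year, day, kind, year, day, extension)
            )
    return paths


# Same inputs as the shell script: two kinds, two extensions, day 325 of 2008.
for path in mls_paths(2008, 325, ["HO2", "HOCl"], ["he5", "he5.met"]):
    print(path)
```

Each generated path could then be handed to cdxget, for example through the standard subprocess module.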

The cdxsubset command may also be used from a shell script. It is configured by two environment variables:

  • CDX_SUBSET_MODE - if set then local data wrapper mode will be used (remote is assumed as default)

  • CDX_SERVER - set to the product server to talk to for subsetting

Some example working commands are:

Subset spatial bounding box from NCAR CCSM model output:

cdxsubset -b /esg/data18/commit/atm/da/hfls/ncar_ccsm3_0/run1/hfls_A2.Commit_1.CCSM.atmd.2000-01-01_cat_2039-12-31.nc

Subset time range from NCAR CCSM model output:

cdxsubset -t /esg/data18/commit/atm/da/hfls/ncar_ccsm3_0/run1/hfls_A2.Commit_1.CCSM.atmd.2000-01-01_cat_2039-12-31.nc

Get time array variable data from the MLS L2 granule:

cdxsubset -p Time /mls/2005/100/MLS-Aura_L2GP-BrO_v01-51-c01_2005d100.he5

Get spatial bounding box from AIRS level 2 granule:

cdxsubset -b /airs/data/s4pa/Aqua_AIRS_Level2/AIRX2RET.003/2007/005/AIRS.2007.01.05.240.L2.RetStd.v4.0.9.0.G07007180718.hdf

Subset by lat lon and variable for an AIRS level 2 granule:

cdxsubset -p TAirStd --latitude-range=67.35:78.40 --longitude-range=172.226:176.10 /airs/2009/01/01/airx2ret/AIRS.2009.01.01.001.L2.RetStd.v5.2.2.0.G09002135510.hdf
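
When driving cdxsubset from a Python script, it can help to assemble the argument list in one place. The sketch below uses only the options that appear in the examples above (-p, --latitude-range, and the longitude option, which we assume is spelled --longitude-range to mirror the latitude one). The helper subset_command is hypothetical and is not part of CDX Client.

```python
import shlex


def subset_command(path, variable=None, latitude=None, longitude=None):
    """Build an argument list for cdxsubset.

    latitude and longitude are (low, high) pairs of strings, kept as text
    so the digits reach the command exactly as written.
    """
    argv = ["cdxsubset"]
    if variable is not None:
        argv += ["-p", variable]
    if latitude is not None:
        argv.append("--latitude-range={}:{}".format(*latitude))
    if longitude is not None:
        # Assumed spelling, mirroring --latitude-range.
        argv.append("--longitude-range={}:{}".format(*longitude))
    argv.append(path)
    return argv


argv = subset_command(
    "/airs/2009/01/01/airx2ret/AIRS.2009.01.01.001.L2.RetStd.v5.2.2.0.G09002135510.hdf",
    variable="TAirStd",
    latitude=("67.35", "78.40"),
    longitude=("172.226", "176.10"),
)
print(shlex.join(argv))  # show the command line without running it
```

To actually run the command, pass argv to subprocess.call with CDX_SERVER set in the environment.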

CDX Library

The CDX Library is a Python-based application programming interface (API) for communicating with CDX servers. In fact, the two commands cdxls and cdxget are implemented using the CDX Library. If shell-script programming is not to your taste, and you know Python, then using the CDX Library may be right for you.

The CDX Library uses an object-oriented approach to model the contents of a CDX server. Objects represent CDX files and directories, and you call methods on those objects to determine file attributes, directory contents, or retrieve a file’s contents.

The remainder of this document describes the modules, classes, and functions that comprise the CDX Library. If you don’t know Python, you may wish to skip the rest.

The cdx Module

The cdx module is a namespace module. It provides no classes or functions. Rather, it contains a single, nested module called client.

The cdx.client Module

The cdx.client module contains nested modules that provide the CDX Library. It also contains implementations of the cdxls and cdxget commands.

The cdx.client.cdxfile Module

The cdx.client.cdxfile module is where all the action is. It contains classes and functions for communicating with and modeling the contents of CDX servers. It contains the following items:

CDXDirectory

Objects of this class represent directories on a CDX server. You can use Python’s iterator, length, and containment protocols to examine the contents of the directory. They can also be sorted.

CDXFile

Objects of this class represent files on a CDX server. While you can instantiate objects of this class, you’d typically instantiate a CDXDirectory and examine its contents which will include CDXFile objects for files in the directory and nested CDXDirectory objects for subdirectories. A CDXFile object also provides a method to let you retrieve its data.

findFile

The findFile function is a utility function that, given a starting CDXDirectory and a path name, yields the matching CDXDirectory or CDXFile on a CDX server.

CDXDirectory Objects

CDXDirectory objects represent directories in a CDX server. You can create these objects directly or you can use the findFile function in the cdx.client.cdxfile module.

class CDXDirectory(path, cdxURL = None)

Create a CDXDirectory object with the given path. You can also specify the URL to a CDX server to use by passing in a string for cdxURL.

sort(cmp = cmp, reverse = False)

Return the contents of the directory, sorted using the comparison function cmp, which defaults to Python’s built-in cmp. If reverse is True, reverse the order of the sort. Comparison with cmp on CDXFile and CDXDirectory objects is by CDX server URL and by name. You can pass in your own cmp that, for example, sorts by file size.

isFile()

Always returns False.

path

The path name of the directory.

name

The name of the directory; this is the last element of the path.

size

By convention sizes for directories are always zero.
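
The sort method described above accepts an old-style comparison function. As an illustration of sorting by size, here is a minimal sketch that does not need a CDX server: Entry is a hypothetical stand-in for CDXFile and CDXDirectory objects, and we assume files expose a size attribute as directories do. Python 3 no longer has a cmp argument, so the sketch wraps the function with functools.cmp_to_key.

```python
from collections import namedtuple
from functools import cmp_to_key

# Hypothetical stand-in for the objects a CDXDirectory contains.
Entry = namedtuple("Entry", ["name", "size"])


def by_size(a, b):
    """Old-style comparison: negative, zero, or positive, like cmp()."""
    return (a.size > b.size) - (a.size < b.size)


entries = [Entry("b.he5", 300), Entry("a.he5", 100), Entry("c.he5", 200)]

smallest_first = sorted(entries, key=cmp_to_key(by_size))
largest_first = sorted(entries, key=cmp_to_key(by_size), reverse=True)

print([entry.name for entry in smallest_first])
print([entry.name for entry in largest_first])
```

With the real library under Python 2, the signature above suggests you would pass the same function directly, as in sort(cmp = by_size).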

CDXDirectory objects obey Python’s protocols for hashing, comparison, containment testing, iteration, indexing, and length query. Containment testing on directories works with CDXDirectory objects, CDXFile objects, or plain strings:

>>> from cdx.client.cdxfile import CDXDirectory
>>> root = CDXDirectory('/', 'http://localhost:8192/cdx/prod')
>>> len(root)
3
>>> subdir = root['2005']
>>> subdir
CDXDirectory(path=/2005)
>>> subdir in root
True
>>> '2005' in root
True
>>> subdir < root
False
>>> subdir > root
True
>>> for i in root:
...     print i
...
/2008
/2007
/2005
>>> root.sort()
[CDXDirectory(path=/2005), CDXDirectory(path=/2007), CDXDirectory(path=/2008)]

CDXFile Objects

TBD.

Changelog

1.3.1 - 10/19/11

This release includes an updated version of the datawrappers package (0.0.8) as specified in CDX-122.

For the issue tracker, see https://oodt.jpl.nasa.gov/jira/browse/CDX

1.3.0 - 10/17/11

This release fixes an important issue in the CDX regrid service and computes an average of the running sum of data points indexed by cube cell. See CDX-118 for more information.

For the issue tracker, see https://oodt.jpl.nasa.gov/jira/browse/CDX

1.2.0 - 06/27/11

This release provides tight integration with the ESG, and plugs into its security infrastructure and adds a bunch of virtual roots for use in cdxregrid and cdxsubset for ESG data. See CDX-110 and CDX-111 for more information.

For the issue tracker, see http://oodt.jpl.nasa.gov/jira/browse/CDX.

1.1.0 - 11/13/2010

This release incorporates version 0.0.6 of the cdx.datawrappers package which includes CDX-103, which implements GetVariable by lat, lon, and time. In turn this release also provides CDX-102, which incorporates this functionality into the basic cdxregrid functionality. At this point, cdxregrid is pretty much fully functional.

For the issue tracker, see http://oodt.jpl.nasa.gov/jira/browse/CDX.

1.0.0 - 09/10/10

This release made a minor improvement to the public cdxls API, exposing the set of found and lost files from the listFiles function, taking away its function-local orchestration and exposing the lists of found and notfound files to the user. See CDX-93 for more information. This release additionally exposes the CDX MODIS product server via cdxls. See CDX-98 for more details. Finally, this release includes updates to fix CloudSat as a cdxsubset source, as described in CDX-99.

For the issue tracker, see http://oodt.jpl.nasa.gov/jira/browse/CDX.

0.0.9 - 03/24/2010

This release includes improvements to cdxsubset, specifically the ability to print out the full numpy array returned from a DataWrapper. See CDX-82 for specific details. Additionally cdxsubset has been updated to expose the subset by LatLon functionality per CDX-84 and CDX-85. Subset by range query allowing constraints to be specified was also included in this release (see CDX-86 for more information).

For the issue tracker, see http://oodt.jpl.nasa.gov/jira/browse/CDX.

0.0.8 - Inclusion of improvements to cdxcd, virtual roots and new tools

This release includes improvements to cdxcd to make it work nicely with cdx virtual roots, and includes integration with the other cdx client toolkit including cdxls, cdxsubset and cdxget. See CDX-70 and CDX-71 for further details.

For the issue tracker, see http://oodt.jpl.nasa.gov/jira/browse/CDX.

0.0.7 - Add Resource Files

Release 0.0.6 was mis-configured and didn’t include some important resource files. This emergency release includes them!

0.0.6 - Inclusion of cdxsubset and other tools, and some minor bug fixes

This release includes the cdxsubset tool, as described in CDX-56. This release also includes the cdxcd tool, as described in CDX-69. This release also includes minor aesthetic bug fixes that address pathing issues in cdxls, e.g., CDX-29.

For the issue tracker, see http://oodt.jpl.nasa.gov/jira/browse/CDX.

0.0.5 - Repaired Unit Tests

This release updates the unit tests and test data based on the changes in 0.0.4 and the new behavior of actual product servers. In addition, it fixes some documentation problems (incorrect package name cdx-client which should’ve been cdx.client) in the INSTALL.txt file.

The sole bug report addressed in this release is CDX-45, “Unit tests in cdx-client failing”. For the issue tracker, see http://oodt.jpl.nasa.gov/jira/browse/CDX.

0.0.4 - Bugfix to 0.0.3 release

This is a bugfix release to 0.0.3, which includes some error checking to deal with some data format inconsistencies on the OODT OFSN product server end.

JIRA issues addressed (see http://oodt.jpl.nasa.gov/jira/browse/CDX):

  • CDX-43 Directory structure shouldn’t be preserved if cdxget is called without the -r parameter

  • CDX-42 cdxget -r fails to retrieve MLS data

  • CDX-41 cdxls -R chokes if dir size not provided

0.0.3 - Directory caching

The major feature of this release is the cdx.client.dircache module which enables local-disk caching of a subset of a remote CDX product server’s contents. It also introduces the concept of a cdx: scheme URL. Such a URL has this form:

cdx://hostname[:port]/endpoint/prod/path/to/a/directory

where hostname is the name or IP address of a CDX product server, port is an optional port number on which the server is listening, endpoint is the WebGrid service identifier (typically just the string cdx), prod is the fixed keyword prod, and path/to/a/directory is an absolute path to a directory within that product server.

Such caching is intended to support the CCMValDiag software.
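
To make the cdx: URL form described above concrete, here is a minimal sketch that splits such a URL into its parts using Python 3's standard urllib.parse module. The function parse_cdx_url is hypothetical and is not the parser used by cdx.client.dircache. It also assumes the endpoint is a single path segment, which matches the typical value cdx.

```python
from urllib.parse import urlsplit


def parse_cdx_url(url):
    """Split cdx://hostname[:port]/endpoint/prod/path into a dictionary."""
    parts = urlsplit(url)
    if parts.scheme != "cdx":
        raise ValueError("not a cdx: URL: %r" % (url,))
    # The path starts with "/", so the first segment is always empty.
    segments = parts.path.split("/")
    if len(segments) < 3 or not segments[1] or segments[2] != "prod":
        raise ValueError("expected /endpoint/prod/... in %r" % (url,))
    return {
        "hostname": parts.hostname,
        "port": parts.port,  # None when the URL gives no port
        "endpoint": segments[1],
        "directory": "/" + "/".join(segments[3:]),
    }


print(parse_cdx_url("cdx://mlscdx.jpl.nasa.gov:8080/cdx/prod/2008/325"))
```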

0.0.2 - Bug fix for cdxls

This release repairs a bug in cdxls that caused directories with only one item in them to not be listed properly.

0.0.1 - URL specification

This release provides support for a (-u url, --url=url) pair of command-line options that enable specification of a specific URL to use, falling back to the URL specified in the CDX_SERVER environment variable (and, if that’s unset, then http://mlscdx.jpl.nasa.gov:8080/cdx/prod). This supports two ideas suggested in CDX-16 (the first two, not the third with a cdx: style URL).
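
The fallback order described above (command-line option, then CDX_SERVER, then the built-in default) can be sketched as follows. This is an illustration using the standard argparse module, not the actual option handling in cdxls or cdxget. The function name resolve_server_url is hypothetical, and the sketch treats an empty CDX_SERVER the same as an unset one.

```python
import argparse
import os

DEFAULT_URL = "http://mlscdx.jpl.nasa.gov:8080/cdx/prod"


def resolve_server_url(argv, environ=None):
    """Pick the server URL: -u/--url first, then CDX_SERVER, then the default."""
    if environ is None:
        environ = os.environ
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("-u", "--url", default=None)
    # Ignore any other arguments; only the URL option matters here.
    options, _others = parser.parse_known_args(argv)
    return options.url or environ.get("CDX_SERVER") or DEFAULT_URL


print(resolve_server_url([], {}))
print(resolve_server_url([], {"CDX_SERVER": "http://localhost:8192/cdx/prod"}))
print(resolve_server_url(["--url=http://localhost:9000/cdx/prod"], {}))
```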

0.0.0 - Initial

This is an initial release of cdx-client supporting minimal cdxls and cdxget function.
