# pyncml

#### A simple python library to apply NcML logic to NetCDF files


## Installation

##### Stable

    pip install pyncml

##### Development

    pip install git+https://github.com/kwilcox/pyncml.git

## Supported

* Adding things
    * Attributes: `<attribute name="some_new_attribute" type="string" value="some_standard_name" />`

* Renaming things
    * Variables: `<variable name="new_var" orgName="old_var" />`
    * Attributes: `<attribute name="new_attr" orgName="old_attr" />`
    * Dimensions: `<dimension name="new_dim" orgName="old_dim" />`

* Removing things
    * Variables: `<remove name="some_variable" type="variable" />`
    * Attributes: `<remove name="some_attribute" type="attribute" />`

* Aggregating things
    * Scans: `<scan location="some_directory/foo/bar/" suffix=".nc" subdirs="true" />`
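
The supported elements can be combined in a single NcML document. The sketch below assembles one from the snippets in the list and checks that it is well-formed XML using only the standard library. The variable, attribute, and dimension names are the placeholders from the list, not names from a real dataset, and nothing here calls pyncml itself.

```python
import xml.etree.ElementTree as ET

NCML_NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"

ncml = """<?xml version="1.0" encoding="UTF-8"?>
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <attribute name="some_new_attribute" type="string" value="some_standard_name" />
  <variable name="new_var" orgName="old_var" />
  <dimension name="new_dim" orgName="old_dim" />
  <remove name="some_variable" type="variable" />
</netcdf>
"""

# Parse as bytes so the encoding in the XML declaration is honoured.
root = ET.fromstring(ncml.encode("utf-8"))

# Collect the tag names (namespace stripped) in document order.
tags = [child.tag.replace("{%s}" % NCML_NS, "") for child in root]
print(tags)
```

A string like `ncml` above is one of the input forms that `pyncml.apply` accepts, as described under Usage.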

## Not supported

* Adding variables (could be implemented in the future)
* Groups (could be implemented in the future)
* Setting actual data values on variables (could be implemented in the future)
* Creating a file from scratch (could be implemented in the future)
* Removing Dimensions (not implemented in the C library)
* Aggregation scans that utilize the `dateFormatMark` attribute (most likely will never be implemented)
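
Before handing an NcML document to pyncml, it may help to check it for constructs from the list above. The following is a minimal sketch using only the standard library; `unsupported_features` is a hypothetical helper, not part of pyncml, and it only looks for `<group>` elements, `<values>` elements, `<remove type="dimension">`, and `dateFormatMark` attributes on `<scan>`. It does not try to detect newly added variables.

```python
import xml.etree.ElementTree as ET

NCML_NS = "{http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2}"


def unsupported_features(ncml_text):
    """Return a sorted list of unsupported constructs found in an NcML string.

    Hypothetical helper for illustration; not part of pyncml.
    """
    root = ET.fromstring(ncml_text.encode("utf-8"))
    found = set()
    for element in root.iter():
        tag = element.tag.replace(NCML_NS, "")
        if tag == "group":
            found.add("group")
        elif tag == "values":
            found.add("values")
        elif tag == "remove" and element.get("type") == "dimension":
            found.add("remove dimension")
        elif tag == "scan" and element.get("dateFormatMark") is not None:
            found.add("dateFormatMark")
    return sorted(found)


# Placeholder document: the names and the dateFormatMark value are made up.
example = """<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <group name="some_group" />
  <aggregation dimName="time" type="joinExisting">
    <scan location="some_directory/foo/bar/" suffix=".nc" dateFormatMark="foo_#yyyyMMdd" />
  </aggregation>
</netcdf>"""

print(unsupported_features(example))
```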

## Usage

### Apply

The `apply` function takes a path to the `input_file` NetCDF file, an `ncml` object (a string, a file path, or a Python `etree` object), and an optional `output_file`. **If no `output_file` is specified, the `input_file` is edited in place.** The function returns a netcdf4-python object, ready to be used.

Any `location` attributes in the NcML are **ignored** and the NcML is applied against the file specified as the `input_file`.
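
To illustrate what "string, file path, or etree object" means for the `ncml` argument, here is a rough sketch of how the three forms could be normalized to a single parsed root element. `load_ncml` is a hypothetical helper written against the standard library's `xml.etree.ElementTree`; it is an assumption for illustration and not a description of how pyncml does this internally.

```python
import os
import xml.etree.ElementTree as ET


def load_ncml(ncml):
    """Return the root element for an NcML string, file path, or parsed element.

    Hypothetical helper for illustration; not part of pyncml.
    """
    if isinstance(ncml, ET.Element):
        # Already parsed: use it as-is.
        return ncml
    if os.path.isfile(ncml):
        # An existing file on disk: parse it.
        return ET.parse(ncml).getroot()
    # Otherwise treat the value as the NcML text itself.
    return ET.fromstring(ncml.encode("utf-8"))


text = (
    '<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">'
    '<attribute name="new_attribute" value="works" />'
    "</netcdf>"
)
root = load_ncml(text)
print(root[0].get("name"))
```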

##### Editing a file in place
```python
import pyncml

netcdf = '/some/file/path/in.nc'
ncml = '/some/file/path/foo.ncml'

# No output_file is given, so in.nc itself is modified
nc = pyncml.apply(input_file=netcdf, ncml=ncml)
```

##### Using an NcML file
```python
import pyncml

netcdf = '/some/file/path/in.nc'
out = '/some/file/path/out.nc'
ncml = '/some/file/path/foo.ncml'

# The result is written to out.nc
nc = pyncml.apply(input_file=netcdf, ncml=ncml, output_file=out)
```

##### Using an NcML string
```python
import pyncml

netcdf = '/some/file/path/in.nc'
out = '/some/file/path/out.nc'
ncml = """<?xml version="1.0" encoding="UTF-8"?>
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
<attribute name="new_attribute" value="works" />
<attribute name="new_history" orgName="history" />
<attribute name="new_file_format" orgName="file_format" value="New Format" />
<remove name="source" type="attribute" />
</netcdf>
"""
nc = pyncml.apply(input_file=netcdf, ncml=ncml, output_file=out)
```

##### Using an `etree` object
```python
import pyncml
netcdf = '/some/file/path/in.nc'
out = '/some/file/path/out.nc'
ncml = pyncml.etree.fromstring("""<?xml version="1.0" encoding="UTF-8"?>
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
<attribute name="new_attribute" value="works" />
</netcdf>
""")
nc = pyncml.apply(input_file=netcdf, ncml=ncml, output_file=out)
```

### Scan

The `scan` function takes an `ncml` object (a string, a file path, or a Python `etree` object). It returns a metadata object describing the scan aggregation; **it is not a netcdf4-python object of the aggregation**. You can create a `netcdf4-python` object from the scan aggregation (example below).


##### Obtaining aggregation metadata
```python
import pyncml

ncml = '/some/file/path/foo.ncml'
agg = pyncml.scan(ncml=ncml)

print(agg.starting)
# 2014-06-20 00:00:00+00:00

print(agg.ending)
# 2014-07-19 23:00:00+00:00

print(agg.timevar_name)
# u'time'

print(agg.standard_names)
# [
#   u'time',
#   u'projection_y_coordinate',
#   u'projection_x_coordinate',
#   u'eastward_wind_velocity'
# ]

print(agg.members)  # These are already sorted by the 'starting' date
# [
#   {
#     'starting': datetime.datetime(2014, 6, 20, 0, 0, tzinfo=<UTC>),
#     'ending': datetime.datetime(2014, 6, 20, 0, 0, tzinfo=<UTC>),
#     'path': '/path/to/aggregation/defined/in/ncml/first_member.nc',
#     'standard_names': [u'time',
#                        u'projection_y_coordinate',
#                        u'projection_x_coordinate',
#                        u'eastward_wind_velocity'],
#   },
#   {
#     'starting': datetime.datetime(2014, 6, 20, 1, 0, tzinfo=<UTC>),
#     'ending': datetime.datetime(2014, 6, 20, 1, 0, tzinfo=<UTC>),
#     'path': '/path/to/aggregation/defined/in/ncml/second_member.nc',
#     'standard_names': [u'time',
#                        u'projection_y_coordinate',
#                        u'projection_x_coordinate',
#                        u'eastward_wind_velocity'],
#   },
#   ...
# ]
```
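
Because each member carries its own `starting` and `ending` times, the metadata can be used to pick out only the files that fall inside a time window before opening anything. The sketch below assumes members expose `starting`, `ending`, and `path` as attributes (the way `f.path` is used elsewhere in this README); `members_between` is a hypothetical helper, and the members are stand-in objects built with `SimpleNamespace` rather than the result of a real scan.

```python
from datetime import datetime, timezone
from types import SimpleNamespace


def members_between(members, start, end):
    """Return the members whose [starting, ending] range intersects [start, end]."""
    return [m for m in members if m.starting <= end and m.ending >= start]


# Stand-in members mirroring the example output above (not a real scan result).
members = [
    SimpleNamespace(
        path="/path/to/aggregation/defined/in/ncml/first_member.nc",
        starting=datetime(2014, 6, 20, 0, tzinfo=timezone.utc),
        ending=datetime(2014, 6, 20, 0, tzinfo=timezone.utc),
    ),
    SimpleNamespace(
        path="/path/to/aggregation/defined/in/ncml/second_member.nc",
        starting=datetime(2014, 6, 20, 1, tzinfo=timezone.utc),
        ending=datetime(2014, 6, 20, 1, tzinfo=timezone.utc),
    ),
]

window_start = datetime(2014, 6, 20, 1, tzinfo=timezone.utc)
window_end = datetime(2014, 6, 20, 2, tzinfo=timezone.utc)

selected = members_between(members, window_start, window_end)
print([m.path for m in selected])
```

With a real scan, `agg.members` would take the place of the stand-in list.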


##### Creating a `netcdf4-python` aggregation object

<sup>**Note: This will not work with aggregations whose members overlap in time!**</sup>

```python
import netCDF4
import pyncml

ncml = '/some/file/path/foo.ncml'
agg = pyncml.scan(ncml=ncml)
files = [f.path for f in agg.members]

# Use a separate name for the dataset so `agg.timevar_name` is still available
mfds = netCDF4.MFDataset(files)
time = mfds.variables.get(agg.timevar_name)

print(time)
# <class 'netCDF4._Variable'>
# float64 time('time',)
# long_name: date time
# units: hours since 1970-01-01 00:00:00
# _CoordinateAxisType: Time
# unlimited dimensions = ('time',)
# current size = (14,)

print(time[:])
# [ 389784. 389785. 389786. 389787. 389788. 389789. 389790. 389791.
# 389792. 389793. 390500. 390501. 390502. 390503.]

print(netCDF4.num2date(time[:], units=time.units))
# [datetime.datetime(2014, 6, 20, 0, 0) datetime.datetime(2014, 6, 20, 1, 0)
# datetime.datetime(2014, 6, 20, 2, 0) datetime.datetime(2014, 6, 20, 3, 0)
# datetime.datetime(2014, 6, 20, 4, 0) datetime.datetime(2014, 6, 20, 5, 0)
# datetime.datetime(2014, 6, 20, 6, 0) datetime.datetime(2014, 6, 20, 7, 0)
# datetime.datetime(2014, 6, 20, 8, 0) datetime.datetime(2014, 6, 20, 9, 0)
# datetime.datetime(2014, 7, 19, 20, 0)
# datetime.datetime(2014, 7, 19, 21, 0)
# datetime.datetime(2014, 7, 19, 22, 0)
# datetime.datetime(2014, 7, 19, 23, 0)]
```
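
Since this approach does not work when members overlap in time, it can be worth checking the scan metadata for overlaps first. The following is a minimal sketch, assuming members expose `starting`, `ending`, and `path` attributes; `find_overlaps` is a hypothetical helper, not part of pyncml, and the members are stand-in objects with made-up paths. It treats a member that starts exactly when the previous one ends as overlapping, since that time step would appear twice.

```python
from datetime import datetime
from types import SimpleNamespace


def find_overlaps(members):
    """Return (earlier_path, later_path) pairs for neighbours that overlap in time.

    Members are sorted by start time and each one is compared with its
    predecessor; an empty list means no overlaps were found.
    """
    ordered = sorted(members, key=lambda m: m.starting)
    overlaps = []
    for previous, current in zip(ordered, ordered[1:]):
        if current.starting <= previous.ending:
            overlaps.append((previous.path, current.path))
    return overlaps


# Stand-in members with hypothetical paths (not a real scan result).
members = [
    SimpleNamespace(path="a.nc", starting=datetime(2014, 6, 20, 0), ending=datetime(2014, 6, 20, 2)),
    SimpleNamespace(path="b.nc", starting=datetime(2014, 6, 20, 1), ending=datetime(2014, 6, 20, 3)),
    SimpleNamespace(path="c.nc", starting=datetime(2014, 6, 20, 4), ending=datetime(2014, 6, 20, 5)),
]

print(find_overlaps(members))
```

With a real scan, passing `agg.members` (the scan metadata, before it is replaced by any dataset object) and confirming the result is empty is a cheap check to run before calling `netCDF4.MFDataset`.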
