A simple python library to apply NcML logic to NetCDF files
Project description
# pyncml
#### A simple python library to apply NcML logic to NetCDF files
## Installation
##### Stable
pip install pyncml
##### Development
pip install git+https://github.com/kwilcox/pyncml.git
## Supported
* Adding things
* Attributes: `<attribute name="some_new_attribute" type="string" value="some_standard_name" />`
* Renaming things
* Variables: `<variable name="new_var" orgName="old_var" />`
* Attributes: `<attribute name="new_attr" orgName="old_attr" />`
* Dimensions: `<dimension name="new_dim" orgName="old_dim" />`
* Removing things
* Variables: `<remove name="some_variable" type="variable" />`
* Attributes: `<remove name="some_variable" type="variable" />`
* Aggregating things
* Scans: `<scan location="some_directory/foo/bar/" suffix=".nc" subdirs="true" />`
## Not supported
* Adding variables (could be implemented in the future)
* Groups (could be implemented in the future)
* Setting actual data values on variables (could be implemented in the future)
* Creating a file from scratch (could be implemented in the future)
* Removing Dimensions (not implemented in the C library)
* Aggregation scans that utilize the `dateFormatMark` attribute (most likely will never be implemented)
## Usage
### Apply
The `apply` function takes in a path to the `input_file` NetCDF file, an `ncml` object (string, file path, or python etree object), and an optional `output_file`. **If an output_file is not specified, the `input_file` will be edited in place**. The object returned from the `apply` function is a netcdf4-python object, ready to be used.
Any `location` attributes in the NcML are **ignored** and the NcML is applied against the file specified as the `input_file`.
##### Editing a file in place
```python
netcdf = '/some/file/path/in.nc'
ncml = '/some/file/path/foo.ncml'
import pyncml
nc = pyncml.apply(input_file=netcdf, ncml=ncml)
```
##### Using an NcML file
```python
netcdf = '/some/file/path/in.nc'
out = '/some/file/path/out.nc'
ncml = '/some/file/path/foo.ncml'
import pyncml
nc = pyncml.apply(input_file=netcdf, ncml=ncml, output_file=out)
```
##### Using an NcML string
```python
netcdf = '/some/file/path/in.nc'
out = '/some/file/path/out.nc'
ncml = """<?xml version="1.0" encoding="UTF-8"?>
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
<attribute name="new_attribute" value="works" />
<attribute name="new_history" orgName="history" />
<attribute name="new_file_format" orgName="file_format" value="New Format" />
<remove name="source" type="attribute" />
</netcdf>
"""
import pyncml
nc = pyncml.apply(input_file=netcdf, ncml=ncml, output_file=out)
```
##### Using an `etree` object
```python
import pyncml
netcdf = '/some/file/path/in.nc'
out = '/some/file/path/out.nc'
ncml = pyncml.etree.fromstring("""<?xml version="1.0" encoding="UTF-8"?>
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
<attribute name="new_attribute" value="works" />
</netcdf>
""")
nc = pyncml.apply(input_file=netcdf, ncml=ncml, output_file=out)
```
### Scan
The `scan` function takes in a path to an `ncml` object (string, file path, or python etree object). The object returned from the `scan` function is a metadata object describing the scan aggregation **it is not a netcdf4-python object of the aggregation**. You can create a `netcdf4-python` object from the scan aggregation (example below).
##### Obtaining aggregation metadata
```python
ncml = '/some/file/path/foo.ncml'
import pyncml
agg = pyncml.scan(ncml=ncml)
print agg.starting
2014-06-20 00:00:00+00:00
print agg.ending
2014-07-19 23:00:00+00:00
print agg.timevar_name
u'time'
print agg.standard_names
[
u'time',
u'projection_y_coordinate',
u'projection_x_coordinate',
u'eastward_wind_velocity'
]
print agg.members # These are already sorted by the 'starting' date
[
{
'starting': datetime.datetime(2014, 6, 20, 0, 0, tzinfo=<UTC>),
'ending': datetime.datetime(2014, 6, 20, 0, 0, tzinfo=<UTC>),
'path': '/path/to/aggregation/defined/in/ncml/first_member.nc'
'standard_names': [u'time',
u'projection_y_coordinate',
u'projection_x_coordinate',
u'eastward_wind_velocity'],
},
{
'starting': datetime.datetime(2014, 6, 20, 1, 0, tzinfo=<UTC>),
'ending': datetime.datetime(2014, 6, 20, 1, 0, tzinfo=<UTC>),
'path': '/path/to/aggregation/defined/in/ncml/second_member.nc'
'standard_names': [u'time',
u'projection_y_coordinate',
u'projection_x_coordinate',
u'eastward_wind_velocity'],
},
...
]
```
##### Creating `netcdf4-python` Aggregation object
<sup>**Note: This will not work with aggregations whose members overlap in time!**</sup>
```python
ncml = '/some/file/path/foo.ncml'
import pyncml
agg = pyncml.scan(ncml=ncml)
files = [ f.path for f in agg.members ]
agg = netCDF4.MFDataset(files)
time = agg.variables.get(agg.timevar_name)
print time
<class 'netCDF4._Variable'>
float64 time('time',)
long_name: date time
units: hours since 1970-01-01 00:00:00
_CoordinateAxisType: Time
unlimited dimensions = ('time',)
current size = (14,)
print time[:]
[ 389784. 389785. 389786. 389787. 389788. 389789. 389790. 389791.
389792. 389793. 390500. 390501. 390502. 390503.]
print netCDF4.num2date(time[:], units=time.units)
[datetime.datetime(2014, 6, 20, 0, 0) datetime.datetime(2014, 6, 20, 1, 0)
datetime.datetime(2014, 6, 20, 2, 0) datetime.datetime(2014, 6, 20, 3, 0)
datetime.datetime(2014, 6, 20, 4, 0) datetime.datetime(2014, 6, 20, 5, 0)
datetime.datetime(2014, 6, 20, 6, 0) datetime.datetime(2014, 6, 20, 7, 0)
datetime.datetime(2014, 6, 20, 8, 0) datetime.datetime(2014, 6, 20, 9, 0)
datetime.datetime(2014, 7, 19, 20, 0)
datetime.datetime(2014, 7, 19, 21, 0)
datetime.datetime(2014, 7, 19, 22, 0)
datetime.datetime(2014, 7, 19, 23, 0)]
```
#### A simple python library to apply NcML logic to NetCDF files
## Installation
##### Stable
pip install pyncml
##### Development
pip install git+https://github.com/kwilcox/pyncml.git
## Supported
* Adding things
* Attributes: `<attribute name="some_new_attribute" type="string" value="some_standard_name" />`
* Renaming things
* Variables: `<variable name="new_var" orgName="old_var" />`
* Attributes: `<attribute name="new_attr" orgName="old_attr" />`
* Dimensions: `<dimension name="new_dim" orgName="old_dim" />`
* Removing things
* Variables: `<remove name="some_variable" type="variable" />`
* Attributes: `<remove name="some_variable" type="variable" />`
* Aggregating things
* Scans: `<scan location="some_directory/foo/bar/" suffix=".nc" subdirs="true" />`
## Not supported
* Adding variables (could be implemented in the future)
* Groups (could be implemented in the future)
* Setting actual data values on variables (could be implemented in the future)
* Creating a file from scratch (could be implemented in the future)
* Removing Dimensions (not implemented in the C library)
* Aggregation scans that utilize the `dateFormatMark` attribute (most likely will never be implemented)
## Usage
### Apply
The `apply` function takes in a path to the `input_file` NetCDF file, an `ncml` object (string, file path, or python etree object), and an optional `output_file`. **If an output_file is not specified, the `input_file` will be edited in place**. The object returned from the `apply` function is a netcdf4-python object, ready to be used.
Any `location` attributes in the NcML are **ignored** and the NcML is applied against the file specified as the `input_file`.
##### Editing a file in place
```python
netcdf = '/some/file/path/in.nc'
ncml = '/some/file/path/foo.ncml'
import pyncml
nc = pyncml.apply(input_file=netcdf, ncml=ncml)
```
##### Using an NcML file
```python
netcdf = '/some/file/path/in.nc'
out = '/some/file/path/out.nc'
ncml = '/some/file/path/foo.ncml'
import pyncml
nc = pyncml.apply(input_file=netcdf, ncml=ncml, output_file=out)
```
##### Using an NcML string
```python
netcdf = '/some/file/path/in.nc'
out = '/some/file/path/out.nc'
ncml = """<?xml version="1.0" encoding="UTF-8"?>
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
<attribute name="new_attribute" value="works" />
<attribute name="new_history" orgName="history" />
<attribute name="new_file_format" orgName="file_format" value="New Format" />
<remove name="source" type="attribute" />
</netcdf>
"""
import pyncml
nc = pyncml.apply(input_file=netcdf, ncml=ncml, output_file=out)
```
##### Using an `etree` object
```python
import pyncml
netcdf = '/some/file/path/in.nc'
out = '/some/file/path/out.nc'
ncml = pyncml.etree.fromstring("""<?xml version="1.0" encoding="UTF-8"?>
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
<attribute name="new_attribute" value="works" />
</netcdf>
""")
nc = pyncml.apply(input_file=netcdf, ncml=ncml, output_file=out)
```
### Scan
The `scan` function takes in a path to an `ncml` object (string, file path, or python etree object). The object returned from the `scan` function is a metadata object describing the scan aggregation **it is not a netcdf4-python object of the aggregation**. You can create a `netcdf4-python` object from the scan aggregation (example below).
##### Obtaining aggregation metadata
```python
ncml = '/some/file/path/foo.ncml'
import pyncml
agg = pyncml.scan(ncml=ncml)
print agg.starting
2014-06-20 00:00:00+00:00
print agg.ending
2014-07-19 23:00:00+00:00
print agg.timevar_name
u'time'
print agg.standard_names
[
u'time',
u'projection_y_coordinate',
u'projection_x_coordinate',
u'eastward_wind_velocity'
]
print agg.members # These are already sorted by the 'starting' date
[
{
'starting': datetime.datetime(2014, 6, 20, 0, 0, tzinfo=<UTC>),
'ending': datetime.datetime(2014, 6, 20, 0, 0, tzinfo=<UTC>),
'path': '/path/to/aggregation/defined/in/ncml/first_member.nc'
'standard_names': [u'time',
u'projection_y_coordinate',
u'projection_x_coordinate',
u'eastward_wind_velocity'],
},
{
'starting': datetime.datetime(2014, 6, 20, 1, 0, tzinfo=<UTC>),
'ending': datetime.datetime(2014, 6, 20, 1, 0, tzinfo=<UTC>),
'path': '/path/to/aggregation/defined/in/ncml/second_member.nc'
'standard_names': [u'time',
u'projection_y_coordinate',
u'projection_x_coordinate',
u'eastward_wind_velocity'],
},
...
]
```
##### Creating `netcdf4-python` Aggregation object
<sup>**Note: This will not work with aggregations whose members overlap in time!**</sup>
```python
ncml = '/some/file/path/foo.ncml'
import pyncml
agg = pyncml.scan(ncml=ncml)
files = [ f.path for f in agg.members ]
agg = netCDF4.MFDataset(files)
time = agg.variables.get(agg.timevar_name)
print time
<class 'netCDF4._Variable'>
float64 time('time',)
long_name: date time
units: hours since 1970-01-01 00:00:00
_CoordinateAxisType: Time
unlimited dimensions = ('time',)
current size = (14,)
print time[:]
[ 389784. 389785. 389786. 389787. 389788. 389789. 389790. 389791.
389792. 389793. 390500. 390501. 390502. 390503.]
print netCDF4.num2date(time[:], units=time.units)
[datetime.datetime(2014, 6, 20, 0, 0) datetime.datetime(2014, 6, 20, 1, 0)
datetime.datetime(2014, 6, 20, 2, 0) datetime.datetime(2014, 6, 20, 3, 0)
datetime.datetime(2014, 6, 20, 4, 0) datetime.datetime(2014, 6, 20, 5, 0)
datetime.datetime(2014, 6, 20, 6, 0) datetime.datetime(2014, 6, 20, 7, 0)
datetime.datetime(2014, 6, 20, 8, 0) datetime.datetime(2014, 6, 20, 9, 0)
datetime.datetime(2014, 7, 19, 20, 0)
datetime.datetime(2014, 7, 19, 21, 0)
datetime.datetime(2014, 7, 19, 22, 0)
datetime.datetime(2014, 7, 19, 23, 0)]
```
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
pyncml-0.0.4.tar.gz
(4.4 kB
view hashes)