NumPy arrays with named axes and named indices.
Project description
Scientists, engineers, mathematicians and statisticians don’t just work with matrices; they often work with structured data, just like you’d find in a table. However, functionality for this is missing from NumPy, and there are several efforts to fill the void. This is one of those efforts.
Datarray provides a subclass of NumPy ndarrays that supports:
individual dimensions (axes) being labeled with meaningful descriptions
labeled ‘ticks’ along each axis
indexing and slicing by named axis
indexing on any axis with the tick labels instead of only integers
reduction operations (like .sum, .mean, etc.) that accept named axis arguments instead of only integer indices
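The features listed above can be illustrated with a minimal sketch in plain NumPy. This is not datarray's API: `LabeledArray`, `select`, and the sample data are hypothetical names invented for illustration, and the sketch wraps an ndarray rather than subclassing it as datarray does. It only shows what named axes, tick labels, and named-axis reductions mean in practice.

```python
import numpy as np


class LabeledArray:
    """Hypothetical wrapper (not datarray's API): an ndarray plus axis names and tick labels."""

    def __init__(self, data, axis_names, ticks=None):
        self.data = np.asarray(data)  # the base NumPy array stays accessible
        if len(axis_names) != self.data.ndim:
            raise ValueError("need exactly one name per axis")
        self.axis_names = tuple(axis_names)
        # ticks maps an axis name to the list of labels along that axis
        self.ticks = dict(ticks or {})

    def axis_index(self, name):
        """Translate an axis name into its integer position."""
        return self.axis_names.index(name)

    def select(self, axis, tick):
        """Index one axis by tick label; returns an ndarray with that axis dropped."""
        position = self.ticks[axis].index(tick)
        return np.take(self.data, position, axis=self.axis_index(axis))

    def sum(self, axis):
        """Reduce over an axis given by name instead of by integer."""
        return self.data.sum(axis=self.axis_index(axis))


temps = LabeledArray(
    [[10.0, 12.0, 11.0],
     [20.0, 22.0, 21.0]],
    axis_names=("city", "day"),
    ticks={"city": ["Oslo", "Rome"], "day": ["mon", "tue", "wed"]},
)

rome = temps.select("city", "Rome")   # row for the tick labeled "Rome"
per_city = temps.sum(axis="day")      # sum over the axis named "day"
```

Here `temps.sum(axis="day")` reads more clearly than `temps.data.sum(axis=1)`, and the code keeps working if the axis order changes.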
Prior Art
At present, there is no accepted standard solution for dealing with tabular data like this. However, the following list of ad hoc and proposal-level implementations suggests there is definite demand for one. For example, in no particular order:
[Tabular](http://bitbucket.org/elaine/tabular/src) implements a spreadsheet-inspired datatype, with rows/columns, csv/etc. IO, and fancy tabular operations.
[scikits.statsmodels](http://scikits.appspot.com/statsmodels) sounded as though it had some features we’d like to eventually see implemented on top of something such as datarray, and [Skipper](http://scipystats.blogspot.com/) seemed pretty interested in something like this himself.
[scikits.timeseries](http://scikits.appspot.com/timeseries) also has a time-series-specific object that’s somewhat reminiscent of labeled arrays.
[pandas](http://pandas.sourceforge.net/) is based around a number of DataFrame-esque datatypes.
[pydataframe](http://code.google.com/p/pydataframe/) is supposed to be a clone of R’s data.frame.
[larry](http://github.com/kwgoodman/la), or “labeled array,” often comes up in discussions alongside pandas.
[divisi](http://github.com/commonsense/divisi2) includes labeled sparse and dense arrays.
Project Goals
1. Get something akin to this in the numpy core.
2. Stick to basic functionality such that projects like scikits.statsmodels and pandas can use it as a base datatype.
3. Make an interface that allows for simple, pretty manipulation that doesn’t introduce confusion.
4. Oh, and make sure that the base numpy array is still accessible.
Code
You can find our sources and single-click downloads:
Main repository on Github.
Documentation for all releases and current development tree.
Download the current trunk as a tar/zip file.
Downloads of all available releases.
Download files
Source Distributions
File details
Details for the file datarray-0.0.5.zip.
File metadata
- Download URL: datarray-0.0.5.zip
- Upload date:
- Size: 73.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | bd25137f1e92714241d3c799b760ee5449934509d3b8dadb5e53af10b11d3dca |
MD5 | 3216104962ebac2ea78f8c23b6150dff |
BLAKE2b-256 | 80b6e6dc7eb7409787dade36b3ceead984a9e5e69bfdc0f230cf40f706dfcd5f |
File details
Details for the file datarray-0.0.5.tar.gz.
File metadata
- Download URL: datarray-0.0.5.tar.gz
- Upload date:
- Size: 61.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 6a307fc89dc488e17e237922282a7fc904d17a6c8a81dc553272a0815eb01c68 |
MD5 | 9906ba648f3565e7cfbce5a5d9166cd4 |
BLAKE2b-256 | 19c42f4106df4a55b8a57227f1ceb4ecb13b6da654e3bd407209f38e7f5036ed |
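The digests listed above can be used to check a downloaded archive. The following is a minimal sketch using Python's standard `hashlib`; the helper names `sha256_of_file` and `matches_listing` and the local file path are assumptions for illustration, not part of datarray.

```python
import hashlib

# SHA256 digest listed above for datarray-0.0.5.tar.gz
EXPECTED_TAR_GZ_SHA256 = (
    "6a307fc89dc488e17e237922282a7fc904d17a6c8a81dc553272a0815eb01c68"
)


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, expected=EXPECTED_TAR_GZ_SHA256):
    """Hypothetical helper: True if the file's SHA256 equals the listed digest."""
    return sha256_of_file(path) == expected
```

Calling `matches_listing("datarray-0.0.5.tar.gz")` on a downloaded copy (the path is an assumption about where the file was saved) returns `False` if the archive was corrupted or altered.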