Optimizing compiler for evaluating mathematical expressions on CPUs and GPUs.
Theano is a Python library that allows you to define, optimize, and efficiently evaluate mathematical expressions involving multi-dimensional arrays. It is built on top of NumPy_. Theano features:
* **tight integration with NumPy:** an interface similar to NumPy's; numpy.ndarray objects are also used internally in Theano-compiled functions.
* **transparent use of a GPU:** perform data-intensive computations up to 140x faster than on a CPU (support for float32 only).
* **efficient symbolic differentiation:** Theano can compute derivatives for functions of one or many inputs.
* **speed and stability optimizations:** avoid nasty bugs when computing expressions such as log(1 + exp(x)) for large values of x.
* **dynamic C code generation:** evaluate expressions faster.
* **extensive unit-testing and self-verification:** includes tools for detecting and diagnosing bugs and/or potential problems.
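The stability point above can be illustrated in plain NumPy (an illustrative sketch, not Theano code): the naive form of log(1 + exp(x)) overflows for large x, while a numerically stable formulation does not.

```python
import numpy as np

x = np.float64(1000.0)

# Naive: exp(1000) overflows to inf, so the whole expression becomes inf.
with np.errstate(over="ignore"):
    naive = np.log(1.0 + np.exp(x))

# Stable: log(1 + exp(x)) == logaddexp(0, x), computed without overflow.
stable = np.logaddexp(0.0, x)

print(naive)   # inf
print(stable)  # 1000.0
```

Theano's stability optimizations rewrite such expression graphs automatically, so the user can write the naive form and still get the stable result.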
Theano has been powering large-scale computationally intensive scientific
research since 2007, but it is also approachable enough to be used in the
classroom (IFT6266 at the University of Montreal).
.. _NumPy: http://numpy.scipy.org/
Modifications in the trunk since the last release
Theano 0.4.0rc4 (2011-06-13)
--------------------------------------------------
Deprecation:
* tag.shape attribute deprecated (#633)
* FAST_RUN_NOGC mode deprecated
* CudaNdarray_new_null is deprecated in favour of CudaNdarray_New
* Dividing integers with / is deprecated: use // for integer division, or
  cast one of the integers to a float type if you want a float result (this
  behavior can also be changed via config.int_division).
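The division distinction the deprecation points at can be seen in plain Python (illustrative only):

```python
a, b = 7, 2

# Floor (integer) division: the unambiguous operator going forward.
print(a // b)        # 3

# Casting one operand yields a true-division (float) result.
print(a / float(b))  # 3.5
```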
Bugs fixed:
* Bugfix in CudaNdarray.__iadd__: when the operation is not implemented, the error is now returned.
* Typo fixed in tensor/opt.py
* THEANO_FLAGS='optimizer=None' now works as expected
* Fixed memory leak in error handling on GPU-to-host copy
* Fix relating specifically to Python 2.7 on Mac OS X
* infer_shape can now handle Python longs
* Fixed behaviour of pydotprint's max_label_size option
* Trying to compute x % y with one or more arguments being complex now
raises an error.
* The output of random samples computed with uniform(..., dtype=...) is
guaranteed to be of the specified dtype instead of potentially being of a
higher-precision dtype.
* Python 2.4 syntax fixes.
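The dtype guarantee for uniform(..., dtype=...) has a simple analogue in plain NumPy (illustrative; not the Theano API): samples are drawn in float64 by default, so honouring a requested lower-precision dtype requires an explicit cast.

```python
import numpy as np

rng = np.random.RandomState(0)

# NumPy draws uniform samples in float64 by default ...
samples64 = rng.uniform(size=4)

# ... so a specific lower-precision dtype requires an explicit cast, which is
# what the fix now guarantees for Theano's uniform(..., dtype=...).
samples32 = samples64.astype(np.float32)

print(samples64.dtype)  # float64
print(samples32.dtype)  # float32
```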
Crash fixed:
* Worked around a bug in gcc 4.3.0 that made the compilation of 2d convolution
  crash.
Optimization:
* Optimized four patterns of a subtensor followed by another subtensor.
* Gemm inplace optimization on the GPU re-enabled
GPU:
* Fused elemwise operations containing dtypes other than float32 (except
  float64) are now moved to the GPU when their inputs and outputs are float32.
    * This allows elemwise comparisons to be moved to the GPU when their
      result is cast to float32 afterwards.
* Implemented CudaNdarray.ndim to match the ndarray interface.
* Fixed slowdown caused by multiple chained views on CudaNdarray objects
* CudaNdarray_alloc_contiguous changed so as to never try to free
memory on a view: new "base" property
* Safer decref behaviour in CudaNdarray in case of failed allocations
* New GPU implementation of tensor.basic.outer
* Multinomial random variates now available on GPU
New features:
* ProfileMode
    * profiles the scan overhead
    * simple hook system for adding profilers
    * output reordered from more general to more specific
* var[vector of indices] now works (the gradient works recursively, the direct
  gradient works inplace, and it works on the GPU)
    * limitation: only works on the outermost dimension.
* test_value implementation to allow quick debugging at graph creation time
* cuda.root is inferred if nvcc is on the path; otherwise it defaults to
  /usr/local/cuda
* Better graph printing for graphs involving a scan subgraph
* Casting behavior is closer to numpy by default, and can be controlled
through config.cast_policy.
* Smarter C module cache, avoiding erroneous usage of the wrong C
implementation when some options change, and avoiding recompiling the
same module multiple times in some situations.
* The "theano-cache clear" command now clears the cache more thoroughly.
* More extensive linear algebra ops (CPU only) that wrap scipy.linalg
now available in the sandbox.
* CUDA devices 4 - 16 should now be available if present.
* infer_shape support for the View op, better infer_shape support in Scan
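The var[vector of indices] feature listed above corresponds to NumPy's advanced indexing on the leading axis; a plain NumPy sketch of the supported pattern (illustrative, not Theano code):

```python
import numpy as np

m = np.arange(12).reshape(4, 3)
idx = np.array([0, 2, 2])

# Indexing with a vector of indices selects whole rows (the outermost
# dimension), possibly with repetitions -- the case Theano now supports.
rows = m[idx]
print(rows.shape)  # (3, 3)
print(rows[1])     # [6 7 8]
```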
Documentation:
* Better commenting of cuda_ndarray.cu
* Fixes in the scan documentation: add missing declarations/print statements
* Better error message on failed __getitem__
* Updated documentation on profile mode
* Better documentation of testing on Windows
* Better documentation of the 'run_individual_tests' script
Unit tests:
* Stricter float comparisons by default
* Subtensor tests for CPU tensors are reused for GPU tensors (more GPU test coverage)
* Tests that check for aliased function inputs and ensure appropriate copying
  (#374)
* Better test of copies in CudaNdarray
* New tests relating to the new base pointer requirements
* Better scripts to run tests individually or in batches
* Some tests now run whenever CUDA is available, not only when it has been
  enabled beforehand
* Tests display fewer pointless warnings.
Other:
* The broadcast flag is now correctly set to True in the output variable of
  a Reshape op when an int 1 appears in the new shape.
* pydotprint: high contrast mode is now the default
* More compact printing (the leading "Composite" in op names is omitted)
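The broadcast-flag fix above matters because a dimension known to be exactly 1 can participate in broadcasting; in NumPy terms (an illustrative sketch):

```python
import numpy as np

col = np.arange(3).reshape(3, 1)  # the reshaped size-1 axis is broadcastable
row = np.array([10, 20])

# The size-1 axis of `col` stretches to match `row`, giving shape (3, 2).
out = col + row
print(out.shape)  # (3, 2)
```

Marking the reshaped axis as broadcastable lets Theano apply the same rule symbolically.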