Skip to main content

Universal errors-and-erasures Reed Solomon codec (error correcting code) in pure Python with extensive documentation

Project description

https://travis-ci.org/lrq3000/unireedsolomon.svg?branch=master https://coveralls.io/repos/lrq3000/unireedsolomon/badge.svg?branch=master&service=github https://badge.fury.io/py/unireedsolomon.svg https://img.shields.io/pypi/dm/unireedsolomon.svg

Written from scratch by Andrew Brown <brownan@gmail.com> <brownan@cs.duke.edu> (c) 2010. Upgraded by Stephen Larroque <LRQ3000@gmail.com> in 2015.

Licensed under the MIT License.

This library implements a pure-Python documented universal Reed-Solomon error correction codec with a mathematical nomenclatura, compatible with Python 2.6 up to 3.4 and also with PyPy 2 and PyPy 3.

The project aims to keep a well commented and organized code with an extensive documentation and mathematical clarity of the various arithmetic operations.

How does this library differs from other Reed-Solomon libraries?

  • universal: compatibility with (almost) any other Reed-Solomon codec. This means that you can choose the parameters so that you can either encode data and decode it with another RS codec, or on the opposite encode data with another RS codec and decode this data with this library.

  • errors-and-erasures decoding allows to decode both erasures (where you know the position of the corrupted characters) and errors (where you don’t know where are the corrupted characters) at the same time (because you can decode more erasures than errors, so if you can provide even a few know corrupted characters positions, this can help a lot the decoder to repair the message).

  • documented: following literate programming guidelines, you should understand everything you need about RS by reading the code and the comments.

  • mathematical nomenclatura: for example, contrary to most other RS libraries, you will see a clear distinction between the different mathematical constructs, such as the Galois Fields numbers are clearly separated from the generic Polynomial objects, and both are separated from the Reed-Solomon algorithm, which makes use of both of those constructs. For this purpose, object-oriented programming was chosen to design the architecture of the library, although obviously at the expense of a bit of performance. However, this library favors mathematical clarity and documentation over performance (even if performance is optimized whenever possible).

  • pure-Python means that there are no dependencies whatsoever apart from the Python interpreter. This means that this library should be resilient in the future (since it doesn’t depend on external libraries who can become broken with time, see software rot), and you can use it on any system where Python can be installed (including online cloud services).

The authors tried their best to extensively document the algorithms. However, a lot of the math involved is non-trivial and we can’t explain it all in the comments. To learn more about the algorithms, see these resources:

Also, here’s a copy of the presentation one of the authors gave to the class Spring 2010 on his experience implementing this library. The LaTeX source is in the presentation directory.

http://www.cs.duke.edu/courses/spring10/cps296.3/decoding_rs.pdf

The code was lately updated to support errors-and-erasures decoding (both at the same time), and to be universal (you can supply the parameters to be compatible with almost any other RS codec).

The codec has decent performances if you use PyPy with the fast methods (~1 MB/s), but it would be faster if we drop the object-oriented design (implementing everything in functions), but this would be at the expense of mathematical clarity. If you are interested, see the reedsolo library by Tomer Filiba, which is exactly the same implementation but only functional without objects (results in about 5x speedup).

Files

rs.py

Holds the Reed-Solomon Encoder/Decoder object

polynomial.py

Contains the Polynomial object (pure-python)

ff.py

Contains the GF2int object representing an element of the GF(2^p) field, with p being 8 by default (pure-python)

polynomial.pyx

Cython implementation of polynomial.py with equivalent functions (optional)

ff.pyx

Cython implementation of ff.py with equivalent functions (optional)

Documentation

unireedsolomon.rs.RSCoder(n, k, generator=3, prim=0x11b, fcr=1, c_exp=8)

Creates a new Reed-Solomon Encoder/Decoder object configured with the given n and k values. n is the length of a codeword, must be less than 256 k is the length of the message, must be less than n generator, prim and fcr parametrize the Galois Field that will be built c_exp is the Galois Field range (ie, 8 means GF(2^8) = GF(256)), which is both the limit for one symbol’s value, and the maximum length of a message+ecc.

The code will have error correcting power (ie, maximum number of repairable symbols) of 2*e+v <= (n-k), where e is the number of errors and v the number of erasures.

The typical RSCoder is RSCoder(255, 223)

RSCoder.encode(message, poly=False, k=None)

Encode a given string with reed-solomon encoding. Returns a byte string with the k message bytes and n-k parity bytes at the end.

If a message is < k bytes long, it is assumed to be padded at the front with null bytes (ie, a shortened Reed-Solomon code).

The sequence returned is always n bytes long.

If poly is not False, returns the encoded Polynomial object instead of the polynomial translated back to a string (useful for debugging)

You can change the length (number) of parity/ecc bytes at encoding by setting k to any value between [1, n-1]. This allows to create only one RSCoder and then use it with a variable redundancy rate.

RSCoder.encode_fast(message, poly=False, k=None)

Same as encode() but using faster algorithms and optimization tricks.

RSCoder.decode(message_ecc, nostrip=False, k=None, erasures_pos=None, only_erasures=False):

Given a received string or byte array message_ecc (composed of a message string + ecc symbols at the end), attempts to decode it. If it’s a valid codeword, or if there are no more than 2*e+v <= (n-k) erratas (called the Singleton bound), the message is returned.

You can provide the erasures positions as a list to erasures_pos. For example, if you have “hella warld” and you know that a is an erasure, you can provide the list erasures_pos=[4, 7]. You can correct twice as many erasures than errors, and if some provided erasures are wrong (they are correct symbols), then there’s no problem, they will be repaired just fine (but will count towards the Singleton bound). You can also specify that you are sure there are only erasures and no errors at all by setting only_erasures=True.

A message always has k bytes, if a message contained less it is left padded with null bytes (punctured RS code). When decoded, these leading null bytes are stripped, but that can cause problems if decoding binary data. When nostrip is True, messages returned are always k bytes long. This is useful to make sure no data is lost when decoding binary data.

Note that RS can correct errors both in the message and the ecc symbols.

RSCoder.decode_fast(message_ecc, nostrip=False, k=None, erasures_pos=None, only_erasures=False):

Same as decode() but using faster algorithms and optimization tricks.

RSCoder.check(message_ecc, k=None)

Verifies the codeword (message + ecc symbols at the end) is valid by testing that the code as a polynomial code divides g, or that the syndrome is all 0 coefficients. The result is not foolproof: if it’s False, you’re sure the message was corrupted (or that you used the wrong RS parameters), but if it’s True, it’s either that the message is correct, or that there are too many errors (ie, more than the Singleton bound) for RS to do anything about it. returns True/False

RSCoder.check_fast(message_ecc, k=None)

Same as check() but using faster algorithms and optimization tricks.

unireedsolomon.ff.find_prime_polynomials(generator=2, c_exp=8, fast_primes=False, single=False)

Compute the list of prime polynomials for the given generator and galois field characteristic exponent. You can then use this prime polynomial to specify the mandatory “prim” parameter, particularly if you are using a larger Galois Field (eg, 2^16).

Internal API

Besides the main RSCoder object, two other objects are used in this implementation: Polynomial and GF2int. Their use is not specifically tied to the coder or even to the Reed-Solomon algorithm, they are just generic mathematical constructs respectively representing polynomials and Galois field’s number of base 2.

You do not need to know about the internal API to use the RS codec, this is just left as a documentation for the reader interested into dwelling inside the mathematical constructs.

polynomial.Polynomial(coefficients=[], **sparse)

There are three ways to initialize a Polynomial object. 1) With a list, tuple, or other iterable, creates a polynomial using the items as coefficients in order of decreasing power

2) With keyword arguments such as for example x3=5, sets the coefficient of x^3 to be 5

3) With no arguments, creates an empty polynomial, equivalent to Polynomial([0])

>>> print Polynomial([5, 0, 0, 0, 0, 0])
5x^5
>>> print Polynomial(x32=5, x64=8)
8x^64 + 5x^32
>>> print Polynomial(x5=5, x9=4, x0=2)
4x^9 + 5x^5 + 2

Polynomial objects export the following standard functions that perform the expected operations using polynomial arithmetic. Arithmetic of the coefficients is determined by the type passed in, so integers or GF2int objects could be used, the Polynomial class is agnostic to the type of the coefficients.

__add__
__divmod__
__eq__
__floordiv__
__hash__
__len__
__mod__
__mul__
__ne__
__neg__
__sub__
evaluate(x)
degree()
    Returns the degree of the polynomial
get_coefficient(degree)
    Returns the coefficient of the specified term
ff.GF2int(value)

Instances of this object are elements of the field GF(2^p) and instances are integers in the range 0 to (2^p)-1. By default, the field is GF(2^8) and instances are integers in the range 0 to 255 and is defined using the irreducable polynomial 0x11b or in binary form: x^8 + x^4 + x^3 + x + 1 and using 3 as the generator for the exponent table and log table.

You can however use other parameters for the Galois Field, using the init_lut() function.

ff.find_prime_polynomials(generator=2, c_exp=8, fast_primes=False, single=False)

Find the list of prime polynomials to use to generate the look-up tables for your field.

ff.init_lut(generator=3, prim=0x11b, c_exp=8)

Generate the look-up tables given the parameters. This effectively parametrize your Galois Field (ie, generator=2, prim=0x1002d, c_exp=16) will generate a GF(2^16) field.

The GF2int class inherits from int and supports all the usual integer operations. The following methods are overridden for arithmetic in the finite field GF(2^p)

__add__
__div__
__mul__
__neg__
__pow__
__radd__
__rdiv__
__rmul__
__rsub__
__sub__
inverse()
    Multiplicative inverse in GF(2^p)

Examples

>>> import rs
>>> coder = rs.RSCoder(20,13)
>>> c = coder.encode("Hello, world!")
>>> print repr(c)
'Hello, world!\x8d\x13\xf4\xf9C\x10\xe5'
>>>
>>> r = "\0"*3 + c[3:]
>>> print repr(r)
'\x00\x00\x00lo, world!\x8d\x13\xf4\xf9C\x10\xe5'
>>>
>>> coder.decode(r)
'Hello, world!'

Image Encoder

imageencode.py is an example script that encodes codewords as rows in an image. It requires PIL to run.

Usage: python imageencode.py [-d] <image file>

Without the -d flag, imageencode.py will encode text from standard in and output it to the image file. With -d, imageencode.py will read in the data from the image and output to standard out the decoded text.

An example is included: exampleimage.png. Try decoding it as-is, then open it up in an image editor and paint some vertical stripes on it. As long as no more than 16 pixels per row are disturbed, the text will be decoded correctly. Then draw more stripes such that more than 16 pixels per row are disturbed and verify that the message is decoded improperly.

Notice how the parity data looks different–the last 32 pixels of each row are colored differently. That’s because this particular image contains encoded ASCII text, which generally only has bytes from a small range (the alphabet and printable punctuation). The parity data, however, is binary and contains bytes from the full range 0-255. Also note that either the data area or the parity area (or both!) can be disturbed as long as no more than 16 bytes per row are disturbed.

Cython implementation

If either a C compiler or Cython is found, rs.py will automatically load the Cython implementations (the *.pyx files). These are provided as optimized versions of the pure-python implementations, with equivalent functionalities. The goal was to get a speedup, which is the case, but using PyPy on the pure-python implementation provides a significantly higher speedup than the Cython implementation. The Cython implementations are still provided for the interested reader, but the casual user is not advised to use them. If you want to encode and decode fast, use PyPy.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

unireedsolomon-1.0.zip (276.0 kB view details)

Uploaded Source

unireedsolomon-1.0.tar.gz (253.2 kB view details)

Uploaded Source

Built Distributions

unireedsolomon-1.0.win32-py2.7.exe (370.7 kB view details)

Uploaded Source

unireedsolomon-1.0-py2.7-win32.egg (207.9 kB view details)

Uploaded Source

unireedsolomon-1.0-cp27-none-win32.whl (166.2 kB view details)

Uploaded CPython 2.7 Windows x86

File details

Details for the file unireedsolomon-1.0.zip.

File metadata

  • Download URL: unireedsolomon-1.0.zip
  • Upload date:
  • Size: 276.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for unireedsolomon-1.0.zip
Algorithm Hash digest
SHA256 5e9a4fbfb790b8a13ebfb2d12e9a43a566dd19babcbef671a778835bba53adfa
MD5 87256e56c6184c03d0758ccecbe360cd
BLAKE2b-256 bbdc81761378e4a07686a8bb97676958b1371f9df67cbd0c8b4f81af92be9efa

See more details on using hashes here.

File details

Details for the file unireedsolomon-1.0.tar.gz.

File metadata

File hashes

Hashes for unireedsolomon-1.0.tar.gz
Algorithm Hash digest
SHA256 e03421bc31ecc0e0f8e10c7e7cad7f14a870d337035ebf678e163baf5dd06169
MD5 31d9e66a76ebf42d4639913e9dc0d0a5
BLAKE2b-256 b68f63e077d8064eb4993593fad0d815362657c1da7c83f2283cf5d413063fb8

See more details on using hashes here.

File details

Details for the file unireedsolomon-1.0.win32-py2.7.exe.

File metadata

File hashes

Hashes for unireedsolomon-1.0.win32-py2.7.exe
Algorithm Hash digest
SHA256 e0a7e4f19f76d46c5d1f72d141f4d70e827eb5d950048945ec1b594a60ee20a1
MD5 2567839e71373761b7b77a5992bd9898
BLAKE2b-256 7fcaa2f46b83a2af2e921215bb2c28c55a292abbf4a9c0009cd7b9da256b2595

See more details on using hashes here.

File details

Details for the file unireedsolomon-1.0-py2.7-win32.egg.

File metadata

File hashes

Hashes for unireedsolomon-1.0-py2.7-win32.egg
Algorithm Hash digest
SHA256 9125554de619fe979768b1fb3b6816d4febf7fb5a5e35e7b533315398366385b
MD5 92aa86a21baa32eccfd56e0c8c3469f4
BLAKE2b-256 b3039280368cfc15dad459bfbd9d24534b5b6d4b09d5aca182440e37b14b2a42

See more details on using hashes here.

File details

Details for the file unireedsolomon-1.0-cp27-none-win32.whl.

File metadata

File hashes

Hashes for unireedsolomon-1.0-cp27-none-win32.whl
Algorithm Hash digest
SHA256 d82bd80f436110dee1ce3c723db322d739d0a731885dd6487c72b9e0eb703fa6
MD5 886e925b72b31a5f866c1429068556f8
BLAKE2b-256 6fae8ada40977a041279c64114db2c16902c79fdad4402e3f1917e4eb7079dc7

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page