A lexer and codec to work with LaTeX code in Python.

Project description

The codec provides a convenient way to convert between text written in LaTeX and Unicode. Because it is not a LaTeX compiler, it is best suited to short chunks of text, such as a paragraph or the values of a BibTeX entry, and is not appropriate for a full LaTeX document. In particular, its handling of LaTeX commands that do not simply select characters is intended to keep the Unicode representation understandable to a human reader, but it is not canonical and may require hand tuning to produce the desired effect.
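
A rough usage sketch follows. This description does not show the API, so the codec name `latex` and the import-for-side-effect pattern are assumptions taken from the package's own documentation and should be checked against the installed version. The helper name `latex_roundtrip_demo` is purely illustrative, and the import is guarded so the snippet still runs when the package is missing.

```python
def latex_roundtrip_demo():
    """Encode one short string and decode another with the "latex" codec.

    Assumption: importing latexcodec registers a codec named "latex".
    Returns None when the package is not installed.
    """
    try:
        import latexcodec  # noqa: F401  (imported only to register the codec)
    except ImportError:
        return None
    encoded = "\u00e5ngstr\u00f6m".encode("latex")   # str -> LaTeX bytes
    decoded = b"\\'el\\`eve".decode("latex")         # LaTeX bytes -> str
    return encoded, decoded


print(latex_roundtrip_demo())
```

Short strings such as these are the intended use; a full document would be better served by a real LaTeX toolchain.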

The encoder makes a best effort to replace Unicode characters outside the range used as LaTeX input (ASCII by default) with a LaTeX command that selects the character. More precisely, the Unicode code point is replaced by a LaTeX command that selects a glyph that reasonably represents the code point. Unicode characters with special uses in LaTeX are replaced by their LaTeX equivalents. For example:

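==================  ===============
original text       encoded LaTeX
==================  ===============
¥                   \yen
ü                   \"u
\N{NO-BREAK SPACE}  ~
~                   \textasciitilde
%                   \%
#                   \#
\textbf{x}          \textbf{x}
==================  ===============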
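
To make the encoder rows concrete, here is a toy character-by-character encoder. It is only a sketch under stated assumptions: `TOY_ENCODE_TABLE` and `toy_encode` are hypothetical names, the table holds just the rows shown above, and the space inserted after a control word is an illustrative guess at how a following letter can be kept separate, not a description of latexcodec's implementation.

```python
import re

# Hypothetical mini-table copied from the example rows; NOT latexcodec's table.
TOY_ENCODE_TABLE = {
    "\u00a5": r"\yen",            # YEN SIGN
    "\u00fc": r'\"u',             # u with diaeresis
    "\u00a0": "~",                # NO-BREAK SPACE becomes a LaTeX tie
    "~": r"\textasciitilde",      # a literal tilde needs a command
    "%": r"\%",                   # comment character must be escaped
    "#": r"\#",                   # parameter character must be escaped
}

# A control word is a backslash followed by letters only (e.g. \yen).
CONTROL_WORD = re.compile(r"\\[A-Za-z]+")


def toy_encode(text):
    """Replace each character found in the toy table; pass the rest through."""
    pieces = []
    for index, char in enumerate(text):
        replacement = TOY_ENCODE_TABLE.get(char)
        if replacement is None:
            pieces.append(char)  # includes backslashes and braces, as in \textbf{x}
            continue
        pieces.append(replacement)
        following = text[index + 1:index + 2]
        # \yen directly followed by a letter would read as a longer command name,
        # so the sketch separates them with a space.
        if (CONTROL_WORD.fullmatch(replacement)
                and following.isascii() and following.isalpha()):
            pieces.append(" ")
    return "".join(pieces)


print(toy_encode("5\u00a5 is 50% of #1"))
```

Existing markup such as `\textbf{x}` passes through unchanged because neither the backslash nor the braces appear in the toy table, which mirrors the last row above.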

The decoder makes a best effort to replace LaTeX commands that select characters with the Unicode characters they select. For example:

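===============  ==================
original LaTeX   decoded unicode
===============  ==================
\yen             ¥
\"u              ü
~                \N{NO-BREAK SPACE}
\textasciitilde  ~
\%               %
\#               #
\textbf{x}       \textbf {x}
#                #
===============  ==================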

In addition, comments are dropped (including the final newline that marks the end of a comment), paragraphs are canonicalized into double newlines, and other newlines are left as is. Spacing after LaTeX commands is also canonicalized.

For example,

hi % bye
there\par world
\textbf     {awesome}

is decoded as

hi there

world
\textbf {awesome}
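
The three normalization rules can be approximated with a few regular expressions. This is a minimal sketch, not the codec's lexer: `toy_normalize` is a hypothetical helper, it only treats `\par` as a paragraph marker, and it would misread an escaped backslash placed directly before a `%`.

```python
import re


def toy_normalize(latex):
    """Mimic comment dropping, \\par handling and command spacing on small inputs."""
    # 1. Drop comments: an unescaped % up to and including the end of the line.
    text = re.sub(r"(?<!\\)%[^\n]*\n?", "", latex)
    # 2. Turn \par (and the blanks after it) into a double newline.
    text = re.sub(r"\\par(?![A-Za-z])[ \t]*", "\n\n", text)
    # 3. Leave exactly one space after every control word.
    text = re.sub(r"(\\[A-Za-z]+)[ \t]*", r"\1 ", text)
    return text


source = "hi % bye\nthere\\par world\n\\textbf     {awesome}\n"
print(toy_normalize(source))
```

On the example input above, the sketch yields the same text as the decoded output shown, followed by the trailing newline of the input.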

When decoding, LaTeX commands that do not directly select characters (for example, macros and formatting commands) are passed through unchanged. The same happens for LaTeX commands that select characters but are not yet recognized by the codec. Either case can result in a hybrid unicode string in which some characters are literal characters and others are parts of unexpanded commands. Consequently, backslashes are sometimes left intact to mark the start of a potentially unrecognized control sequence.
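
The pass-through behavior described above can be illustrated with a toy decoder. Everything here is an assumption made for illustration: `TOY_DECODE_TABLE`, `TOKEN` and `toy_decode` are hypothetical names, the table covers only the decoder rows shown earlier, and the real codec's tokenization and spacing rules are more involved.

```python
import re

# Hypothetical mini-table copied from the decoder examples; NOT latexcodec's table.
TOY_DECODE_TABLE = {
    r"\yen": "\u00a5",
    r'\"u': "\u00fc",
    r"\textasciitilde": "~",
    r"\%": "%",
    r"\#": "#",
    "~": "\u00a0",   # a LaTeX tie decodes to NO-BREAK SPACE
}

# Tokens the sketch looks at: an umlaut accent with its letter, a control word,
# a control symbol, or a tie. Anything else is copied as is.
TOKEN = re.compile(r'\\"[A-Za-z]|\\[A-Za-z]+|\\.|~', re.DOTALL)


def toy_decode(latex):
    """Replace known tokens; leave unknown commands (and their backslash) intact."""
    return TOKEN.sub(
        lambda match: TOY_DECODE_TABLE.get(match.group(0), match.group(0)),
        latex,
    )


print(toy_decode('\\"uber \\yen 5 \\mycommand{x}'))
```

The result is a hybrid string: `\"u` and `\yen` become characters, while the unknown `\mycommand` keeps its backslash.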

Given the numerous and changing packages providing such LaTeX commands, the codec will never be complete, and new translations of unrecognized unicode or unrecognized LaTeX symbols are always welcome.

Download files

Source Distribution

latexcodec-2.0.1.tar.gz (30.1 kB)

Built Distribution

latexcodec-2.0.1-py2.py3-none-any.whl (18.0 kB)

File details

Details for the file latexcodec-2.0.1.tar.gz.

File metadata

  • Download URL: latexcodec-2.0.1.tar.gz
  • Size: 30.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.15.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.6.3 requests-toolbelt/0.8.0 tqdm/4.19.4 CPython/2.7.17

File hashes

Hashes for latexcodec-2.0.1.tar.gz
Algorithm Hash digest
SHA256 2aa2551c373261cefe2ad3a8953a6d6533e68238d180eb4bb91d7964adb3fe9a
MD5 49c379bbdd1d924941c155f3d4dd5a92
BLAKE2b-256 842ffd47712130b303ff179c819cc5c63aa39586fc8d430bc299c0f5f56ec42c

File details

Details for the file latexcodec-2.0.1-py2.py3-none-any.whl.

File metadata

  • Download URL: latexcodec-2.0.1-py2.py3-none-any.whl
  • Size: 18.0 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.15.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.6.3 requests-toolbelt/0.8.0 tqdm/4.19.4 CPython/2.7.17

File hashes

Hashes for latexcodec-2.0.1-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 c277a193638dc7683c4c30f6684e3db728a06efb0dc9cf346db8bd0aa6c5d271
MD5 e4a6ba2c37bfcd6ac21b8dadb0849f4f
BLAKE2b-256 0a769552dfc6b74c2d6c3f199e927d41998dc1e561b7cbe4af7e7247388e17e8
