Lexery

A simple lexer based on regular expressions.

Inspired by https://eli.thegreenplace.net/2013/06/25/regex-based-lexical-analysis-in-python-and-javascript

Usage

You define the lexing rules, and lexery matches them iteratively against the text:

>>> import lexery
>>> import re
>>> text = 'crop \t   ( 20, 30, 40, 10 ) ;'
>>>
>>> lexer = lexery.Lexer(
...     rules=[
...         lexery.Rule(identifier='identifier',
...             pattern=re.compile(r'[a-zA-Z_][a-zA-Z_]*')),
...         lexery.Rule(identifier='lpar', pattern=re.compile(r'\(')),
...         lexery.Rule(identifier='number', pattern=re.compile(r'[1-9][0-9]*')),
...         lexery.Rule(identifier='rpar', pattern=re.compile(r'\)')),
...         lexery.Rule(identifier='comma', pattern=re.compile(r',')),
...         lexery.Rule(identifier='semi', pattern=re.compile(r';'))
...     ],
...     skip_whitespace=True)
>>> tokens = lexer.lex(text=text)
>>> assert tokens == [[
...     lexery.Token('identifier', 'crop', 0, 0),
...     lexery.Token('lpar', '(', 9, 0),
...     lexery.Token('number', '20', 11, 0),
...     lexery.Token('comma', ',', 13, 0),
...     lexery.Token('number', '30', 15, 0),
...     lexery.Token('comma', ',', 17, 0),
...     lexery.Token('number', '40', 19, 0),
...     lexery.Token('comma', ',', 21, 0),
...     lexery.Token('number', '10', 23, 0),
...     lexery.Token('rpar', ')', 26, 0),
...     lexery.Token('semi', ';', 28, 0)]]
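
To illustrate what matching rules iteratively can look like, here is a minimal sketch that uses only the standard library. It is not lexery's implementation: the names `Tok` and `lex_line` are hypothetical, and the sketch assumes that the rules are tried in order and that the first matching rule wins at each position.

```python
import re
from typing import List, NamedTuple, Tuple


class Tok(NamedTuple):
    """Hypothetical stand-in for a token: rule name, matched text, column."""
    identifier: str
    content: str
    position: int


WHITESPACE = re.compile(r'\s+')


def lex_line(line: str, rules: List[Tuple[str, re.Pattern]]) -> List[Tok]:
    """Try the rules in order at each position; the first match wins."""
    tokens = []  # type: List[Tok]
    position = 0
    while position < len(line):
        # Skip whitespace between tokens.
        space = WHITESPACE.match(line, position)
        if space:
            position = space.end()
            continue

        for identifier, pattern in rules:
            match = pattern.match(line, position)
            # Ignore empty matches so that the loop always advances.
            if match and match.end() > position:
                tokens.append(Tok(identifier, match.group(), position))
                position = match.end()
                break
        else:
            # No rule matched at this position.
            raise ValueError(
                'Unmatched text at position {}'.format(position))
    return tokens


rules = [
    ('identifier', re.compile(r'[a-zA-Z_][a-zA-Z_]*')),
    ('lpar', re.compile(r'\(')),
    ('number', re.compile(r'[1-9][0-9]*')),
    ('rpar', re.compile(r'\)')),
    ('comma', re.compile(r',')),
]

tokens = lex_line('crop ( 20, 30 )', rules)
```

Under these assumptions, `tokens` holds one entry per matched rule together with the column at which the match started, much like the position field in the lexery tokens above.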

Note that if a part of the text cannot be matched, a lexery.Error is raised:

>>> import lexery
>>> import re
>>> text = 'some-identifier ( 23 )'
>>>
>>> lexer = lexery.Lexer(
...     rules=[
...         lexery.Rule(identifier='identifier', pattern=re.compile(r'[a-zA-Z_][a-zA-Z_]*')),
...         lexery.Rule(identifier='number', pattern=re.compile(r'[1-9][0-9]*')),
...     ],
...     skip_whitespace=True)
>>> tokens = lexer.lex(text=text)
Traceback (most recent call last):
...
lexery.Error: Unmatched text at line 0 and position 4:
some-identifier ( 23 )
    ^
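
The error message above points at the offending column with a caret. A message of that shape could be assembled as in the following sketch. The helper `format_unmatched` is hypothetical and only mimics the layout shown in the traceback; it is not taken from lexery.

```python
def format_unmatched(line: str, lineno: int, position: int) -> str:
    """Build a three-line message: summary, the line itself, and a caret."""
    # The caret is indented by as many spaces as the reported position,
    # so it lines up only if the line contains no tabs before that column.
    return 'Unmatched text at line {} and position {}:\n{}\n{}^'.format(
        lineno, position, line, ' ' * position)


message = format_unmatched('some-identifier ( 23 )', lineno=0, position=4)
print(message)
```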

If you specify an unmatched_identifier, consecutive unmatched characters are accumulated into tokens with that identifier instead of raising an error:

>>> import lexery
>>> import re
>>> text = 'some-identifier ( 23 )-'
>>>
>>> lexer = lexery.Lexer(
...     rules=[
...         lexery.Rule(identifier='identifier', pattern=re.compile(r'[a-zA-Z_][a-zA-Z_]*')),
...         lexery.Rule(identifier='number', pattern=re.compile(r'[1-9][0-9]*')),
...     ],
...     skip_whitespace=True,
...     unmatched_identifier='unmatched')
>>> tokens = lexer.lex(text=text)
>>> assert tokens == [[
...     lexery.Token('identifier', 'some', 0, 0),
...     lexery.Token('unmatched', '-', 4, 0),
...     lexery.Token('identifier', 'identifier', 5, 0),
...     lexery.Token('unmatched', '(', 16, 0),
...     lexery.Token('number', '23', 18, 0),
...     lexery.Token('unmatched', ')-', 21, 0)]]
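
The accumulation of unmatched characters can be sketched with the standard library alone. This is an illustration under stated assumptions, not lexery's code: the function `lex_with_unmatched` is hypothetical, tokens are plain tuples, and the sketch assumes that both a matched rule and skipped whitespace end the current run of unmatched characters.

```python
import re

WHITESPACE = re.compile(r'\s+')


def lex_with_unmatched(line, rules, unmatched_identifier):
    """Return (identifier, content, position) tuples for a single line."""
    tokens = []
    pending = ''       # unmatched characters collected so far
    pending_start = 0  # column where the current unmatched run began

    def flush():
        """Emit the collected unmatched run, if any, as one token."""
        nonlocal pending
        if pending:
            tokens.append((unmatched_identifier, pending, pending_start))
            pending = ''

    position = 0
    while position < len(line):
        space = WHITESPACE.match(line, position)
        if space:
            flush()  # assumption: whitespace ends an unmatched run
            position = space.end()
            continue

        for identifier, pattern in rules:
            match = pattern.match(line, position)
            if match and match.end() > position:
                flush()
                tokens.append((identifier, match.group(), position))
                position = match.end()
                break
        else:
            # No rule matched: extend the unmatched run by one character.
            if not pending:
                pending_start = position
            pending += line[position]
            position += 1

    flush()
    return tokens


rules = [
    ('identifier', re.compile(r'[a-zA-Z_][a-zA-Z_]*')),
    ('number', re.compile(r'[1-9][0-9]*')),
]
tokens = lex_with_unmatched('some-identifier ( 23 )-', rules, 'unmatched')
```

For this input the sketch groups the characters the same way as the lexery example above: the trailing `)` and `-` end up in a single unmatched token that starts at position 21.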

Installation

  • Install lexery with pip:

pip3 install lexery

Development

  • Check out the repository.

  • In the repository root, create the virtual environment:

python3 -m venv venv3

  • Activate the virtual environment:

source venv3/bin/activate

  • Install the development dependencies:

pip3 install -e .[dev]

Pre-commit Checks

We provide a set of pre-commit checks that run the unit tests, lint the code and check its formatting.

Namely, we use:

  • yapf to check the formatting,

  • pydocstyle to check the style of the docstrings,

  • mypy to perform static type analysis, and

  • pylint to run various linter checks.

Run the pre-commit checks locally from an activated virtual environment with development dependencies:

./precommit.py

  • The pre-commit script can also automatically format the code:

./precommit.py --overwrite

Versioning

We follow Semantic Versioning. The version X.Y.Z indicates:

  • X is the major version (backward-incompatible),

  • Y is the minor version (backward-compatible), and

  • Z is the patch version (backward-compatible bug fix).
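
As a rough illustration of what the X.Y.Z scheme implies for users of the package, the following sketch checks whether an installed version should satisfy code written against another version. The helpers `parse_version` and `is_backward_compatible` are hypothetical, handle only plain X.Y.Z strings without pre-release or build suffixes, and assume a major version of at least 1.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split 'X.Y.Z' into integers (major, minor, patch)."""
    major, minor, patch = (int(part) for part in version.split('.'))
    return major, minor, patch


def is_backward_compatible(installed: str, required: str) -> bool:
    """Same major version, and installed is not older than required."""
    installed_parts = parse_version(installed)
    required_parts = parse_version(required)
    return (installed_parts[0] == required_parts[0]
            and installed_parts >= required_parts)


# A newer minor or patch release keeps existing code working ...
assert is_backward_compatible('1.1.2', '1.0.0')
# ... a new major release may break it ...
assert not is_backward_compatible('2.0.0', '1.1.2')
# ... and an older release may lack a needed feature or fix.
assert not is_backward_compatible('1.1.1', '1.1.2')
```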
