Grammars, parsers, and compilers in Python 3.6+
Project description
# shreducer
Simple compilers in Python for fun and profit.
## Grammars Included in This Project
* Python 3.6+ type hints string representation
* Filter expressions for web APIs. Allows parsing of expressions like this:
`https://your.api?filter=(status eq open and type eq store) or (status eq closed and type not in office, garage)`
* With a compiler of Elasticsearch queries included (not tested since 2016)
* A couple of primitive grammars for basic arithmetic expressions as examples
## Components
* *Tokenizers* (no need to implement your own, we rely on `shlex` from Python Standard Library)
* Grammars
* *Parsers* (no need to implement your own, we use a *shift-reduce* parser, hence the name of the project)
* Generators
**Tokenizer** splits input string into lexical units of the grammar (we call them tokens here).
**Grammar** describes the syntax rules of a language that we want to parse.
**Parser** parses sequence of tokens and generates a parse tree.
For example `('+', 2, ('-', 10, 3))` could be a parse tree.
**Generator** takes a parse tree and evaluates it. For example, an arithmetic generator could take
a parse tree `('+', 2, ('-', 10, 3))` as an input and produce `9` as an output.
All these components together make a compiler.
## Implementing a New Grammar
See examples in `shreducer/examples/`:
* `DictG` - simplest of all grammars, the most suitable to understand the basic idea, parser produces parsed dictionary
* `ListG` - another simple grammar, but unlike dictionary grammar, parser for this one produces parse tree
* `PlusMinusArithmeticsG` - simple arithmetic expression parser, parser produces parse tree
* `BetterArithmeticsG` - arithmetic expression parser that respects operator precedence, parser produces parse tree
* `FilterExpressionsG` - comparison operators and logical operators, parser produces parse tree
* `BetterFiltersG` - comparatively rich filter expression language, unlike other grammars this one uses look-ahead,
parser produces parse tree
There is some magic (a meta class) going on in `t` class to allow
declaring a string constant without writing its value twice:
class MyGrammar(Grammar):
class t:
IDENT = None
PLUS_MINUS = '+-'
EXPR = ()
After the class `MyGrammar` is created, the value of `MyGrammar.t.IDENT` will be `"IDENT"`.
Similarly, `MyGrammar.t.PLUS_MINUS` will be `"PLUS_MINUS"`, and `MyGrammar.t.EXPR` will be `"EXPR"`.
Member of `class t` with value `None` is treated as the default token type.
Members of `class t` with value `()` are treated as names of tokens of higher order -- expressions.
## Testing Your New Grammar
If you implement just a grammar, you can try parsing input strings with `Grammar.simple_parse` (which is a class
method).
For example, to try `TypeHintsG` (grammar for parsing type hints string representation in Python 3.6+),
you can do:
print(TypeHintsG.simple_parse('typing.Union[typing.List[str], typing.Dict[str, int]]'))
This will produce the following parse tree:
{
"name": "typing.Union",
"args": [
{
"name": "typing.List",
"args": [
{"name": "str", "args": None},
],
},
{
"name": "typing.Dict",
"args": [
{"name": "str", "args": None},
{"name": "int", "args": None},
],
},
],
}
Simple compilers in Python for fun and profit.
## Grammars Included in This Project
* Python 3.6+ type hints string representation
* Filter expressions for web APIs. Allows parsing of expressions like this:
`https://your.api?filter=(status eq open and type eq store) or (status eq closed and type not in office, garage)`
* With a compiler of Elasticsearch queries included (not tested since 2016)
* A couple of primitive grammars for basic arithmetic expressions as examples
## Components
* *Tokenizers* (no need to implement your own, we rely on `shlex` from Python Standard Library)
* Grammars
* *Parsers* (no need to implement your own, we use a *shift-reduce* parser, hence the name of the project)
* Generators
**Tokenizer** splits input string into lexical units of the grammar (we call them tokens here).
**Grammar** describes the syntax rules of a language that we want to parse.
**Parser** parses sequence of tokens and generates a parse tree.
For example `('+', 2, ('-', 10, 3))` could be a parse tree.
**Generator** takes a parse tree and evaluates it. For example, an arithmetic generator could take
a parse tree `('+', 2, ('-', 10, 3))` as an input and produce `9` as an output.
All these components together make a compiler.
## Implementing a New Grammar
See examples in `shreducer/examples/`:
* `DictG` - simplest of all grammars, the most suitable to understand the basic idea, parser produces parsed dictionary
* `ListG` - another simple grammar, but unlike dictionary grammar, parser for this one produces parse tree
* `PlusMinusArithmeticsG` - simple arithmetic expression parser, parser produces parse tree
* `BetterArithmeticsG` - arithmetic expression parser that respects operator precedence, parser produces parse tree
* `FilterExpressionsG` - comparison operators and logical operators, parser produces parse tree
* `BetterFiltersG` - comparatively rich filter expression language, unlike other grammars this one uses look-ahead,
parser produces parse tree
There is some magic (a meta class) going on in `t` class to allow
declaring a string constant without writing its value twice:
class MyGrammar(Grammar):
class t:
IDENT = None
PLUS_MINUS = '+-'
EXPR = ()
After the class `MyGrammar` is created, the value of `MyGrammar.t.IDENT` will be `"IDENT"`.
Similarly, `MyGrammar.t.PLUS_MINUS` will be `"PLUS_MINUS"`, and `MyGrammar.t.EXPR` will be `"EXPR"`.
Member of `class t` with value `None` is treated as the default token type.
Members of `class t` with value `()` are treated as names of tokens of higher order -- expressions.
## Testing Your New Grammar
If you implement just a grammar, you can try parsing input strings with `Grammar.simple_parse` (which is a class
method).
For example, to try `TypeHintsG` (grammar for parsing type hints string representation in Python 3.6+),
you can do:
print(TypeHintsG.simple_parse('typing.Union[typing.List[str], typing.Dict[str, int]]'))
This will produce the following parse tree:
{
"name": "typing.Union",
"args": [
{
"name": "typing.List",
"args": [
{"name": "str", "args": None},
],
},
{
"name": "typing.Dict",
"args": [
{"name": "str", "args": None},
{"name": "int", "args": None},
],
},
],
}
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
shreducer-1.0.0.tar.gz
(18.2 kB
view details)
Built Distribution
shreducer-1.0.0-py3-none-any.whl
(39.5 kB
view details)
File details
Details for the file shreducer-1.0.0.tar.gz
.
File metadata
- Download URL: shreducer-1.0.0.tar.gz
- Upload date:
- Size: 18.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.19.1 setuptools/40.4.0 requests-toolbelt/0.8.0 tqdm/4.26.0 CPython/3.6.6
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 3facbaab35e17ab364a86a51b672de22ca11fa702b13d7c06bde4aa43da7b8e9 |
|
MD5 | 7303a523bf7c6792b16775252bf0c67a |
|
BLAKE2b-256 | 09b8f356717de2cbc2d03e566a97cf79111087498422e0647fd40ba20b8d9a62 |
File details
Details for the file shreducer-1.0.0-py3-none-any.whl
.
File metadata
- Download URL: shreducer-1.0.0-py3-none-any.whl
- Upload date:
- Size: 39.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.19.1 setuptools/40.4.0 requests-toolbelt/0.8.0 tqdm/4.26.0 CPython/3.6.6
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | d696ebcd37a738c66a25bf96e0d1d5b91c046c1836eae667e226a7ed73fa4eb9 |
|
MD5 | 1b4d71ae3d3bbbbec8b43b0f7214dd1a |
|
BLAKE2b-256 | 86b74602d612a6ac34279bdb88713d492b34ca264f4954f8dbe80c0bee1858fd |