Funcparserlib

Recursive descent parsing library for Python based on functional combinators.

Description

The primary focus of funcparserlib is parsing little languages or external DSLs (domain specific languages).

Parsers made with funcparserlib are pure-Python LL(*) parsers. This means that it's very easy to write parsers without thinking about lookaheads and other hardcore parsing stuff. However, recursive descent parsing is a rather slow method compared to LL(k) or LR(k) algorithms. Still, parsing with funcparserlib is at least twice as fast as with PyParsing, a very popular library for Python.

The source code of funcparserlib is only 1.2K lines of code, with lots of comments. Its API is fully type hinted. It reports parsing errors at the longest successfully parsed prefix, and it includes a tiny lexer generator that tracks token positions.

The idea of parser combinators used in funcparserlib comes from the Introduction to Functional Programming course. We have converted it from ML into Python.
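To make the idea of parser combinators concrete, here is a minimal sketch in plain Python. It is not funcparserlib's implementation or API: the names `token`, `seq`, `alt`, and `many` are hypothetical helpers invented for this illustration, and a parser is assumed to be a plain function that takes a token list and a position and returns either a `(value, new_position)` pair or `None` on failure.

```python
from typing import Callable, Optional, Sequence, Tuple

# Illustrative toy only: a parser is a function
# (tokens, position) -> (value, new position), or None on failure.
Result = Optional[Tuple[object, int]]
ParserFn = Callable[[Sequence[str], int], Result]


def token(expected: str) -> ParserFn:
    """Match exactly one token equal to `expected`."""
    def run(tokens: Sequence[str], pos: int) -> Result:
        if pos < len(tokens) and tokens[pos] == expected:
            return tokens[pos], pos + 1
        return None
    return run


def seq(first: ParserFn, second: ParserFn) -> ParserFn:
    """Run `first`, then `second`; succeed only if both succeed."""
    def run(tokens: Sequence[str], pos: int) -> Result:
        r1 = first(tokens, pos)
        if r1 is None:
            return None
        v1, pos1 = r1
        r2 = second(tokens, pos1)
        if r2 is None:
            return None
        v2, pos2 = r2
        return (v1, v2), pos2
    return run


def alt(first: ParserFn, second: ParserFn) -> ParserFn:
    """Try `first`; if it fails, try `second` from the same position."""
    def run(tokens: Sequence[str], pos: int) -> Result:
        r = first(tokens, pos)
        return r if r is not None else second(tokens, pos)
    return run


def many(parser: ParserFn) -> ParserFn:
    """Apply `parser` zero or more times and collect the values in a list."""
    def run(tokens: Sequence[str], pos: int) -> Result:
        values = []
        while True:
            r = parser(tokens, pos)
            if r is None:
                return values, pos
            value, pos = r
            values.append(value)
    return run


# A tiny grammar built only by combining smaller parsers:
#   sum ::= digit ("+" digit)*
digit = alt(token("1"), token("2"))
sum_parser = seq(digit, many(seq(token("+"), digit)))

result = sum_parser(["1", "+", "2", "+", "1"], 0)
print(result)
```

The point of the sketch is that the grammar is assembled from ordinary function calls, which is the style that funcparserlib expresses with operators such as `+`, `|`, and `>>` in the example later on this page.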

Installation

You can install funcparserlib from PyPI:

$ pip install funcparserlib

There are no dependencies on other libraries.

Documentation

There are several examples available in the tests/ directory.

See also the changelog.

Example

Let's consider a little language of numeric expressions with a syntax similar to Python expressions. Here are some expression strings in this language:

0
1 + 2 + 3
-1 + 2 ** 32
3.1415926 * (2 + 7.18281828e-1) * 42

Here is the complete source code of the tokenizer and the parser for this language written using funcparserlib:

from typing import List, Tuple, Union
from dataclasses import dataclass

from funcparserlib.lexer import make_tokenizer, TokenSpec, Token
from funcparserlib.parser import tok, Parser, many, forward_decl, finished


@dataclass
class BinaryExpr:
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[BinaryExpr, int, float]


def tokenize(s: str) -> List[Token]:
    specs = [
        TokenSpec("whitespace", r"\s+"),
        TokenSpec("float", r"[+\-]?\d+\.\d*([Ee][+\-]?\d+)*"),
        TokenSpec("int", r"[+\-]?\d+"),
        TokenSpec("op", r"(\*\*)|[+\-*/()]"),
    ]
    tokenizer = make_tokenizer(specs)
    return [t for t in tokenizer(s) if t.type != "whitespace"]


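def parse(tokens: List[Token]) -> Expr:
    # Leaf parsers: convert the matched token's string value to a number
    int_num = tok("int") >> int
    float_num = tok("float") >> float
    number = int_num | float_num

    # A forward declaration lets the grammar refer to expr recursively
    expr: Parser[Token, Expr] = forward_decl()
    # Unary minus skips a parser's result, so the parentheses are dropped
    parenthesized = -op("(") + expr + -op(")")
    primary = number | parenthesized
    # Precedence levels, from tightest-binding to loosest
    power = primary + many(op("**") + primary) >> to_expr
    term = power + many((op("*") | op("/")) + power) >> to_expr
    sum = term + many((op("+") | op("-")) + term) >> to_expr
    expr.define(sum)

    # Require the end of input after the expression
    document = expr + -finished

    return document.parse(tokens)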


def op(name: str) -> Parser[Token, str]:
    return tok("op", name)


def to_expr(args: Tuple[Expr, List[Tuple[str, Expr]]]) -> Expr:
    first, rest = args
    result = first
    for op, expr in rest:
        result = BinaryExpr(op, result, expr)
    return result

Now, consider this numeric expression: 3.1415926 * (2 + 7.18281828e-1) * 42.

Let's tokenize() it using the tokenizer we've created with funcparserlib.lexer:

[
    Token('float', '3.1415926'),
    Token('op', '*'),
    Token('op', '('),
    Token('int', '2'),
    Token('op', '+'),
    Token('float', '7.18281828e-1'),
    Token('op', ')'),
    Token('op', '*'),
    Token('int', '42'),
]

Let's parse() these tokens into an expression tree using our parser created with funcparserlib.parser:

BinaryExpr(
    op='*',
    left=BinaryExpr(
        op='*',
        left=3.1415926,
        right=BinaryExpr(op='+', left=2, right=0.718281828),
    ),
    right=42,
)
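Once you have an expression tree, you can process it with ordinary Python code. The following is a minimal sketch of an evaluator for such trees. The `evaluate` function and the `OPERATORS` table are hypothetical helpers written for this illustration and are not part of funcparserlib; `BinaryExpr` is repeated from the listing above only so that the snippet runs on its own.

```python
import operator
from dataclasses import dataclass
from typing import Union


@dataclass
class BinaryExpr:
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[BinaryExpr, int, float]

# Hypothetical mapping from operator tokens to Python functions
OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
    "**": operator.pow,
}


def evaluate(expr: Expr) -> Union[int, float]:
    """Recursively compute the value of an expression tree."""
    if isinstance(expr, BinaryExpr):
        func = OPERATORS[expr.op]
        return func(evaluate(expr.left), evaluate(expr.right))
    # Leaves are plain int or float values
    return expr


# The same tree as shown above, built by hand
tree = BinaryExpr(
    op="*",
    left=BinaryExpr(
        op="*",
        left=3.1415926,
        right=BinaryExpr(op="+", left=2, right=0.718281828),
    ),
    right=42,
)

value = evaluate(tree)
print(value)
```

With the parser from this page, you would presumably pass the result of `parse(tokenize(...))` to `evaluate()` instead of building the tree by hand.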

Learn how to write this parser using funcparserlib in the Getting Started guide!

Used By

Some open-source projects that use funcparserlib as an explicit dependency:

  • Hy, a Lisp dialect that's embedded in Python
    • 4.2K stars, version >= 1.0.0a0, Python 3.7+
  • Splash, a JavaScript rendering service with HTTP API, by Scrapinghub
    • 3.6K stars, version *, Python 3 in Docker
  • graphite-beacon, a simple alerting system for Graphite metrics
    • 459 stars, version ==0.3.6, Python 2 and 3
  • blockdiag, generates block-diagram image file from spec-text file
    • 148 stars, version >= 1.0.0a0, Python 3.7+
  • kll, Keyboard Layout Language (KLL) compiler
    • 109 stars, copied source code, Python 3.5+

Next

Read the Getting Started guide to start learning funcparserlib.
