PENMAN notation for graphs (e.g., AMR)
Project description
This package models graphs encoded in PENMAN notation (e.g., AMR), such as the following for "the boy wants to go":
(w / want-01
   :ARG0 (b / boy)
   :ARG1 (g / go
            :ARG0 b))
The Penman package may be used as a Python library or as a script.
Features
- Read and write PENMAN-serialized graphs or triple conjunctions
- Read metadata in comments (e.g., `# ::id 1234`)
- Read surface alignments (e.g., `foo~e.1,2`)
- Inspect and manipulate the graph or tree structures
- Customize graphs for writing:
  - adjust indentation and compactness
  - select a new top node
  - rearrange edges (partially implemented)
  - restructure the tree shape
- Transform the graph
  - Canonicalize roles
  - Reify edges
  - Reify attributes
  - Embed the tree structure with additional `TOP` triples
- AMR model: role inventory and transformations
- Tested (but not yet 100% coverage)
- Documented (see the documentation)
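To make the metadata and surface-alignment features listed above more concrete, here is a rough standard-library sketch of what reading them involves. `parse_metadata` and `split_alignment` are hypothetical helpers written for this illustration, not functions of this package, and the handling of several `::key` fields on one comment line is an assumption based on common AMR corpus conventions.

```python
import re

# Hypothetical helpers for illustration only; not part of the penman API.
_KEY = re.compile(r'::(\S+)')
# Alignment pattern adapted from the PEG in the "PENMAN Notation" section.
_ALIGNMENT = re.compile(r'~([a-zA-Z]\.?)?(\d+(?:,\d+)*)$')


def parse_metadata(lines):
    """Collect ``::key value`` pairs from lines that start with '#'."""
    metadata = {}
    for line in lines:
        line = line.strip()
        if not line.startswith('#'):
            continue
        # Splitting on a pattern with one capturing group yields
        # [prefix, key1, value1, key2, value2, ...].
        parts = _KEY.split(line[1:])
        for key, value in zip(parts[1::2], parts[2::2]):
            metadata[key] = value.strip()
    return metadata


def split_alignment(token):
    """Split 'foo~e.1,2' into ('foo', 'e.', (1, 2))."""
    match = _ALIGNMENT.search(token)
    if match is None:
        return token, None, ()
    indices = tuple(int(i) for i in match.group(2).split(','))
    return token[:match.start()], match.group(1), indices


print(parse_metadata(['# ::id 1234', '(b / bark)']))
print(split_alignment('foo~e.1,2'))
```

The sketch keeps metadata as a plain dictionary and alignments as a tuple of integer indices; the library's own data structures may differ.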
Library Usage
>>> import penman
>>> g = penman.decode('(b / bark :ARG0 (d / dog))')
>>> g.triples
[('b', ':instance', 'bark'), ('b', ':ARG0', 'd'), ('d', ':instance', 'dog')]
>>> print(penman.encode(g))
(b / bark
   :ARG0 (d / dog))
>>> print(penman.encode(g, top='d', indent=6))
(d / dog
      :ARG0-of (b / bark))
>>> print(penman.encode(g, indent=False))
(b / bark :ARG0 (d / dog))
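The `top='d'` call above re-roots the graph at `d`, so the edge between `b` and `d` is written in the opposite direction with an inverted role (`:ARG0-of`). The following is a minimal sketch of that inversion on plain triples. `invert_role` and `invert_triple` are hypothetical names, not the library's API, and the sketch ignores model-specific exceptions (AMR, for example, has roles such as `:consist-of` that end in `-of` without being inverted).

```python
def invert_role(role):
    """Toggle the '-of' suffix on a role (naive; no semantic model)."""
    if role.endswith('-of'):
        return role[:-3]
    return role + '-of'


def invert_triple(triple):
    """Swap source and target and invert the role."""
    source, role, target = triple
    return (target, invert_role(role), source)


print(invert_triple(('b', ':ARG0', 'd')))
```

Inverting twice returns the original triple, which is why a graph can be serialized from any top node without changing its meaning.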
Script Usage
$ penman --help
usage: penman [-h] [-V] [--model FILE | --amr] [--indent N] [--compact]
              [--triples] [--canonicalize-roles] [--reify-edges]
              [--reify-attributes] [--indicate-branches]
              [FILE [FILE ...]]

Read and write graphs in the PENMAN notation.

positional arguments:
  FILE                  read graphs from FILEs instead of stdin

optional arguments:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit
  --model FILE          JSON model file describing the semantic model
  --amr                 use the AMR model

formatting options:
  --indent N            indent N spaces per level ("no" for no newlines)
  --compact             compactly print node attributes on one line
  --triples             print graphs as triple conjunctions

normalization options:
  --canonicalize-roles  canonicalize role forms
  --reify-edges         reify all eligible edges
  --reify-attributes    reify all attributes
  --indicate-branches   insert triples to indicate tree structure

$ penman <<< "(w / want-01 :ARG0 (b / boy) :ARG1 (g / go :ARG0 b))"
(w / want-01
   :ARG0 (b / boy)
   :ARG1 (g / go
            :ARG0 b))
Requirements
- Python 3.6+
PENMAN Notation
The PENMAN project was a large effort at natural language generation. What I'm calling "PENMAN notation" is more accurately "Sentence Plan Language" (SPL; Kasper 1989), but I'll stick with "PENMAN notation": the name may be more familiar to modern users, and it sounds less specific to sentence representations, in case someone wants to use the format to encode arbitrary graphs.
This module expands the notation slightly to allow for untyped nodes
(e.g., `(x)`) and anonymous relations (e.g., `(x : (y))`). A PEG*
definition for the notation is given below (for simplicity, whitespace
is not explicitly included; assume all nonterminals can be surrounded
by `/\s+/`):
# Syntactic productions
Start     <- Node
Node      <- '(' Variable NodeLabel? Relation* ')'
NodeLabel <- '/' Concept Alignment?
Concept   <- Atom
Relation  <- Role Alignment? (Edge | Attribute)
Edge      <- Variable Alignment?
           | Node
Attribute <- Atom Alignment?
Atom      <- String | Float | Integer | Symbol
Variable  <- Symbol
# Lexical productions
Role      <- /:[^\s()\/,:~]*/
String    <- /"[^"\\]*(?:\\.[^"\\]*)*"/
Float     <- /[-+]?(((\d+\.\d*|\.\d+)([eE][-+]?\d+)?)|\d+[eE][-+]?\d+)/
Integer   <- /[-+]?\d+(?=[ )\/:])/
Symbol    <- /[^\s()\/,:~]+/
Alignment <- /~([a-zA-Z]\.?)?\d+(,\d+)*/
* Note: I use `|` above for ordered-choice instead of `/` so that `/`
can be used to surround regular expressions.
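To show how the grammar above maps onto code, here is a small recursive-descent sketch that turns a PENMAN string into triples using only the standard library. It is an illustration under simplifying assumptions, not this package's parser: `tokenize` and `parse` are hypothetical names, numbers are lexed as plain symbols, alignments are recognized but discarded, and an untyped node contributes no `:instance` triple.

```python
import re

# Lexical productions from the grammar, tried in order (illustrative only).
TOKEN = re.compile(r'''
    (?P<LPAREN>\()
  | (?P<RPAREN>\))
  | (?P<SLASH>/)
  | (?P<ROLE>:[^\s()/,:~]*)
  | (?P<ALIGNMENT>~(?:[a-zA-Z]\.?)?\d+(?:,\d+)*)
  | (?P<STRING>"[^"\\]*(?:\\.[^"\\]*)*")
  | (?P<SYMBOL>[^\s()/,:~]+)
''', re.VERBOSE)


def tokenize(text):
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f'unexpected character at position {pos}')
        if match.lastgroup != 'ALIGNMENT':  # alignments are dropped here
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


def parse(text):
    tokens, triples, pos = tokenize(text), [], 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else None

    def expect(kind):
        nonlocal pos
        if peek() != kind:
            raise ValueError(f'expected {kind} at token {pos}')
        pos += 1
        return tokens[pos - 1][1]

    def atom():
        # Atom <- String | Float | Integer | Symbol (numbers lexed as SYMBOL)
        return expect('STRING') if peek() == 'STRING' else expect('SYMBOL')

    def node():
        # Node <- '(' Variable NodeLabel? Relation* ')'
        expect('LPAREN')
        var = expect('SYMBOL')
        if peek() == 'SLASH':
            expect('SLASH')
            triples.append((var, ':instance', atom()))
        while peek() == 'ROLE':
            role = expect('ROLE')
            if peek() == 'LPAREN':
                # Reserve a slot so the edge precedes the nested node's triples.
                slot = len(triples)
                triples.append(None)
                triples[slot] = (var, role, node())
            else:
                triples.append((var, role, atom()))
        expect('RPAREN')
        return var

    node()
    if pos != len(tokens):
        raise ValueError('unexpected trailing input')
    return triples


print(parse('(b / bark :ARG0 (d / dog))'))
```

Because an empty role such as `:` is a valid `Role` token in this grammar, the anonymous relation in `(x : (y))` parses to the single triple `('x', ':', 'y')`.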
There is ambiguity in that both `Edge` and `Attribute` can resolve
their first token to `Symbol` (via the `Variable` and `Atom`
productions, respectively), but they are shown like this for their
semantic contribution. A `Variable` is distinguished from other
`Symbol` tokens simply by its use as the identifier of a node.
Examples of non-variable `Symbol` tokens in AMR are concepts (e.g.,
`want-01`), `-` in `:polarity -`, `expressive` in `:mode expressive`,
etc.
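The distinction described above can be sketched on a list of triples: a symbol counts as a variable only if it identifies some node, and that decides whether a relation is an edge or an attribute. `classify` is a hypothetical helper for illustration, and treating every triple source as a node identifier is an assumption of this sketch rather than the library's documented behavior.

```python
def classify(triples):
    """Split relations into edges (target is a variable) and attributes."""
    # A variable is any symbol used as the identifier (source) of a node.
    variables = {source for source, _, _ in triples}
    edges, attributes = [], []
    for triple in triples:
        _, role, target = triple
        if role == ':instance':
            continue  # concepts label a node; they are not relations
        if target in variables:
            edges.append(triple)
        else:
            attributes.append(triple)
    return variables, edges, attributes


triples = [
    ('w', ':instance', 'want-01'),
    ('w', ':polarity', '-'),
    ('w', ':ARG0', 'b'),
    ('b', ':instance', 'boy'),
]
variables, edges, attributes = classify(triples)
print(sorted(variables), edges, attributes)
```

Here `b` is a variable because it identifies the `boy` node, so `:ARG0 b` is an edge, while `-` identifies no node and `:polarity -` is an attribute.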
The above grammar is for the PENMAN graphs that this library supports,
but AMR is more restrictive. A variant for AMR might make the
`NodeLabel` nonterminal required on `Node`, change `Atom` to `Symbol`
on `Concept`, change `Role` to require at least one character after
`:` or even spell out all valid roles, and change `Variable` to a form
like `/[a-z]+[0-9]*/`. See also Nathan Schneider's PEG for AMR.
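Two of the stricter lexical rules suggested above are easy to try out with the standard library. This is only a sketch of the suggested variant, not a validator shipped with this package; the names `AMR_VARIABLE`, `AMR_ROLE`, `is_amr_variable`, and `is_amr_role` are hypothetical.

```python
import re

# Variable form suggested in the text: lowercase letters, optional digits.
AMR_VARIABLE = re.compile(r'[a-z]+[0-9]*')
# Same character class as Role in the grammar, but '+' instead of '*'
# so that at least one character must follow the colon.
AMR_ROLE = re.compile(r':[^\s()/,:~]+')


def is_amr_variable(symbol):
    return AMR_VARIABLE.fullmatch(symbol) is not None


def is_amr_role(role):
    return AMR_ROLE.fullmatch(role) is not None


for candidate in ('b', 'b2', 'B', '2b'):
    print(candidate, is_amr_variable(candidate))
for candidate in (':ARG0', ':ARG0-of', ':'):
    print(candidate, is_amr_role(candidate))
```

Spelling out all valid roles, as the text also suggests, would replace the `AMR_ROLE` pattern with a lookup in a role inventory.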
Disclaimer
This project is not affiliated with ISI, the PENMAN project, or the AMR project.
File details
Details for the file Penman-0.7.0.tar.gz.
File metadata
- Download URL: Penman-0.7.0.tar.gz
- Upload date:
- Size: 23.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.14.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.35.0 CPython/3.7.5rc1
File hashes
Algorithm | Hash digest
---|---
SHA256 | 4b3f7a4629803febfcd3e63f4f371b8b0d328f998c8dcca2993583c7265dae51
MD5 | 1122cd598bc5b5cbcd425a87df71c264
BLAKE2b-256 | 71f7591487da34716babe9e12a5f1b0ce80a80a7f48759069f7311580e765cdd
File details
Details for the file Penman-0.7.0-py3-none-any.whl.
File metadata
- Download URL: Penman-0.7.0-py3-none-any.whl
- Upload date:
- Size: 29.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.14.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.35.0 CPython/3.7.5rc1
File hashes
Algorithm | Hash digest
---|---
SHA256 | 70fd0864a434ac511bf6a061f919736eca93ff9ccfc2bd9da14ff0bfbdc73baf
MD5 | 299d8f6e526a5e855c5bae41a729c85c
BLAKE2b-256 | c2429969b24e746b80435826f40cabc79ae319e7c6ede24ae9940f50fb2ae107