A wrapper around the stdlib `tokenize` which roundtrips.
tokenize-rt
The stdlib `tokenize` module does not properly roundtrip. This wrapper around the stdlib provides two additional tokens `ESCAPED_NL` and `UNIMPORTANT_WS`, and a `Token` data type. Use `src_to_tokens` and `tokens_to_src` to roundtrip.
This library is useful if you're writing a refactoring tool based on python tokenization.
Installation
pip install tokenize-rt
Usage
datastructures
tokenize_rt.Offset(line=None, utf8_byte_offset=None)
A token offset, useful as a key when cross-referencing the AST and the tokenized source.
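An `ast` node's `lineno` / `col_offset` pair (a utf8 byte offset) lines up with a token's offset, so an `Offset` can key a lookup table built from the token stream. A minimal sketch:
>>> import ast
>>> from tokenize_rt import Offset, src_to_tokens
>>> src = 'x = 5\n'
>>> name_node = ast.parse(src).body[0].targets[0]
>>> by_offset = {tok.offset: tok for tok in src_to_tokens(src)}
>>> by_offset[Offset(name_node.lineno, name_node.col_offset)].src
'x'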
tokenize_rt.Token(name, src, line=None, utf8_byte_offset=None)
Construct a token
- `name`: one of the token names listed in `token.tok_name`, or `ESCAPED_NL`, or `UNIMPORTANT_WS`
- `src`: the token's source as text
- `line`: the line number that this token appears on
- `utf8_byte_offset`: the utf8 byte offset that this token appears on in the line
tokenize_rt.Token.offset
Retrieves an `Offset` for this token.
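For example, a token's `.offset` compares equal to the corresponding `Offset`:
>>> from tokenize_rt import Offset, Token
>>> tok = Token('NAME', 'x', line=1, utf8_byte_offset=0)
>>> tok.offset == Offset(line=1, utf8_byte_offset=0)
True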
converting to and from `Token` representations
tokenize_rt.src_to_tokens(text: str) -> List[Token]
tokenize_rt.tokens_to_src(Iterable[Token]) -> str
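A quick roundtrip sketch (the exact token stream may differ slightly across python versions):
>>> from tokenize_rt import src_to_tokens, tokens_to_src
>>> tokens = src_to_tokens('x = 5\n')
>>> [tok.name for tok in tokens]
['ENCODING', 'NAME', 'UNIMPORTANT_WS', 'OP', 'UNIMPORTANT_WS', 'NUMBER', 'NEWLINE', 'ENDMARKER']
>>> tokens_to_src(tokens)
'x = 5\n'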
additional tokens added by tokenize-rt
tokenize_rt.ESCAPED_NL
tokenize_rt.UNIMPORTANT_WS
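These appear wherever the stdlib `tokenize` would silently drop text, which is what makes exact roundtripping possible. For example, a backslash-escaped newline (the output shown is illustrative):
>>> from tokenize_rt import src_to_tokens
>>> [tok.name for tok in src_to_tokens('x = \\\n    5\n')]
['ENCODING', 'NAME', 'UNIMPORTANT_WS', 'OP', 'UNIMPORTANT_WS', 'ESCAPED_NL', 'UNIMPORTANT_WS', 'NUMBER', 'NEWLINE', 'ENDMARKER']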
helpers
tokenize_rt.NON_CODING_TOKENS
A `frozenset` containing tokens which may appear between others while not affecting control flow or code:
- `COMMENT`
- `ESCAPED_NL`
- `NL`
- `UNIMPORTANT_WS`
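A sketch of a typical use: stepping past comments and whitespace to find the next token that actually affects code:
>>> from tokenize_rt import NON_CODING_TOKENS, src_to_tokens
>>> tokens = src_to_tokens('x = (  # comment\n    5)\n')
>>> i = next(i for i, tok in enumerate(tokens) if tok.src == '(')
>>> i += 1
>>> while tokens[i].name in NON_CODING_TOKENS:
...     i += 1
>>> tokens[i].src
'5'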
tokenize_rt.parse_string_literal(text: str) -> Tuple[str, str]
parse a string literal into its prefix and string content
>>> parse_string_literal('f"foo"')
('f', '"foo"')
tokenize_rt.reversed_enumerate(Sequence[Token]) -> Iterator[Tuple[int, Token]]
yields `(index, token)` pairs. Useful for rewriting source.
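A sketch of the usual rewrite loop: iterating in reverse means that replacing or removing a token never shifts the indices of tokens not yet visited (the u-prefix removal here is just an illustrative transformation):
>>> from tokenize_rt import reversed_enumerate, src_to_tokens, tokens_to_src
>>> tokens = src_to_tokens("x = u'foo'\n")
>>> for i, tok in reversed_enumerate(tokens):
...     if tok.name == 'STRING' and tok.src.startswith(('u', 'U')):
...         tokens[i] = tok._replace(src=tok.src[1:])
>>> tokens_to_src(tokens)
"x = 'foo'\n"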
tokenize_rt.rfind_string_parts(Sequence[Token], i) -> Tuple[int, ...]
find the indices of the string parts of a (joined) string literal
- `i` should start at the end of the string literal
- returns `()` (an empty tuple) for things which are not string literals
>>> tokens = src_to_tokens('"foo" "bar".capitalize()')
>>> rfind_string_parts(tokens, 2)
(0, 2)
>>> tokens = src_to_tokens('("foo" "bar").capitalize()')
>>> rfind_string_parts(tokens, 4)
(1, 3)
Differences from tokenize
- `tokenize-rt` adds `ESCAPED_NL` for a backslash-escaped newline "token"
- `tokenize-rt` adds `UNIMPORTANT_WS` for whitespace (discarded in `tokenize`)
- `tokenize-rt` normalizes string prefixes, even if they are not parsed -- for instance, this means you'll see `Token('STRING', "f'foo'", ...)` even in python 2
- `tokenize-rt` normalizes python 2 long literals (`4l` / `4L`) and octal literals (`0755`) in python 3 (for easier rewriting of python 2 code while running python 3)
Sample usage
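A minimal, self-contained sketch of what a tool built on tokenize-rt can look like: the function below strips trailing whitespace by deleting `UNIMPORTANT_WS` tokens that sit directly before a newline token (the function name is illustrative, not part of the tokenize-rt API):

```python
from tokenize_rt import reversed_enumerate, src_to_tokens, tokens_to_src


def strip_trailing_whitespace(src: str) -> str:
    tokens = src_to_tokens(src)
    # walk backwards so deleting a token doesn't invalidate the indices
    # of tokens that haven't been visited yet
    for i, token in reversed_enumerate(tokens):
        if (
                token.name == 'UNIMPORTANT_WS' and
                i + 1 < len(tokens) and
                tokens[i + 1].name in ('NEWLINE', 'NL')
        ):
            del tokens[i]
    return tokens_to_src(tokens)


if __name__ == '__main__':
    print(repr(strip_trailing_whitespace('x = 1   \ny = 2\n')))
    # expected output: 'x = 1\ny = 2\n'
```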