The big package is a grab-bag of cool code for use in your programs.
Copyright 2022 by Larry Hastings
big is a Python package, a grab-bag of useful technology I always want to have handy. Finally! Instead of copying-and-pasting all my little helper functions between projects, I have them all in one easily-importable place. And, since it's a public package, you can use 'em too!
big requires Python 3.6 or newer. It has few dependencies.
Think big!
Using big
To use big, just install the big package (and its dependencies) from PyPI using your favorite Python package manager. Then simply
import big
in your programs.
Internally big is broken up into submodules, aggregated together loosely by problem domain. But all functions are also imported into the top-level module; every function or class published in big is available directly in the big module.
big is designed to be safe for use with import *:
from big import *
but that's up to you.
big is licensed using the MIT license. You're free to use it and even ship it in your own programs, as long as you leave my copyright notice on the source code.
Index
fgrep(path, text, *, encoding=None, enumerate=False, case_insensitive=False)
get_float(o, default=_sentinel)
get_int_or_float(o, default=_sentinel)
grep(path, pattern, *, encoding=None, enumerate=False, flags=0)
multisplit(text, separators, *, maxsplit=-1)
re_partition(text, pattern, *, flags=0)
re_rpartition(text, pattern, *, flags=0)
split_text_with_code(s, *, tab_width=8, allow_code=True, code_indent=4, convert_tabs_to_spaces=True)
timestamp_3339Z(t=None, want_microseconds=None)
timestamp_human(t=None, want_microseconds=None)
TopologicalSorter.remove(node)
translate_filename_to_exfat(s)
wrap_words(words, margin=79, *, two_spaces=True)
API Reference
big.boundinnerclass
Class decorators that implement bound inner classes. See the Bound inner classes section for more information.
BoundInnerClass(cls)
Class decorator for an inner class. When accessing the inner class through an instance of the outer class, "binds" the inner class to the instance. This changes the signature of the inner class's __init__ from
def __init__(self, *args, **kwargs):
to
def __init__(self, outer, *args, **kwargs):
where outer is the instance of the outer class.
UnboundInnerClass(cls)
Class decorator for an inner class that prevents binding the inner class to an instance of the outer class.
Subclasses of a class decorated with BoundInnerClass must always be decorated with either BoundInnerClass or UnboundInnerClass.
big.builtin
Functions for working with builtins. (Named builtin to avoid a name collision with the builtins module.)
get_float(o, default=_sentinel)
Returns float(o), unless that conversion fails, in which case returns the default value. If you don't pass in an explicit default value, the default value is o.
get_int(o, default=_sentinel)
Returns int(o), unless that conversion fails, in which case returns the default value. If you don't pass in an explicit default value, the default value is o.
get_int_or_float(o, default=_sentinel)
Converts o into a number, preferring an int to a float.
If o is already an int or float, returns o unchanged. Otherwise, tries int(o). If that conversion succeeds, returns the result. Otherwise, tries float(o). If that conversion succeeds, returns the result. Otherwise returns the default value. If you don't pass in an explicit default value, the default value is o.
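The conversion order described above can be sketched in a few lines of plain Python. This is only an illustration of the documented behavior, not big's actual source; the name get_int_or_float_sketch and the private _sentinel object are assumptions made for the example.

```python
_sentinel = object()  # hypothetical marker meaning "no default was supplied"

def get_int_or_float_sketch(o, default=_sentinel):
    """Prefer an int, then a float, then fall back to the default (or o itself)."""
    if isinstance(o, (int, float)):
        return o  # already a number: return it unchanged
    for convert in (int, float):
        try:
            return convert(o)
        except (TypeError, ValueError):
            pass  # this conversion failed; try the next one
    return o if default is _sentinel else default

print(get_int_or_float_sketch("3"))          # an int
print(get_int_or_float_sketch("3.5"))        # a float
print(get_int_or_float_sketch("abc"))        # falls back to the original object
print(get_int_or_float_sketch("abc", None))  # falls back to the explicit default
```

Using a private sentinel rather than None as the "not supplied" marker is what lets a caller pass None as a real default.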
try_float(o)
Returns True if o can be converted into a float, and False if it can't.
try_int(o)
Returns True if o can be converted into an int, and False if it can't.
big.file
Functions for working with files, directories, and I/O.
fgrep(path, text, *, encoding=None, enumerate=False, case_insensitive=False)
Find the lines of a file that match some text, like the UNIX fgrep utility program.
path should be an object representing a path to an existing file, one of:
- a string,
- a bytes object, or
- a pathlib.Path object.
text should be either a string or a bytes object.
encoding is used as the file encoding when opening the file.
If text is a str, the file is opened in text mode. If text is a bytes object, the file is opened in binary mode. encoding must be None when the file is opened in binary mode.
If case_insensitive is true, perform the search in a case-insensitive manner.
Returns a list of lines in the file containing text. The lines are either strings or bytes objects, depending on the type of text. The lines have their newlines stripped but preserve all other whitespace.
If enumerate is true, returns a list of tuples of (line_number, line). The first line of the file is line number 1.
file_mtime(path)
Returns the modification time of path, in seconds since the epoch. The result is a float, so it carries sub-second precision.
file_mtime_ns(path)
Returns the modification time of path, in nanoseconds since the epoch.
file_size(path)
Returns the size of the file at path, as an integer representing the number of bytes.
grep(path, pattern, *, encoding=None, enumerate=False, flags=0)
Look for matches to a regular expression pattern in the lines of a file, like the UNIX grep utility program.
path should be an object representing a path to an existing file, one of:
- a string,
- a bytes object, or
- a pathlib.Path object.
pattern should be an object containing a regular expression, one of:
- a string,
- a bytes object, or
- an re.Pattern, initialized with either str or bytes.
encoding is used as the file encoding when opening the file.
If pattern uses a str, the file is opened in text mode. If pattern uses a bytes object, the file is opened in binary mode. encoding must be None when the file is opened in binary mode.
flags is passed in as the flags argument to re.compile if pattern is a string or bytes. (It's ignored if pattern is an re.Pattern object.)
Returns a list of lines in the file matching the pattern. The lines are either strings or bytes objects, depending on the type of pattern. The lines have their newlines stripped but preserve all other whitespace.
If enumerate is true, returns a list of tuples of (line_number, line). The first line of the file is line number 1.
(Tip: to perform a case-insensitive pattern match, pass the re.IGNORECASE flag into flags for this function (if pattern is a string or bytes) or when creating your regular expression object (if pattern is an re.Pattern object).)
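To make the text-mode versus binary-mode rule concrete, here is a rough standard-library sketch of the behavior described above. It is not big's implementation; grep_sketch and the sample file are made up for the example, and details such as Windows line endings in binary mode are ignored.

```python
import builtins
import re
import tempfile
from pathlib import Path

def grep_sketch(path, pattern, *, encoding=None, enumerate=False, flags=0):
    """Return the lines of the file at path that match pattern (sketch only)."""
    if not isinstance(pattern, re.Pattern):
        pattern = re.compile(pattern, flags)  # flags only apply to str/bytes patterns
    binary = isinstance(pattern.pattern, bytes)
    mode, newline = ("rb", b"\n") if binary else ("r", "\n")
    results = []
    with open(path, mode, encoding=encoding) as f:
        # The enumerate parameter shadows the builtin, so reach it via builtins.
        for line_number, line in builtins.enumerate(f, 1):
            line = line.rstrip(newline)  # strip the newline, keep other whitespace
            if pattern.search(line):
                results.append((line_number, line) if enumerate else line)
    return results

with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"
    sample.write_text("alpha\nBeta\n  gamma ray\n", encoding="utf-8")
    matches = grep_sketch(sample, r"a$", enumerate=True)
    insensitive = grep_sketch(sample, "BETA", flags=re.IGNORECASE)

print(matches)
print(insensitive)
```

The type of the compiled pattern decides how the file is opened, which is why the returned lines always have the same type as the pattern.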
pushd(directory)
A context manager that temporarily changes the directory. Example:
with big.pushd('x'):
    pass
This would change into the 'x' subdirectory before executing the nested block, then change back to the original directory after the nested block.
You can change directories in the nested block; this won't affect pushd restoring the original current working directory upon exiting the nested block.
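A context manager with this behavior can be sketched with contextlib and os.chdir. The version below is an assumption about how such a helper might look, not big's actual code; pushd_sketch is a hypothetical name.

```python
import contextlib
import os
import tempfile

@contextlib.contextmanager
def pushd_sketch(directory):
    """Change into directory for the duration of the with block."""
    original = os.getcwd()  # remember where we started
    os.chdir(directory)
    try:
        yield
    finally:
        # Always return to the starting directory, even if the block
        # raised or changed directories itself.
        os.chdir(original)

before = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    with pushd_sketch(tmp):
        inside = os.getcwd()
        os.chdir(os.path.dirname(inside))  # wander off inside the block
after = os.getcwd()
print(before == after)
```

Because the original directory is captured on entry, changing directories inside the block has no effect on where the context manager returns to.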
safe_mkdir(path)
Ensures that a directory exists at path. If this function returns and doesn't raise, it guarantees that a directory exists at path.
If a directory already exists at path, does nothing.
If a file exists at path, unlinks it then creates the directory.
If the parent directory doesn't exist, creates it, then creates path.
This function can still fail:
- path could be on a read-only filesystem.
- You might lack the permissions to create path.
- You could ask to create the directory 'x/y' and 'x' is a file (not a directory).
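The three cases above (directory exists, file in the way, missing parent) map onto a handful of os calls. The following is a minimal sketch under that reading, not the package's real implementation; safe_mkdir_sketch is a hypothetical name.

```python
import os
import tempfile

def safe_mkdir_sketch(path):
    """Make sure a directory exists at path, replacing a file if one is in the way."""
    if os.path.isdir(path):
        return                      # already a directory: nothing to do
    if os.path.lexists(path):
        os.unlink(path)             # a file occupies the name: remove it first
    os.makedirs(path, exist_ok=True)  # also creates missing parent directories

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "a", "b")
    safe_mkdir_sketch(target)       # parent "a" does not exist yet
    created = os.path.isdir(target)

    blocker = os.path.join(tmp, "blocker")
    open(blocker, "w").close()      # put a plain file where the directory should go
    safe_mkdir_sketch(blocker)
    replaced = os.path.isdir(blocker)

print(created, replaced)
```

The failure modes listed above still apply to the sketch: os.unlink and os.makedirs raise OSError on a read-only filesystem, on missing permissions, or when a parent component is a file.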
safe_unlink(path)
Unlinks path, if path exists and is a file.
touch(path)
Ensures that path exists, and its modification time is the current time.
If path does not exist, creates an empty file.
If path exists, updates its modification time to the current time.
translate_filename_to_exfat(s)
Ensures that all characters in s are legal for a FAT filesystem.
Returns a copy of s where every character not allowed in a FAT filesystem filename has been replaced with a character (or characters) that are permitted.
translate_filename_to_unix(s)
Ensures that all characters in s are legal for a UNIX filesystem.
Returns a copy of s where every character not allowed in a UNIX filesystem filename has been replaced with a character (or characters) that are permitted.
big.graph
A drop-in replacement for Python's graphlib.TopologicalSorter with an enhanced API. This version of TopologicalSorter allows modifying the graph at any time, and supports multiple simultaneous views, allowing iteration over the graph more than once.
See the Enhanced TopologicalSorter section for more information.
CycleError
Exception raised by TopologicalSorter when it detects a cycle.
TopologicalSorter(graph=None)
An object representing a directed graph of nodes. See Python's graphlib.TopologicalSorter for concepts and the basic API.
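For readers who haven't used the basic API, here is a short refresher using the standard library's graphlib (Python 3.9 or newer), not big itself. Since big's class is described as a drop-in replacement, the same calls should apply to it, but this example only exercises the stdlib version.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

ts = TopologicalSorter()
ts.add('B', 'A')        # 'B' depends on 'A'
ts.add('C', 'A')        # 'C' depends on 'A'
ts.add('D', 'B', 'C')   # 'D' depends on both 'B' and 'C'
ts.prepare()            # required by the stdlib version; also checks for cycles

order = []
while ts.is_active():
    ready = ts.get_ready()       # nodes whose predecessors are all done
    order.extend(sorted(ready))  # sorted only to make the output deterministic
    ts.done(*ready)              # mark them done, unblocking their dependents

print(order)
```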
New methods on TopologicalSorter:
TopologicalSorter.copy()
Returns a shallow copy of the graph. The copy also duplicates the state of get_ready and done.
TopologicalSorter.cycle()
Checks the graph for cycles. If no cycles exist, returns None. If at least one cycle exists, returns a tuple containing nodes that constitute a cycle.
TopologicalSorter.print(print=print)
Prints the internal state of the graph. Used for debugging.
TopologicalSorter.remove(node)
Remove node from the graph.
If any node P depends on a node N, and N is removed, this dependency is also removed, but P is not removed from the graph.
remove() works but it's slow (O(N)). TopologicalSorter is optimized for fast adds and fast views.
TopologicalSorter.reset()
Resets get_ready and done to their initial state.
TopologicalSorter.view()
Returns a new View object on this graph.
TopologicalSorter.View
A view on a TopologicalSorter graph object. Allows iterating over the nodes of the graph in dependency order.
Methods on a View object:
View.__bool__()
Returns True if more work can be done in the view--if there are nodes waiting to be yielded by get_ready, or waiting to be returned by done.
Aliased to TopologicalSorter.is_active for compatibility with graphlib.
View.close()
Closes the view. A closed view can no longer be used.
View.copy()
Returns a shallow copy of the view, duplicating its current state.
View.done(*nodes)
Marks nodes returned by ready as "done", possibly allowing additional nodes to be available from ready.
View.print(print=print)
Prints the internal state of the view, and its graph. Used for debugging.
View.ready()
Returns a tuple of "ready" nodes--nodes with no predecessors, or nodes whose predecessors have all been marked "done".
Aliased to TopologicalSorter.get_ready for compatibility with graphlib.
View.reset()
Resets the view to its initial state, forgetting all "ready" and "done" state.
big.text
Functions for working with text strings. See the Word wrapping and formatting section below for a higher-level view on some of these functions.
merge_columns(*columns, column_separator=" ", overflow_response=OverflowResponse.RAISE, overflow_before=0, overflow_after=0)
Merge n column tuples, with each column tuple being formatted into its own column in the resulting string. Returns a string.
columns should be an iterable of column tuples. Each column tuple should contain three items:
(text, min_width, max_width)
text should be a single text string, with newline characters separating lines. min_width and max_width are the minimum and maximum permissible widths for that column, not including the column separator (if any).
Note that this function does not text-wrap the lines.
column_separator is printed between every column.
overflow_response tells merge_columns how to handle a column with one or more lines that are wider than that column's max_width. The supported values are:
- OverflowResponse.RAISE: Raise an OverflowError. The default.
- OverflowResponse.INTRUDE_ALL: Intrude into all subsequent columns on all lines where the overflowed column is wider than its max_width.
- OverflowResponse.DELAY_ALL: Delay all columns after the overflowed column, not beginning any until after the last overflowed line in the overflowed column.
When overflow_response is INTRUDE_ALL or DELAY_ALL, and either overflow_before or overflow_after is nonzero, these specify the number of extra lines before or after the overflowed lines in a column.
multisplit(text, separators, *, maxsplit=-1)
Like str.split, but separators is an iterable of separator strings.
text can be str or bytes.
separators should be an iterable. Each element of separators should be the same type as text. If separators is a string or bytes object, multisplit behaves as if separators is a tuple containing each individual character.
Returns a list of the substrings split from text.
maxsplit should be either an integer or None. If maxsplit is an integer greater than -1, multisplit will split text no more than maxsplit times.
Example:
multisplit('ab:cd,:ef', ':,')
returns
["ab", "cd", "ef"]
Example:
multisplit('\tthis is a\n\tbunch of words', (' ', '\t', '\n'))
would produce the same result as
'\tthis is a\n\tbunch of words'.split()
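The two examples above can be reproduced with a small regular-expression sketch. This is an approximation for illustration only: it handles str but not bytes, ignores maxsplit, assumes every separator is non-empty, and is not how big implements multisplit. The name multisplit_sketch is hypothetical.

```python
import re

def multisplit_sketch(text, separators):
    """Split text on runs of any of the given separator strings (str only)."""
    # Try longer separators first so overlapping separators prefer the longest.
    ordered = sorted(separators, key=len, reverse=True)
    pattern = "(?:" + "|".join(re.escape(s) for s in ordered) + ")+"
    # A run of separators counts as one split point; drop the empty strings
    # produced by leading or trailing separators, as str.split() does.
    return [part for part in re.split(pattern, text) if part]

print(multisplit_sketch('ab:cd,:ef', ':,'))
print(multisplit_sketch('\tthis is a\n\tbunch of words', (' ', '\t', '\n')))
```

Iterating over the string ':,' yields its individual characters, which matches the documented rule that a string of separators behaves like a tuple of characters.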
re_partition(text, pattern, *, flags=0)
Like str.partition, but pattern is matched as a regular expression.
text can be a string or a bytes object.
pattern can be a string, bytes, or an re.Pattern object.
text and pattern (or pattern.pattern) must be the same type.
If pattern is found in text, returns a tuple
(before, match, after)
where before is the text before the matched text, match is the re.Match object resulting from the match, and after is the text after the matched text.
If pattern appears in text multiple times, re_partition will match against the first (leftmost) appearance.
If pattern is not found in text, returns a tuple
(text, None, '')
where the empty string is str or bytes as appropriate.
If pattern is a string or bytes object, flags is passed in as the flags argument to re.compile.
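The return contract above is easy to see in a few lines built on re.search. This is a sketch of the documented behavior, not the package's source; re_partition_sketch is a hypothetical name.

```python
import re

def re_partition_sketch(text, pattern, *, flags=0):
    """Partition text around the leftmost match of pattern (sketch only)."""
    if not isinstance(pattern, re.Pattern):
        pattern = re.compile(pattern, flags)
    match = pattern.search(text)
    if match is None:
        # text[:0] is '' for str and b'' for bytes, so the types line up.
        return (text, None, text[:0])
    return (text[:match.start()], match, text[match.end():])

before, match, after = re_partition_sketch("key = value = more", r"\s*=\s*")
print(repr(before), repr(match.group(0)), repr(after))

not_found = re_partition_sketch(b"abc", b"x")
print(not_found)
```

Note that the middle element is the match object rather than the matched string, so callers can still reach groups and positions.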
re_rpartition(text, pattern, *, flags=0)
Like str.rpartition, but pattern is matched as a regular expression.
text can be a string or a bytes object.
pattern can be a string, bytes, or an re.Pattern object.
text and pattern (or pattern.pattern) must be the same type.
If pattern is found in text, returns a tuple
(before, match, after)
where before is the text before the matched text, match is the re.Match object resulting from the match, and after is the text after the matched text.
If pattern appears in text multiple times, re_rpartition will match against the last (rightmost) appearance.
If pattern is not found in text, returns a tuple
('', None, text)
where the empty string is str or bytes as appropriate.
If pattern is a string or bytes object, flags is passed in as the flags argument to re.compile.
split_text_with_code(s, *, tab_width=8, allow_code=True, code_indent=4, convert_tabs_to_spaces=True)
Splits the string s into individual words, suitable for feeding into wrap_words.
Paragraphs indented by less than code_indent will be broken up into individual words.
If allow_code is true, paragraphs indented by at least code_indent spaces will preserve their whitespace: internal whitespace is preserved, and the newline is preserved. (This will preserve the formatting of code examples when these words are rejoined into lines by wrap_words.)
wrap_words(words, margin=79, *, two_spaces=True)
Combines 'words' into lines and returns the result as a string. Similar to textwrap.wrap.
'words' should be an iterable containing text split at word boundaries. Example:
"this is an example of text split at word boundaries".split()
A single '\n' indicates a line break. If you want a paragraph break, embed two '\n' characters in a row.
'margin' specifies the maximum length of each line. The length of every line will be less than or equal to 'margin', unless the length of an individual element inside 'words' is greater than 'margin'.
If 'two_spaces' is true, elements from 'words' that end in sentence-ending punctuation ('.', '?', and '!') will be followed by two spaces, not one.
Elements in 'words' are not modified; any leading or trailing whitespace will be preserved. You can use this to preserve whitespace where necessary, like in code examples.
big.time
Functions for working with time. Currently deals specifically with timestamps. The time functions in big are designed to make it easy to use best practices.
parse_timestamp_3339Z(s)
Parses a timestamp string returned by timestamp_3339Z. Returns a datetime.datetime object.
timestamp_3339Z(t=None, want_microseconds=None)
Return a timestamp string in RFC 3339 format, in the UTC time zone. This format is intended for computer-parsable timestamps; for human-readable timestamps, use timestamp_human().
Example timestamp:
'2022-05-25T06:46:35.425327Z'
t may be one of several types:
- If t is None, timestamp_3339Z uses the current time in UTC.
- If t is an int or a float, it's interpreted as seconds since the epoch in the UTC time zone.
- If t is a time.struct_time object or datetime.datetime object, and it's not in UTC, it's converted to UTC. (Technically, time.struct_time objects are converted to GMT, using time.gmtime. Sorry, pedants!)
If want_microseconds is true, the timestamp ends with microseconds, represented as a period and six digits between the seconds and the 'Z'. If want_microseconds is false, the timestamp will not include this text. If want_microseconds is None (the default), the timestamp ends with microseconds if the type of t can represent fractional seconds: a float, a datetime object, or the value None.
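The rules for t and want_microseconds can be sketched with the datetime module. This is a simplified illustration, not big's implementation: it skips time.struct_time input entirely, and timestamp_3339Z_sketch is a hypothetical name.

```python
from datetime import datetime, timezone

def timestamp_3339Z_sketch(t=None, want_microseconds=None):
    """Format t as an RFC 3339 UTC timestamp (None, int, float, datetime only)."""
    if want_microseconds is None:
        # Default: include microseconds when the input can carry them.
        want_microseconds = t is None or isinstance(t, (float, datetime))
    if t is None:
        dt = datetime.now(timezone.utc)
    elif isinstance(t, (int, float)):
        dt = datetime.fromtimestamp(t, timezone.utc)  # seconds since the epoch
    elif isinstance(t, datetime):
        dt = t.astimezone(timezone.utc)               # convert to UTC if needed
    else:
        raise TypeError("this sketch does not handle " + type(t).__name__)
    fmt = "%Y-%m-%dT%H:%M:%S.%fZ" if want_microseconds else "%Y-%m-%dT%H:%M:%SZ"
    return dt.strftime(fmt)

print(timestamp_3339Z_sketch(0))
print(timestamp_3339Z_sketch(0.5))
moment = datetime(2022, 5, 25, 6, 46, 35, 425327, tzinfo=timezone.utc)
print(timestamp_3339Z_sketch(moment))
```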
timestamp_human(t=None, want_microseconds=None)
Return a timestamp string formatted in a pleasing way using the currently-set local timezone. This format is intended for human readability; for computer-parsable time, use timestamp_3339Z().
Example timestamp:
"2022/05/24 23:42:49.099437"
t can be one of several types:
- If t is None, timestamp_human uses the current local time.
- If t is an int or float, it's interpreted as seconds since the epoch.
- If t is a time.struct_time or datetime.datetime object, it's converted to the local timezone.
If want_microseconds is true, the timestamp will end with the microseconds, represented as ".######". If want_microseconds is false, the timestamp will not include the microseconds. If want_microseconds is None (the default), the timestamp ends with microseconds if the type of t can represent fractional seconds: a float, a datetime object, or the value None.
Subsystem notes
Word wrapping and formatting
big contains three functions used to reflow and format text in a pleasing manner. In the order you should use them, they are split_text_with_code, wrap_words, and optionally merge_columns.
This trio of functions gives you the following word-wrap superpowers:
- Paragraphs of text representing embedded "code" don't get word-wrapped. Instead, their formatting is preserved.
- Multiple strings can be merged together into columns.
Split text array
split_text_with_code splits a string of text into a split text array, and wrap_words consumes a split text array to produce its word-wrapped output. A split text array is an array of strings.
You'll see four kinds of strings in a split text array:
- Individual words, ready to be word-wrapped.
- Entire lines of "code", preserving their formatting.
- Line breaks, represented by a single newline: '\n'.
- Paragraph breaks, represented by two newlines: '\n\n'.
When split_text_with_code splits a string, it views each line as either a "text" line or a "code" line. Any non-blank line that starts with code_indent or more spaces (or the equivalent using tabs) is a "code" line, and any other non-blank line is a "text" line. But it has some state here; when split_text_with_code sees a "text" line, it switches into "text" mode, and when it sees a "code" line it switches into "code" mode.
In "text" mode:
- words are separated by whitespace,
- initial whitespace on the line is discarded,
- the amount of whitespace between words is irrelevant,
- individual newline characters are ignored, and
- two or more consecutive newline characters are converted into exactly two newlines (aka a "paragraph break").
In "code" mode:
- all whitespace is preserved, except for trailing whitespace on a line, and
- all newline characters are preserved.
Also, whenever split_text_with_code switches between "text" and "code" mode, it emits a paragraph break.
This might be clearer with an example or two. The following text:
hello there!
this is text.


this is a second paragraph!
would be represented in a Python string as:
"hello there!\nthis is text.\n\n\nthis is a second paragraph!"
Note the three newlines between the second and third lines.
split_text_with_code would turn this into the following split text array:
[ 'hello', 'there!', 'this', 'is', 'text.', '\n\n',
'this', 'is', 'a', 'second', 'paragraph!']
split_text_with_code merged the first two lines together into a single paragraph, and collapsed the three newlines separating the two paragraphs into a "paragraph break" marker (two newlines in one string).
And this text:
What are the first four squared numbers?

    for i in range(1, 5):


        print(i**2)

Python is just that easy!
would be represented in a Python string as:
"What are the first four squared numbers?\n\n    for i in range(1, 5):\n\n\n        print(i**2)\n\nPython is just that easy!"
split_text_with_code considers the two lines with initial whitespace as "code" lines, and so the text is split into the following split text array:
['What', 'are', 'the', 'first', 'four', 'squared', 'numbers?', '\n\n',
 '    for i in range(1, 5):', '\n', '\n', '\n', '        print(i**2)', '\n\n',
 'Python', 'is', 'just', 'that', 'easy!']
Here we have a text paragraph, followed by a "code paragraph", followed by a second text paragraph. The code paragraph preserves the internal newlines, though they are represented as individual "line break" markers (strings containing a single newline). Adjacent paragraphs are separated by a "paragraph break" marker.
Here's a simple algorithm for joining a split text array back into a single string:
prev = None
a = []
for word in split_text_array:
    # Put a single space between two adjacent non-whitespace elements;
    # line breaks and paragraph breaks supply their own separation.
    if prev is not None and not (prev.isspace() or word.isspace()):
        a.append(' ')
    a.append(word)
    prev = word
text = "".join(a)
Of course, this algorithm is too simple to do word wrapping. Nor does it handle adding two spaces after sentence-ending punctuation. In practice you should just use wrap_words.
Merging columns
merge_columns merges multiple strings into columns on the same line.
For example, it could merge these three Python strings:
[
"Here's the first\ncolumn of text.",
"More text over here!\nIt's the second\ncolumn! How\nexciting!",
"And here's a\nthird column.",
]
into the following text:
Here's the first More text over here! And here's a
column of text.  It's the second      third column.
                 column! How
                 exciting!
(Note that merge_columns doesn't do its own word-wrapping; instead, it's designed to consume the output of wrap_words.)
Each column is passed in to merge_columns as a "column tuple":
(s, min_width, max_width)
s is the string, min_width is the minimum width of the column, and max_width is the maximum width of the column.
As you saw above, s can contain newline characters, and merge_columns obeys those when formatting each column.
For each column, merge_columns measures the longest line of each column. The width of the column is determined as follows:
- If the longest line is less than min_width characters long, the column will be min_width characters wide.
- If the longest line is greater than or equal to min_width characters long, and less than or equal to max_width characters long, the column will be as wide as the longest line.
- If the longest line is greater than max_width characters long, the column will be max_width characters wide, and lines that are longer than max_width characters will "overflow".
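The three width rules amount to clamping the longest line between min_width and max_width. A small sketch of that arithmetic follows; column_width_sketch is a hypothetical helper written for this explanation and is not part of big.

```python
def column_width_sketch(s, min_width, max_width):
    """Return (column width, whether any line overflows) for one column tuple."""
    longest = max(len(line) for line in s.split("\n"))
    width = min(max(longest, min_width), max_width)  # clamp into [min, max]
    overflows = longest > max_width
    return width, overflows

print(column_width_sketch("hi", 10, 20))                                # shorter than min_width
print(column_width_sketch("Here's the first\ncolumn of text.", 10, 20)) # between the two bounds
print(column_width_sketch("x" * 25, 10, 20))                            # longer than max_width
```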
Overflow
What is "overflow"? It's when the text in a column is wider than that column's max_width. merge_columns discusses both "overflow lines", lines that are longer than max_width, and "overflow columns", which are columns that contain any overflow lines.
What does merge_columns do when it encounters overflow? It provides three "strategies" to deal with this condition, and you can control which it uses through the overflow_response parameter. The three are:
- OverflowResponse.RAISE: Raise an OverflowError exception. The default.
- OverflowResponse.INTRUDE_ALL: Intrude into all subsequent columns on all lines where the overflowed column is wider than its max_width. The subsequent columns "make space" for the overflow text by pausing their output on the overflow lines.
- OverflowResponse.DELAY_ALL: Delay all columns after the overflowed column, not beginning any until after the last overflowed line in the overflowed column. This is like INTRUDE_ALL, except that they "make space" by pausing their output until the last overflowed line.
When overflow_response is INTRUDE_ALL or DELAY_ALL, and either overflow_before or overflow_after is nonzero, these specify the number of extra lines before or after the overflowed lines in a column where the subsequent columns "pause".
Enhanced TopologicalSorter
Overview
big's TopologicalSorter is a drop-in replacement for graphlib.TopologicalSorter in the Python standard library (new in 3.9). However, the version in big has been greatly upgraded:
- prepare is now optional, though it still performs a cycle check.
- You can add nodes and edges to a graph at any time, even while iterating over the graph. Adding nodes and edges always succeeds.
- You can remove nodes from graph g with the new method g.remove(node). Again, you can do this at any time, even while iterating over the graph. Removing a node from the graph always succeeds, assuming the node is in the graph.
- The functionality for iterating over a graph now lives in its own object called a view. View objects implement the get_ready, done, and __bool__ methods. There's a default view built in to the graph object; the get_ready, done, and __bool__ methods on a graph just call into the graph's default view. You can create a new view at any time by calling the new view method.
Note that if you're using a view to iterate over the graph, and you modify the graph, and the view now represents a state that isn't coherent with the graph, attempting to use that view raises a RuntimeError. More on what I mean by "coherence" in a minute.
This implementation also fixes some minor warts with the existing API:
- In Python's implementation, static_order and get_ready/done are mutually exclusive. If you ever call get_ready on a graph, you can never call static_order, and vice-versa. The implementation in big doesn't have this restriction, because its implementation of static_order creates and uses a new view object every time it's called.
- In Python's implementation, you can only iterate over the graph once, or call static_order once. The implementation in big solves this in several ways: it allows you to create as many views as you want, and you can call the new reset method on a view to reset it to its initial state.
Graph / view coherence
So what does it mean for a view to no longer be coherent with the graph? Consider the following code:
import big
g = big.TopologicalSorter()
g.add('B', 'A')
g.add('C', 'A')
g.add('D', 'B', 'C')
v = g.view()
v.ready() # returns ('A',)
g.add('A', 'Q')
First this code creates a graph g with a classic "diamond" dependency pattern. Then it creates a new view v, and gets the currently "ready" nodes, which consist of just the node 'A'. Finally it adds a new dependency: 'A' depends on 'Q'.
At this moment, view v is no longer coherent. 'A' has been marked as "ready", but 'Q' has not. And yet 'A' depends on 'Q'. All those statements can't be true at the same time! So view v is no longer coherent, and any attempt to interact with v raises an exception.
To state it more precisely: if view v is a view on graph g, and you call g.add('Z', 'Y'), and neither of these statements is true in view v:
- 'Y' has been marked as done.
- 'Z' has not yet been yielded by get_ready.
then v is no longer "coherent".
(If 'Y' has been marked as done, then it's okay to make 'Z' dependent on 'Y' regardless of what state 'Z' is in. Likewise, if 'Z' hasn't been yielded by get_ready yet, then it's okay to make 'Z' dependent on 'Y' regardless of what state 'Y' is in.)
Note that you can restore a view to coherence. In this case, removing either Y or Z from g would resolve the incoherence between v and g, and v would start working again.
Also note that you can have multiple views, in various states of iteration, and by modifying the graph you may cause some to become incoherent but not others. Views are completely independent from each other.
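The coherence rule for a single new edge boils down to one boolean expression. The sketch below models a view as two plain sets; this is an assumption made for illustration and says nothing about how big tracks view state internally. The function name add_keeps_view_coherent is hypothetical.

```python
def add_keeps_view_coherent(done, yielded, node, dependency):
    """Would adding the edge 'node depends on dependency' leave the view coherent?

    done:    nodes the view has marked as done
    yielded: nodes the view has already handed out as ready
    """
    return dependency in done or node not in yielded

# State of view v in the example above: 'A' was yielded, nothing is done yet.
done, yielded = set(), {'A'}

print(add_keeps_view_coherent(done, yielded, 'A', 'Q'))  # 'A' already yielded, 'Q' not done
print(add_keeps_view_coherent(done, yielded, 'D', 'B'))  # 'D' has not been yielded yet

done.add('A')                                            # now mark 'A' as done
print(add_keeps_view_coherent(done, yielded, 'Z', 'A'))  # the dependency is already done
```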
Bound inner classes
Overview
One minor complaint about Python is that inner classes don't have access to the outer object at construction time. Consider this Python code:
class Outer(object):
    def method(self):
        pass
    class Inner(object):
        def __init__(self):
            pass
o = Outer()
o.method()
i = o.Inner()
When o.method is called, Python automatically passes in the o object as the first parameter (generally called self). But that doesn't happen when o.Inner is called. (It does pass in a self, but in this case it's the newly-created Inner object.) There's just no built-in way for the o.Inner object being constructed to automatically get a reference to the o Outer object. If you need one, you must explicitly pass one in, like so:
class Outer(object):
    def method(self):
        pass
    class Inner(object):
        def __init__(self, outer):
            self.outer = outer
o = Outer()
o.method()
i = o.Inner(o)
This seems redundant. You don't have to pass in o explicitly to method calls; why should you have to pass it in explicitly to inner classes? Well--now you don't have to! You just need to decorate the inner class with @big.BoundInnerClass.
Using bound inner classes
Let's modify the above example to use our BoundInnerClass decorator:
from big import BoundInnerClass
class Outer(object):
    def method(self):
        pass
    @BoundInnerClass
    class Inner(object):
        def __init__(self, outer):
            self.outer = outer
o = Outer()
o.method()
i = o.Inner()
Notice that Inner.__init__ now accepts an outer parameter, even though you didn't pass in any arguments to o.Inner. Thanks, BoundInnerClass! You've saved the day.
Inheritance
Bound inner classes get slightly complicated when mixed with inheritance. It's not all that difficult; you merely need to obey the following rules:
- A bound inner class can inherit normally from any unbound class.
- To subclass from a bound inner class while still inside the outer class scope, or when referencing the inner class from the outer class (as opposed to an instance of the outer class), you must actually subclass or reference classname.cls. This is because inside the outer class, the "class" you see is actually an instance of a BoundInnerClass object.
- All classes that inherit from a bound inner class must always call the superclass's __init__. You don't need to pass in the outer parameter; it'll be automatically passed in to the superclass's __init__ as before.
- An inner class that inherits from a bound inner class, and which also wants to be bound to the outer object, should be decorated with BoundInnerClass.
- An inner class that inherits from a bound inner class, but doesn't want to be bound to the outer object, should be decorated with UnboundInnerClass.
Restating the last two rules: every class that descends from any BoundInnerClass should be decorated with either BoundInnerClass or UnboundInnerClass.
Here's a simple example using inheritance with bound inner classes:
from big import BoundInnerClass, UnboundInnerClass
class Outer(object):
@BoundInnerClass
class Inner(object):
def __init__(self, outer):
self.outer = outer
@UnboundInnerClass
class ChildOfInner(Inner.cls):
def __init__(self):
super(Outer.ChildOfInner, self).__init__()
o = Outer()
i = o.ChildOfInner()
We followed the rules:
- Inner inherits from object; since object isn't a bound inner class, there are no special rules about inheritance Inner needs to obey.
- ChildOfInner inherits from Inner.cls, not Inner.
- Since ChildOfInner inherits from a BoundInnerClass, it must be decorated with either BoundInnerClass or UnboundInnerClass. It doesn't want the outer object passed in, so it's decorated with UnboundInnerClass.
- ChildOfInner.__init__ calls super().__init__.
Note that, because ChildOfInner is decorated with UnboundInnerClass, it doesn't take an outer parameter. Nor does it pass in an outer argument when it calls super().__init__. But when the constructor for Inner is called, the correct outer parameter is passed in--like magic! Thanks again, BoundInnerClass!
If you wanted ChildOfInner to also get the outer argument passed in to its __init__, just decorate it with BoundInnerClass instead of UnboundInnerClass, like so:
from big import BoundInnerClass

class Outer(object):

    @BoundInnerClass
    class Inner(object):
        def __init__(self, outer):
            self.outer = outer

    @BoundInnerClass
    class ChildOfInner(Inner.cls):
        def __init__(self, outer):
            super(Outer.ChildOfInner, self).__init__()
            assert self.outer == outer

o = Outer()
i = o.ChildOfInner()
Again, ChildOfInner.__init__ doesn't need to explicitly pass in outer when calling super().__init__.
You can see more complex examples of using inheritance with BoundInnerClass (and UnboundInnerClass) in the test suite.
Miscellaneous notes
- If you refer to a bound inner class directly from the outer class, rather than using the outer instance, you get the original class. This means that references to Outer.Inner are consistent, and it's a base class of all the bound inner classes. This also means that if you attempt to construct one without using an outer instance, you must pass in the outer parameter by hand, just as you would have to pass in the self parameter by hand when calling an unbound method.
- If you refer to a bound inner class from an outer instance, you get a subclass of the original class.
- Bound classes are cached in the outer object, which both provides a small speedup and ensures that isinstance relationships are consistent.
- You must not rename inner classes decorated with either BoundInnerClass or UnboundInnerClass! The implementation of BoundInnerClass looks up the bound inner class in the outer object by name in several places. Adding aliases to bound inner classes is harmless, but the original attribute name must always work.
- Bound inner classes from different objects are different classes. This is symmetric with bound methods; if you have two objects a and b that are instances of the same class, a.BoundInnerClass != b.BoundInnerClass, just as a.method != b.method.
- The binding only goes one level deep; if you had an inner class C inside another inner class B inside a class A, the constructor for C would be called with the B object, not the A object.
- Similarly, if you have a bound inner class B inside a class A, and another bound inner class D inside a class C, and D inherits from B, the constructor for D will be called with the B object but not the A object. When D calls super().__init__ it'll have to fill in the outer parameter by hand.
- There's a race condition in the implementation: if you access a bound inner class through an outer instance from two separate threads, and the bound inner class was not previously cached, the two threads may get different (but equivalent) bound inner class objects, and only one of those instances will get cached on the outer object. This could lead to confusion and possibly cause bugs. For example, you could have two objects that would be considered equal if they were instances of the same bound inner class, but would not be considered equal if instantiated by different instances of that same bound inner class. There's an easy workaround for this problem: access the bound inner class from the __init__ of the outer class, which should allow the code to cache the bound inner class instance before a second thread could ever get a reference to the outer object.
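Several of these notes (the original class via the outer class, a cached subclass per outer instance, the bound-method symmetry, and the __init__ workaround) can be illustrated with a small descriptor. This is only a sketch under stated assumptions: bound_inner_sketch is a hypothetical stand-in written for this example, it does not import big, and it is not big's actual implementation. Like the notes describe for the real thing, it makes no attempt at thread safety.

```python
class bound_inner_sketch:
    """Hypothetical stand-in for BoundInnerClass; not big's actual code."""

    def __init__(self, cls):
        self.cls = cls
        self.name = cls.__name__

    def __set_name__(self, owner, name):
        # Remember the attribute name used in the outer class.
        self.name = name

    def __get__(self, outer, outer_type=None):
        if outer is None:
            # Accessed through the outer class: return the original class.
            return self.cls

        original = self.cls

        class Bound(original):
            def __init__(self, *args, **kwargs):
                # Supply the outer instance automatically.
                super().__init__(outer, *args, **kwargs)

        Bound.__name__ = original.__name__
        Bound.__qualname__ = original.__qualname__
        # Cache under the same attribute name. This descriptor defines no
        # __set__, so the instance attribute shadows it on later lookups.
        outer.__dict__[self.name] = Bound
        return Bound


class Outer:
    def __init__(self):
        # Workaround for the race: touch the attribute here so the bound
        # class is cached before any other thread can see this instance.
        self.Inner

    @bound_inner_sketch
    class Inner:
        def __init__(self, outer, value):
            self.outer = outer
            self.value = value

    def method(self):
        pass


a = Outer()
b = Outer()
i = a.Inner(5)               # outer is filled in automatically
manual = Outer.Inner(b, 7)   # via the outer class, outer is passed by hand

print(i.outer is a)                      # True
print(manual.outer is b)                 # True
print(a.Inner is a.Inner)                # True: cached on the instance
print(a.Inner is b.Inner)                # False: one bound class per instance
print(issubclass(a.Inner, Outer.Inner))  # True
print(a.method != b.method)              # True: same symmetry as bound methods
```

Caching under the original attribute name is also why, in this sketch, renaming the inner class attribute would break the lookup, which mirrors the renaming caveat above.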