Draw Lex et al.'s UpSet plots with Pandas and Matplotlib
This is another Python implementation of UpSet plots by Lex et al. [Lex2014]. UpSet plots are used to visualise set overlaps, like Venn diagrams but more readable. Documentation is at https://upsetplot.readthedocs.io.
This upsetplot library tries to provide a simple interface backed by an extensible, object-oriented design.
The basic input format is a pandas.Series containing counts corresponding to subset sizes, where each subset is an intersection of named categories. The index of the Series indicates which rows pertain to which categories by having one boolean index level per category, as in the following example:
>>> from upsetplot import generate_counts
>>> example = generate_counts()
>>> example  # doctest: +NORMALIZE_WHITESPACE
cat0   cat1   cat2
False  False  False      56
              True      283
       True   False    1279
              True     5882
True   False  False      24
              True       90
       True   False     429
              True     1957
Name: value, dtype: int64
Then:
>>> from upsetplot import plot
>>> plot(example)  # doctest: +SKIP
>>> from matplotlib import pyplot
>>> pyplot.show()  # doctest: +SKIP
makes:
This plot shows the cardinality of every category combination seen in our data. The leftmost column counts items absent from every category. The next three columns count items in exactly one of cat0, cat1 and cat2, and the following columns show cardinalities for items in each combination of exactly two named sets. The rightmost column counts items in all three sets.
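The same cardinalities can be read off the Series without drawing anything. The following is a minimal sketch that rebuilds the example by hand with plain pandas, so that it runs without upsetplot, and sums the subset sizes by degree, meaning the number of categories a subset belongs to. The names `counts`, `flat`, `by_degree` and `category_totals` are illustrative only and are not part of the library.

```python
import pandas as pd

categories = ["cat0", "cat1", "cat2"]

# Rebuild the example Series by hand: every True/False combination of the
# three categories, in the same row order as the output shown earlier.
index = pd.MultiIndex.from_product([[False, True]] * 3, names=categories)
counts = pd.Series(
    [56, 283, 1279, 5882, 24, 90, 429, 1957], index=index, name="value"
)

# Flatten the index into ordinary columns so it can be filtered with masks.
flat = counts.reset_index()

# Degree = number of categories a subset belongs to (0 means "in none").
degree = flat[categories].astype(int).sum(axis=1)
by_degree = flat.groupby(degree)["value"].sum()

# Total size of each category, regardless of overlap with the others.
category_totals = {
    name: int(flat.loc[flat[name].astype(bool), "value"].sum())
    for name in categories
}

print(by_degree.to_dict())
print(category_totals)
```

Degree 0 corresponds to the leftmost column of the plot and degree 3 to the rightmost; the degree 1 and degree 2 totals each add up three columns.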
Rotation
We call the above plot style “horizontal” because the category intersections are presented from left to right. Vertical plots are also supported!
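As a sketch, the orientation can be chosen with the `orientation` keyword, which `plot` passes on to the `UpSet` class. Treat the exact keyword as an assumption to check against the documentation for your installed version. The example data is rebuilt by hand here, and the plotting imports live inside the hypothetical helper `draw_vertical` so that the snippet can be loaded on a machine without a display.

```python
import pandas as pd

# The same counts as the generated example, built by hand.
index = pd.MultiIndex.from_product(
    [[False, True]] * 3, names=["cat0", "cat1", "cat2"]
)
example = pd.Series(
    [56, 283, 1279, 5882, 24, 90, 429, 1957], index=index, name="value"
)


def draw_vertical(data):
    """Draw `data` as a vertical UpSet plot and show it."""
    # Imported lazily: these are only needed when a plot is actually drawn.
    from matplotlib import pyplot
    from upsetplot import plot

    plot(data, orientation="vertical")
    pyplot.show()
```

Calling `draw_vertical(example)` should then present the category intersections from top to bottom.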
Distributions
Providing a DataFrame rather than a Series as input allows us to expressively plot the distribution of variables in each subset.
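A minimal sketch of what such a DataFrame might look like, using made-up data: one row per item, category membership in a boolean index, and the variable of interest (here a hypothetical `height` column) as an ordinary column. The per-subset summary is computed with plain pandas purely to illustrate what "distribution in each subset" means; it is not how upsetplot draws its plots.

```python
import pandas as pd

# Made-up items: membership flags plus one measured variable.
items = pd.DataFrame(
    {
        "cat0": [True, True, False, False, True, False],
        "cat1": [False, True, True, True, False, False],
        "height": [1.0, 2.0, 3.0, 5.0, 3.0, 4.0],
    }
)

# Summarise the variable within each category combination.
summary = (
    items.groupby(["cat0", "cat1"])["height"]
    .agg(["count", "median"])
    .reset_index()
)

# Move the membership flags into the index, giving one row per item
# rather than one row per subset.
samples = items.set_index(["cat0", "cat1"])

print(summary)
```

A frame shaped like `samples` is the kind of input meant here; `UpSet.add_catplot`, which requires seaborn, is the method that draws per-subset distributions.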
Loading datasets
While the dataset above is randomly generated, you can prepare your own dataset for input to upsetplot. A helpful tool is from_memberships, which allows us to reconstruct the example above by indicating each data point’s category membership:
>>> from upsetplot import from_memberships
>>> example = from_memberships(
...     [[],
...      ['cat2'],
...      ['cat1'],
...      ['cat1', 'cat2'],
...      ['cat0'],
...      ['cat0', 'cat2'],
...      ['cat0', 'cat1'],
...      ['cat0', 'cat1', 'cat2'],
...      ],
...      data=[56, 283, 1279, 5882, 24, 90, 429, 1957]
... )
>>> example  # doctest: +NORMALIZE_WHITESPACE
cat0   cat1   cat2
False  False  False      56
              True      283
       True   False    1279
              True     5882
True   False  False      24
              True       90
       True   False     429
              True     1957
dtype: int64
See also from_contents, another way to describe categorised data.
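To illustrate the other direction, where the data starts as a mapping from each category to the items it contains, here is a rough pandas-only sketch that turns such a mapping into boolean memberships and subset counts. It is not the library's implementation of from_contents, and the data and names (`contents`, `membership`, `counts`) are made up for the example.

```python
import pandas as pd

# Made-up input: each category name maps to the identifiers it contains.
contents = {
    "cat0": ["a", "b", "c", "f"],
    "cat1": ["b", "c", "d"],
    "cat2": ["c", "e"],
}

# Every identifier that appears in at least one category.
all_ids = sorted(set().union(*contents.values()))

# One boolean column per category, one row per identifier.
membership = pd.DataFrame(
    {
        name: [item in members for item in all_ids]
        for name, members in contents.items()
    },
    index=pd.Index(all_ids, name="id"),
)

# Count the identifiers falling into each observed category combination.
counts = membership.groupby(list(contents)).size()

print(counts)
```

Unlike the generated example, combinations that never occur (such as items in no category) are simply absent from `counts`.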
Installation
To install the library, you can use pip:
$ pip install upsetplot
Installation requires:
- pandas
- matplotlib >= 2.0
- seaborn to use UpSet.add_catplot
It should then be possible to:
>>> import upsetplot
in Python.
Why an alternative to py-upset?
Probably for petty reasons. It appeared py-upset was not being maintained. Its input format was undocumented, inefficient and, IMO, inappropriate. It did not facilitate showing plots of each subset's distribution as in Lex et al.'s work introducing UpSet plots. Nor did it include the horizontal bar plots illustrated there. It did not support Python 2. I decided it would be easier to construct a cleaner version than to fix it.
References
[Lex2014] Alexander Lex, Nils Gehlenborg, Hendrik Strobelt, Romain Vuillemot, Hanspeter Pfister, UpSet: Visualization of Intersecting Sets, IEEE Transactions on Visualization and Computer Graphics (InfoVis '14), vol. 20, no. 12, pp. 1983–1992, 2014. doi: 10.1109/TVCG.2014.2346248