Jupyter Notebook extension to leverage pandas DataFrames by integrating DataTables JS.
Jupyter DataTables
About
Data scientists, and in fact many developers, work with pd.DataFrame on a daily basis to interpret and process data. The common workflow is to display the DataFrame, look at the data schema, and then produce multiple plots to check the distribution of the data and get a clearer picture, perhaps also searching for some data in the table.
What if those distribution plots were part of the standard DataFrame and we had the ability to quickly search through the table with minimal effort? What if it was the default representation?
jupyter-datatables uses jupyter-require to draw the table.
Installation
pip install jupyter-datatables
Then enable the required extensions:
jupyter nbextension install --sys-prefix --py jupyter_require
jupyter nbextension enable jupyter-require/extension
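To confirm that the installation step worked, a quick check from Python can help. The following is a minimal sketch, assuming the importable module names are `jupyter_datatables` and `jupyter_require` as used in the commands and imports on this page; `check_installed` is a hypothetical helper, not part of either package. It only verifies that the Python packages can be found, not that the notebook extension has been enabled.

```python
from importlib.util import find_spec

# Module names taken from the install command and the usage import on this page.
REQUIRED_MODULES = ("jupyter_datatables", "jupyter_require")


def check_installed(modules=REQUIRED_MODULES):
    """Return a mapping of module name -> True if the module can be located."""
    return {name: find_spec(name) is not None for name in modules}


status = check_installed()
for name, ok in status.items():
    print(f"{name}: {'found' if ok else 'missing'}")
```

If either module is reported as missing, re-running the `pip install` command in the same environment as the notebook kernel is the first thing to try.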
Usage
import string

import numpy as np
import pandas as pd
from jupyter_datatables import init_datatables_mode
init_datatables_mode()
That's it, your default pandas representation will now use Jupyter DataTables!
df = pd.DataFrame(np.abs(np.random.randn(50, 6)), columns=list(string.ascii_uppercase[:6]))
In most cases, you don't need to worry too much about the size of your data. Jupyter DataTables calculates the required sample size based on a confidence interval (0.95 by default) and a margin of error, and rounds it up to the next 'smart' value.
For example, for data containing 100,000 samples, given a 0.975 confidence interval and a 0.02 margin of error, Jupyter DataTables would calculate that 3044 samples are required and round that up to 4000.
The table is then displayed with an additional note:
Sample size: 4,000 out of 100,000
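The arithmetic behind that example can be sketched as follows. This is an illustration under stated assumptions, not the library's actual implementation: it assumes Cochran's sample size formula with a finite population correction and a two-sided z-score, and it assumes that the 'smart' value is the next multiple of the leading power of ten. Both function names are hypothetical. Under these assumptions the result agrees with the figures quoted above to within rounding.

```python
import math
from statistics import NormalDist


def required_sample_size(population, margin_of_error, confidence=0.95, proportion=0.5):
    """Estimate a sample size (assumed method: Cochran's formula, finite population)."""
    # Two-sided z-score for the requested confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample size that would be needed for an infinitely large population.
    n0 = z ** 2 * proportion * (1 - proportion) / margin_of_error ** 2
    # Finite population correction shrinks the estimate for smaller populations.
    return n0 / (1 + (n0 - 1) / population)


def ceil_to_smart_value(n):
    """Round up to the next multiple of the leading power of ten (assumed rule)."""
    n = math.ceil(n)
    magnitude = 10 ** (len(str(n)) - 1)
    return math.ceil(n / magnitude) * magnitude


population = 100_000
n = required_sample_size(population, margin_of_error=0.02, confidence=0.975)
sample_size = ceil_to_smart_value(n)
note = f"Sample size: {sample_size:,} out of {population:,}"
print(note)
```

The default `proportion=0.5` is the most conservative choice, since it maximizes the variance term and therefore the estimated sample size.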
We can also handle wide tables with ease.
df = pd.DataFrame(np.abs(np.random.randn(50, 20)), columns=list(string.ascii_uppercase[:20]))
As of 0.2.0, multiple dtypes are supported, such as object, categorical and datetime.
dft = pd.DataFrame({'A': np.random.rand(5),
                    'B': [1, 1, 3, 2, 1],
                    'C': 'foo',
                    'C_': 'This is a very long sentence that should automatically be trimmed',
                    'D': [
                        pd.Timestamp('20010101'), pd.Timestamp('20010102'),
                        pd.Timestamp('20010103'), pd.Timestamp('20010103'),
                        pd.Timestamp('20010104')
                    ],
                    'E': pd.Series([1.0] * 5).astype('float32'),
                    'F': [False, True, False, False, True],
                    'G': pd.Series([1] * 5, dtype='int8')}
                   )
Future plans:
- allow custom operations on the table:
  - edit column name
  - edit column type
- handle multi index
- handle nested data
- improve plotting:
  - performance and efficiency
  - customizable
  - resizable
  - dockable
  - draggable to a Jupyter cell (??)
- [stretch goal] increased performance and space efficiency by server-side processing -- lazy loading
Author: Marek Cermak macermak@redhat.com, @AICoE - Project Thoth