
Jupyter Notebook extension to leverage pandas DataFrames by integrating DataTables JS.


Jupyter DataTables



About

Data scientists, and in fact many developers, work with pd.DataFrame on a daily basis to interpret and process data. The common workflow is to display the DataFrame, take a look at the data schema, and then produce multiple plots to check the distribution of the data and get a clearer picture, perhaps also searching for some data in the table, and so on.

What if those distribution plots were part of the standard DataFrame and we had the ability to quickly search through the table with minimal effort? What if it was the default representation?

jupyter-datatables uses jupyter-require to draw the table.


Installation

pip install jupyter-datatables

Then enable the required extension:

jupyter nbextension install --sys-prefix --py jupyter_require
jupyter nbextension enable jupyter-require/extension

Usage

import string

import numpy as np
import pandas as pd

from jupyter_datatables import init_datatables_mode

init_datatables_mode()

That's it, your default pandas representation will now use Jupyter DataTables!

df = pd.DataFrame(np.abs(np.random.randn(50, 6)), columns=list(string.ascii_uppercase[:6]))

Jupyter Datatables table representation


In most cases, you don't need to worry too much about the size of your data. Jupyter DataTables calculates the required sample size from a confidence level (0.95 by default) and a margin of error, then rounds it up to the next 'smart' value.

For example, for a dataset containing 100,000 samples, given a 0.975 confidence level and a 0.02 margin of error, Jupyter DataTables would calculate that 3,044 samples are required and round that up to 4,000.

Jupyter Datatables long table sample size

The table is shown with an additional note:

Sample size: 4,000 out of 100,000
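
The numbers in the example above can be reproduced with a short sketch. This is not taken from the Jupyter DataTables source: it assumes the sample size comes from Cochran's formula with a finite population correction, and that the 'smart' value means rounding up to one significant digit. The function names, the proportion of 0.5, and the rounding rule are all assumptions made for illustration.

```python
from statistics import NormalDist


def required_sample_size(population, margin_of_error, confidence=0.95, proportion=0.5):
    """Cochran's formula with a finite population correction (assumed method)."""
    # Two-sided z-score for the requested confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample size that would be needed for an infinite population.
    n0 = z ** 2 * proportion * (1 - proportion) / margin_of_error ** 2
    # Shrink it to account for the finite population size.
    return n0 / (1 + (n0 - 1) / population)


def round_up_to_leading_digit(n):
    """Round a positive integer up to one significant digit (assumed 'smart' rounding)."""
    magnitude = 10 ** (len(str(n)) - 1)
    return -(-n // magnitude) * magnitude  # ceiling division, then scale back


n = required_sample_size(100_000, margin_of_error=0.02, confidence=0.975)
print(round(n))                             # 3044
print(round_up_to_leading_digit(round(n)))  # 4000
```

Under these assumptions the result matches the 3,044 and 4,000 quoted above; the library's actual rounding rule may differ for other inputs.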


We can also handle wide tables with ease.

df = pd.DataFrame(np.abs(np.random.randn(50, 20)), columns=list(string.ascii_uppercase[:20]))

Jupyter Datatables wide table representation


As of 0.2.0, multiple dtypes such as object, categorical and datetime are supported.

dft = pd.DataFrame({'A': np.random.rand(5),
                    'B': [1, 1, 3, 2, 1],
                    'C': 'foo',
                    'C_': 'This is a very long sentence that should automatically be trimmed',
                    'D': [
                        pd.Timestamp('20010101'), pd.Timestamp('20010102'),
                        pd.Timestamp('20010103'), pd.Timestamp('20010103'),
                        pd.Timestamp('20010104')
                    ],
                    'E': pd.Series([1.0] * 5).astype('float32'),
                    'F': [False, True, False, False, True],
                    'G': pd.Series([1] * 5, dtype='int8')}
                  )

Jupyter Datatables multiple dtypes representation
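
The dft example does not contain a categorical column. As a minimal sketch using only the pandas API, this is one way to build a frame with object, categorical and datetime columns and check its dtypes before displaying it. The frame and its column names are hypothetical and not part of the original example.

```python
import pandas as pd

# Hypothetical frame; the column names are illustrative only.
df_types = pd.DataFrame({
    # Plain Python strings, stored explicitly with the object dtype.
    "label": pd.Series(["foo", "bar", "foo"], dtype=object),
    # Repeated labels stored as a pandas categorical.
    "kind": pd.Series(["a", "b", "a"], dtype="category"),
    # Date strings parsed into a datetime64 column.
    "when": pd.to_datetime(["2001-01-01", "2001-01-02", "2001-01-03"]),
})

# Inspect the dtype of each column.
print(df_types.dtypes)
```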



The future plans:

  • allow custom operations on the table:

    • edit column name
    • edit column type
  • handle multi index

  • handle nested data

  • improve plotting:

    • performance and efficiency
    • customizable
    • resizable
    • dockable
    • draggable to a Jupyter cell (??)
  • [stretch goal] increased performance and space efficiency by server-side processing -- lazy loading


Author: Marek Cermak macermak@redhat.com, @AICoE - Project Thoth

