Web Client for Visualizing Pandas Objects
What is it?
D-Tale is the combination of a Flask back-end and a React front-end to bring you an easy way to view & analyze Pandas data structures. It integrates seamlessly with ipython notebooks & python/ipython terminals. Currently this tool supports such Pandas objects as DataFrame, Series, MultiIndex, DatetimeIndex & RangeIndex.
Origins
D-Tale was the product of a SAS to Python conversion. What was originally a perl script wrapper on top of SAS’s insight function is now a lightweight web client on top of Pandas data structures.
Contents
Getting Started
Screenshots: PyCharm | jupyter
Installing the egg
# install dtale egg (important to use the "--upgrade" every time you install so it will grab the latest version)
$ pip install --upgrade dtale
Now you will have the ability to use D-Tale from the command-line or within a python-enabled terminal.
Python Terminal
This comes courtesy of PyCharm. Feel free to invoke python or ipython directly and use the commands in the screenshot above; it should work the same.
Additional functions available programmatically
import dtale
import pandas as pd
df = pd.DataFrame([dict(a=1,b=2,c=3)])
# Assigning a reference to a running D-Tale process
d = dtale.show(df)
# Accessing data associated with D-Tale process
tmp = d.data.copy()
tmp['d'] = 4
# Altering data associated with D-Tale process
# FYI: this will clear any front-end settings you have at the time for this process (filter, sorts, formatting)
d.data = tmp
# Shutting down D-Tale process
d.kill()
# using Python's `webbrowser` package it will try and open your server's default browser to this process
d.open_browser()
# There is also some helpful metadata about the process
d._data_id # the process's data identifier
d._url # the url to access the process
d2 = dtale.get_instance(d._data_id) # returns a new reference to the instance running at that data_id
dtale.instances() # returns a dictionary of all instances available, this would be { 1: ... }
Jupyter Notebook
Within any jupyter (ipython) notebook executing a cell like this will display a small instance of D-Tale in the output cell. Here are some examples:
Screenshots: dtale.show | assignment | instance
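Since the screenshots are not reproduced here, a minimal notebook cell (using a throwaway DataFrame purely for illustration) would look something like this:
import dtale
import pandas as pd

df = pd.DataFrame([dict(a=1, b=2, c=3)])
# assigning the result keeps a handle to the instance; evaluating it as the
# last expression of the cell renders a small D-Tale grid in the output cell
d = dtale.show(df)
d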
If you are running ipython<=5.0 then you also have the ability to adjust the size of your output cell for the most recent instance displayed:
One thing of note is that a lot of the modal popups you see in the standard browser version will now open in separate browser windows for spatial convenience:
Screenshots: Column Menus | Correlations | Describe | Histogram | Instances
Command-line
Base CLI options (run dtale --help to see all options available)
Prop | Description
---|---
--host | the name of the host you would like to use (most likely not needed since socket.gethostname() should figure this out)
--port | the port you would like to assign to your D-Tale instance
--name | an optional name you can assign to your D-Tale instance (this will be displayed in the <title> & Instances popup)
--debug | turn on Flask’s “debug” mode for your D-Tale instance
--no-reaper | flag to turn off auto-reaping subprocess (kill D-Tale instances after an hour of inactivity), good for long-running displays
--open-browser | flag to automatically open up your server’s default browser to your D-Tale instance
--force | flag to force D-Tale to try and kill any pre-existing process at the port you’ve specified so it can use it
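For example, these options can be combined with any of the loaders below; a hypothetical invocation (the path, port & name are placeholders) might look like:
dtale --csv-path /home/jdoe/my_csv.csv --csv-parse_dates date --port 40001 --name my_data --open-browser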
Loading data from arctic (a high-performance datastore for pandas DataFrames)
dtale --arctic-host mongodb://localhost:27027 --arctic-library jdoe.my_lib --arctic-node my_node --arctic-start 20130101 --arctic-end 20161231
Loading data from CSV
dtale --csv-path /home/jdoe/my_csv.csv --csv-parse_dates date
Loading data from a Custom loader
- Using the DTALE_CLI_LOADERS environment variable, specify a path to a location containing some python modules
- Any python module containing the global variables LOADER_KEY & LOADER_PROPS will be picked up as a custom loader
  - LOADER_KEY: the key that will be associated with your loader. By default you are given arctic & csv (if you use one of these as your key it will override these)
  - LOADER_PROPS: the individual props available to be specified.
    - For example, with arctic we have host, library, node, start & end.
    - If you leave this property as an empty list your loader will be treated as a flag. For example, instead of using all the arctic properties we would simply specify --arctic (this wouldn’t work well in arctic’s case since it depends on all those properties)
- You will also need to specify a function with the following signature def find_loader(kwargs) which returns a function that returns a dataframe or None
- Here is an example of a custom loader:
from dtale.cli.clickutils import get_loader_options

'''
IMPORTANT!!! This global variable is required for building any customized CLI loader.
When searching for loaders on startup, D-Tale will look for any modules containing the global variable LOADER_KEY.
'''
LOADER_KEY = 'testdata'
LOADER_PROPS = ['rows', 'columns']


def test_data(rows, columns):
    import pandas as pd
    import numpy as np
    import random
    from past.utils import old_div
    from pandas.tseries.offsets import Day
    from dtale.utils import dict_merge
    import string

    now = pd.Timestamp(pd.Timestamp('now').date())
    dates = pd.date_range(now - Day(364), now)
    num_of_securities = max(old_div(rows, len(dates)), 1)  # always have at least one security
    securities = [
        dict(security_id=100000 + sec_id, int_val=random.randint(1, 100000000000),
             str_val=random.choice(string.ascii_letters) * 5)
        for sec_id in range(num_of_securities)
    ]
    data = pd.concat([
        pd.DataFrame([dict_merge(dict(date=date), sd) for sd in securities])
        for date in dates
    ], ignore_index=True)[['date', 'security_id', 'int_val', 'str_val']]
    col_names = ['Col{}'.format(c) for c in range(columns)]
    return pd.concat([data, pd.DataFrame(np.random.randn(len(data), columns), columns=col_names)], axis=1)


# IMPORTANT!!! This function is required for building any customized CLI loader.
def find_loader(kwargs):
    test_data_opts = get_loader_options(LOADER_KEY, kwargs)
    if len([f for f in test_data_opts.values() if f]):
        def _testdata_loader():
            return test_data(int(test_data_opts.get('rows', 1000500)), int(test_data_opts.get('columns', 96)))

        return _testdata_loader
    return None
In this example we are simply building a DataFrame with some dummy data based on dimensions specified on the command-line:
- --testdata-rows
- --testdata-columns
Here’s how you would use this loader:
DTALE_CLI_LOADERS=./path_to_loaders bash -c 'dtale --testdata-rows 10 --testdata-columns 5'
UI
Once you have kicked off your D-Tale session, please copy & paste the link on the last line of output into your browser.
For Developers
Cloning
Clone the code (git clone ssh://git@github.com:manahl/dtale.git), then start the backend server:
$ git clone ssh://git@github.com:manahl/dtale.git
# install the dependencies
$ python setup.py develop
# start the server
$ python dtale --csv-path /home/jdoe/my_csv.csv --csv-parse_dates date
You can also run dtale from PyDev directly.
You will also want to install the JavaScript dependencies and build the front-end source:
$ npm install
# 1) a persistent server that serves the latest JS:
$ npm run watch
# 2) or one-off build:
$ npm run build
Running tests
The usual npm test command works:
$ npm test
You can run individual test files:
$ TEST=static/__tests__/dtale/DataViewer-base-test.jsx npm run test-file
Linting
You can lint all the JS and CSS to confirm there’s nothing obviously wrong with it:
$ npm run lint -s
You can also lint individual JS files:
$ npm run lint-js-file -s -- static/dtale/DataViewer.jsx
Formatting JS
You can auto-format code as follows:
$ npm run format
Docker Development
You can build the python 2.7 image (dtale_2_7) & run D-Tale as follows:
$ yarn run build
$ docker-compose build dtale_2_7
$ docker run -it --network host dtale_2_7:latest
$ python
>>> import pandas as pd
>>> df = pd.DataFrame([dict(a=1,b=2,c=3)])
>>> import dtale
>>> dtale.show(df)
Then view your D-Tale instance in your browser using the link that gets printed
You can build the python 3.6 image (dtale_3_6) & run D-Tale as follows:
$ yarn run build
$ docker-compose build dtale_3_6
$ docker run -it --network host dtale_3_6:latest
$ python
>>> import pandas as pd
>>> df = pd.DataFrame([dict(a=1,b=2,c=3)])
>>> import dtale
>>> dtale.show(df)
Then view your D-Tale instance in your browser using the link that gets printed
Startup Behavior
Here’s a little background on how the dtale.show() function works:
- by default it will look for ports between 40000 & 49000, but you can change that range by specifying the environment variables DTALE_MIN_PORT & DTALE_MAX_PORT (see the sketch below)
- think of sessions as python consoles or jupyter notebooks
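Here is a minimal sketch of overriding the port range from python by setting those environment variables before calling dtale.show() (the values are illustrative, and the assumption that setting them in-process is sufficient; they can equally be exported in your shell):
import os

# assumption: D-Tale reads these when it searches for a free port
os.environ['DTALE_MIN_PORT'] = '30000'
os.environ['DTALE_MAX_PORT'] = '39000'

import pandas as pd
import dtale

df = pd.DataFrame([dict(a=1, b=2, c=3)])
d = dtale.show(df)  # should bind to a port in the 30000-39000 range
print(d._url)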
Session 1 executes dtale.show(df) our state is:

Session | Port | Active Data IDs | URL(s)
---|---|---|---
1 | 40000 | 1 |
Session 1 executes dtale.show(df) again, our state is:

Session | Port | Active Data IDs | URL(s)
---|---|---|---
1 | 40000 | 1,2 |
Session 2 executes dtale.show(df) our state is:

Session | Port | Active Data IDs | URL(s)
---|---|---|---
1 | 40000 | 1,2 |
2 | 40001 | 1 |
Session 1 executes dtale.show(df, port=40001, force=True) our state is:

Session | Port | Active Data IDs | URL(s)
---|---|---|---
1 | 40001 | 1,2,3 |
Session 3 executes dtale.show(df) our state is:

Session | Port | Active Data IDs | URL(s)
---|---|---|---
1 | 40001 | 1,2,3 |
3 | 40000 | 1 |
Session 2 executes dtale.show(df) our state is:

Session | Port | Active Data IDs | URL(s)
---|---|---|---
1 | 40001 | 1,2,3 |
3 | 40000 | 1 |
2 | 40002 | 1 |
Session 4 executes dtale.show(df, port=8080) our state is:

Session | Port | Active Data IDs | URL(s)
---|---|---|---
1 | 40001 | 1,2,3 |
3 | 40000 | 1 |
2 | 40002 | 1 |
4 | 8080 | 1 |
Session 1 executes dtale.get_instance(1).kill() our state is:

Session | Port | Active Data IDs | URL(s)
---|---|---|---
1 | 40001 | 2,3 |
3 | 40000 | 1 |
2 | 40002 | 1 |
4 | 8080 | 1 |
Session 5 sets DTALE_MIN_PORT to 30000 and DTALE_MAX_PORT to 39000 and executes dtale.show(df), our state is:

Session | Port | Active Data IDs | URL(s)
---|---|---|---
1 | 40001 | 2,3 |
3 | 40000 | 1 |
2 | 40002 | 1 |
4 | 8080 | 1 |
5 | 30000 | 1 |
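As a recap, here is a sketch of the calls used in the walk-through above, issued from a single python session (the port and data id follow the tables above and are otherwise illustrative):
import pandas as pd
import dtale

df = pd.DataFrame([dict(a=1, b=2, c=3)])
d1 = dtale.show(df)                          # grabs the first free port in the configured range
d2 = dtale.show(df, port=40001, force=True)  # kills any process already bound to 40001 and takes over that port
dtale.get_instance(1).kill()                 # shuts down the instance serving data id 1
dtale.instances()                            # dictionary of the instances still running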
Documentation
Have a look at the detailed documentation.
Requirements
D-Tale works with:
Back-end
arctic
Flask
Flask-Caching
Flask-Compress
flasgger
Pandas
scipy
six
Front-end
react-virtualized
chart.js
Acknowledgements
D-Tale has been under active development at Man Numeric since 2019.
Original concept and implementation: Andrew Schonfeld
Contributors:
Mike Kelly
Youssef Habchi - title font
… and many others …
Contributions welcome!
License
D-Tale is licensed under the GNU LGPL v2.1, a copy of which is included in LICENSE.
Changelog
1.7.2 (2020-2-12)
#60: timeout handling around chart requests
pre-loaded charts through URL search strings
pandas query examples in Filter popup
1.7.1 (2020-2-7)
added pie, 3D scatter & surface charts
updated popups to be displayed when the browser dimensions are too small to host a modal
removed Swagger due to its lack of support for updated dependencies
1.7.0 (2020-1-28)
1.6.10 (2020-1-12)
better front-end handling of dates for charting so as to avoid timezone issues
the ability to switch between sorting any axis in bar charts
1.6.9 (2020-1-9)
bugfix for timezone issue around passing date filters to server for scatter charts in correlations popup
1.6.8 (2020-1-9)
additional information about how to use Correlations popup
handling of all-nan data in charts popup
styling issues on popups (especially Histogram)
removed auto-filtering on correlation popup
scatter point color change
added chart icon to cell that has been selected in correlation popup
responsiveness to scatter charts
handling of links to ‘main’,‘iframe’ & ‘popup’ missing data_id
handling of ‘inf’ values when getting min/max & describe data
added header to window popups (correlations, charts, …) and a link back to the grid
added egg building to circleci script
correlation timeseries chart hover line
1.6.7 (2020-1-3)
#50: updates to rolling correlation functionality
1.6.6 (2020-1-2)
#47: selection of multiple columns for y-axis
updated histogram bin selection to be an input box for full customization
better display of timestamps in axis ticks for charts
sorting of bar charts by y-axis
#48: scatter charts in chart builder
“nunique” added to list of aggregations
turned on “threaded=True” for app.run to avoid hanging popups
#45: rolling computations as aggregations
Y-Axis editor
1.6.5 (2019-12-29)
test whether filters entered will return no data and block the user from applying them
allow for group values of type int or float to be displayed in charts popup
timeseries correlation values which return ‘nan’ will be replaced by zero for chart purposes
update ‘distribution’ to ‘series’ on charts so that missing dates will not show up as ticks
added “fork on github” flag for demo version & links to github/docs on “About” popup
limited lz4 to <= 2.2.1 in python 27-3 since latest version is no longer supported
1.6.4 (2019-12-26)
testing of hostname returned by socket.gethostname, falling back to 'localhost' if it fails
removal of flask dev server banner when running in production environments
better handling of long strings in wordclouds
#43: only show timeseries correlations if datetime columns exist with multiple values per date
1.6.3 (2019-12-23)
updated versions of packages in yarn.lock due to issue with chart.js box & whisker plots
1.6.2 (2019-12-23)
#40: loading initial chart as non-line in chart builder
#41: double clicking cells in correlation grid for scatter will cause chart not to display
“Open Popup” button for ipython iframes
column width resizing on sorting
additional int/float descriptors (sum, median, mode, var, sem, skew, kurt)
wordcloud chart type
1.6.1 (2019-12-19)
bugfix for url display when running from command-line
1.6.0 (2019-12-19)
charts integration
the ability to look at data in line, bar, stacked bar & pie charts
the ability to group & aggregate data within the charts
direct ipython iframes to correlations & charts pages with pre-selected inputs
the ability to access instances from code by data id dtale.get_instance(data_id)
view all active data instances dtale.instances()
1.5.1 (2019-12-12)
switched from spinning up a new flask instance for each dtale.show call to serving all data associated with one parent process under the same flask instance, unless otherwise specified by the user (the force parameter)
1.5.0 (2019-12-02)
ipython integration
ipython output cell adjustment
column-wise menu support
browser window popups for: Correlations, Coverage, Describe, Histogram & Instances
1.4.1 (2019-11-20)
#32: unpin jsonschema by moving flasgger to extras_require
1.4.0 (2019-11-19)
Correlations Pearson Matrix filters
“name” display in title tab
“Heat Map” toggle
dropped unused “Flask-Caching” requirement
1.3.7 (2019-11-12)
1.3.6 (2019-11-08)
Bug fixes for:
choose between pandas.corr & numpy.corrcoef depending on presence of NaNs
hide timeseries correlations when date columns only contain one day
1.3.5 (2019-11-07)
Bug fixes for:
duplicate loading of histogram data
string serialization failing when mixing future.str & str in scatter function
1.3.4 (2019-11-07)
updated correlation calculation to use numpy.corrcoef for performance purposes
github rebranding from manahl -> man-group
1.3.3 (2019-11-05)
hotfix for failing test under certain versions of future package
1.3.2 (2019-11-05)
Bug fixes for:
display of histogram column information
reload of hidden “processes” input when loading instances data
correlations json failures on string conversion
1.3.1 (2019-10-29)
fix for incompatible str types when directly altering state of data in running D-Tale instance
1.3.0 (2019-10-29)
webbrowser integration (the ability to automatically open a webbrowser upon calling dtale.show())
flag for hiding the “Shutdown” button for long-running demos
“Instances” navigator popup for viewing all active D-Tale instances for the current python process
1.2.0 (2019-10-24)
1.1.1 (2019-10-23)
#13: fix for auto-detection of column widths for strings and floats
1.1.0 (2019-10-08)
IE support
Describe & About popups
Custom CLI support
1.0.0 (2019-09-06)
Initial public release