Web Client for Visualizing Pandas Objects
What is it?
D-Tale is the combination of a Flask back-end and a React front-end to bring you an easy way to view & analyze Pandas data structures. It integrates seamlessly with ipython notebooks & python/ipython terminals. Currently this tool supports such Pandas objects as DataFrame, Series, MultiIndex, DatetimeIndex & RangeIndex.
Origins
D-Tale was the product of a SAS to Python conversion. What was originally a perl script wrapper on top of SAS’s insight function is now a lightweight web client on top of Pandas data structures.
In The News
Man Institute (warning: contains deprecated functionality)
Tutorials
## Related Resources
Contents
XArray Operations, Describe, Outlier Detection, Custom Filter, Dataframe Functions, Merge & Stack, Summarize Data, Duplicates, Missing Analysis, Correlations, Predictive Power Score, Heat Map, Highlight Dtypes, Highlight Missing, Highlight Outliers, Highlight Range, Low Variance Flag, Instances, Code Exports, Export CSV, Load Data & Sample Datasets, Refresh Widths, About, Theme, Reload Data, Unpin/Pin Menu, Language, Shutdown
Where To Get It
The source code is currently hosted on GitHub at: https://github.com/man-group/dtale
Binary installers for the latest released version are available at the Python package index and on conda using conda-forge.
# conda
conda install dtale -c conda-forge
# if you want to also use "Export to PNG" for charts
conda install -c plotly python-kaleido
# or PyPI
pip install dtale
Getting Started
(Screenshots: PyCharm & jupyter)
Python Terminal
These screenshots come courtesy of PyCharm. Feel free to invoke python or ipython directly and use the commands shown above; they should work the same.
Issues With Windows Firewall
If you run into issues with viewing D-Tale in your browser on Windows please try making Python public under “Allowed Apps” in your Firewall configuration. Here is a nice article: How to Allow Apps to Communicate Through the Windows Firewall
Additional functions available programmatically
import dtale
import pandas as pd
df = pd.DataFrame([dict(a=1,b=2,c=3)])
# Assigning a reference to a running D-Tale process
d = dtale.show(df)
# Accessing data associated with D-Tale process
tmp = d.data.copy()
tmp['d'] = 4
# Altering data associated with D-Tale process
# FYI: this will clear any front-end settings you have at the time for this process (filter, sorts, formatting)
d.data = tmp
# Shutting down D-Tale process
d.kill()
# using Python's `webbrowser` package, this will try to open your server's default browser to this process
d.open_browser()
# There is also some helpful metadata about the process
d._data_id # the process's data identifier
d._url # the url to access the process
d2 = dtale.get_instance(d._data_id) # returns a new reference to the instance running at that data_id
dtale.instances() # prints a list of all ids & urls of running D-Tale sessions
Duplicate data check
To help guard against users loading the same data to D-Tale multiple times and thus eating up precious memory, we have a loose check for duplicate input data. The check runs the following:
* Are the row & column counts the same as a previously loaded piece of data?
* Are the names and order of columns the same as a previously loaded piece of data?
If both these conditions are true then you will be presented with an error and a link to the previously loaded data. Here is an example of how the interaction looks:
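The check itself can be sketched in plain pandas; `looks_like_duplicate` below is a hypothetical helper for illustration, not D-Tale's actual implementation:

```python
import pandas as pd

def looks_like_duplicate(new_df, loaded_df):
    """Loose duplicate check: same shape and same column names/order."""
    same_shape = new_df.shape == loaded_df.shape
    same_columns = list(new_df.columns) == list(loaded_df.columns)
    return same_shape and same_columns

df1 = pd.DataFrame(dict(a=[1, 2, 3], b=[4, 5, 6]))
df2 = pd.DataFrame(dict(a=[7, 8, 9], b=[1, 1, 1]))  # different values, same structure
df3 = pd.DataFrame(dict(a=[1, 2, 3]))               # different structure
```

Note that `df2` trips the check even though its values differ, which is why the check is described as "loose".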
As A Script
D-Tale can be run as a script by adding subprocess=False to your dtale.show command. Here is an example script:
import dtale
import pandas as pd
if __name__ == '__main__':
dtale.show(pd.DataFrame([1,2,3,4,5]), subprocess=False)
Jupyter Notebook
Within any jupyter (ipython) notebook, executing a cell like this will display a small instance of D-Tale in the output cell. Here are some examples:
(Screenshots: dtale.show, assignment & instance)
If you are running ipython<=5.0 then you also have the ability to adjust the size of your output cell for the most recent instance displayed:
One thing of note is that many of the modal popups you see in the standard browser version will now open separate browser windows for spatial convenience:
(Screenshots: Column Menus, Correlations, Describe, Column Analysis & Instances)
JupyterHub w/ Jupyter Server Proxy
JupyterHub has an extension that allows you to proxy ports for users: Jupyter Server Proxy
This extension seems like the best solution for getting D-Tale running within Kubernetes. Here's how to use it:
import pandas as pd
import dtale
import dtale.app as dtale_app
dtale_app.JUPYTER_SERVER_PROXY = True
dtale.show(pd.DataFrame([1,2,3]))
Notice the line dtale_app.JUPYTER_SERVER_PROXY = True. This makes sure that any D-Tale instance will be served with the jupyter server proxy application root prefix:
/user/{jupyter username}/proxy/{dtale instance port}/
One thing to note is that if you try to look at the _main_url of your D-Tale instance in your notebook it will not include the hostname or port:
import pandas as pd
import dtale
import dtale.app as dtale_app
dtale_app.JUPYTER_SERVER_PROXY = True
d = dtale.show(pd.DataFrame([1,2,3]))
d._main_url # /user/johndoe/proxy/40000/dtale/main/1
This is because it’s very hard to programmatically figure out the host/port that your notebook is running on. So if you want to look at _main_url please be sure to preface it with:
http[s]://[jupyterhub host]:[jupyterhub port]
If for some reason jupyterhub changes their API so that the application root changes you can also override D-Tale’s application root by using the app_root parameter to the show() function:
import pandas as pd
import dtale
import dtale.app as dtale_app
dtale.show(pd.DataFrame([1,2,3]), app_root='/user/johndoe/proxy/40000/')
Using this parameter will only apply the application root to that specific instance so you would have to include it on every call to show().
JupyterHub w/ Kubernetes
Please read this post
Docker Container
If you have D-Tale installed within your docker container please add the following parameters to your docker run command.
On a Mac:
docker run -h `hostname` -p 40000:40000
-h this will allow the hostname (and not the PID of the docker container) to be available when building D-Tale URLs
-p access to port 40000 which is the default port for running D-Tale
On Windows:
docker run -p 40000:40000
-p access to port 40000 which is the default port for running D-Tale
D-Tale URL will be http://127.0.0.1:40000/
Everything Else:
docker run -h `hostname` --network host
-h this will allow the hostname (and not the PID of the docker container) to be available when building D-Tale URLs
--network host this will allow access to as many ports as needed for running D-Tale processes
Google Colab
This is a hosted notebook site and thanks to Colab’s internal function google.colab.output.eval_js & the JS function google.colab.kernel.proxyPort, users can run D-Tale within their notebooks.
DISCLAIMER: It is important that you set USE_COLAB to true when using D-Tale within this service. Here is an example:
import pandas as pd
import dtale
import dtale.app as dtale_app
dtale_app.USE_COLAB = True
dtale.show(pd.DataFrame([1,2,3]))
If this does not work for you try using USE_NGROK which is described in the next section.
Kaggle
This is yet another hosted notebook site and thanks to the work of flask_ngrok users can run D-Tale within their notebooks.
DISCLAIMER: It is important that you set USE_NGROK to true when using D-Tale within this service. Here is an example:
import pandas as pd
import dtale
import dtale.app as dtale_app
dtale_app.USE_NGROK = True
dtale.show(pd.DataFrame([1,2,3]))
Here are some video tutorials of each:
Service | Tutorial | Additional Notes
---|---|---
Google Colab | |
Kaggle | | make sure you switch the “Internet” toggle to “On” under the settings of your notebook so you can install the egg from pip
It is important to note that using NGROK will limit you to 20 connections per minute, so if you see this error:
Wait a little while and it should allow you to do work again. I am actively working on finding a more sustainable solution similar to what I did for google colab. :pray:
Binder
I have built a repo which shows an example of how to run D-Tale within Binder here.
The important take-aways are:
* you must have jupyter-server-proxy installed
* look at the environment.yml file to see how to add it to your environment
* look at the postBuild file for how to activate it on startup
R with Reticulate
I was able to get D-Tale running in R using reticulate. Here is an example:
library('reticulate')
dtale <- import('dtale')
df <- read.csv('https://vincentarelbundock.github.io/Rdatasets/csv/boot/acme.csv')
dtale$show(df, subprocess=FALSE, open_browser=TRUE)
Now the problem with doing this is that D-Tale is not running as a subprocess, so it will block your R console and you’ll lose out on the following functions:
- manipulating the state of your data from your R console
- adding more data to D-Tale
open_browser=TRUE isn’t required and won’t work if you don’t have a default browser installed on your machine. If you don’t use that parameter simply copy & paste the URL that gets printed to your console in the browser of your choice.
I’m going to do some more digging on why R doesn’t seem to like using python subprocesses (not sure if it is something with how reticulate manages the state of python) and post any findings to this thread.
Here’s some helpful links for getting setup:
reticulate
installing python packages
Startup with No Data
It is now possible to run D-Tale with no data loaded up front. Simply call dtale.show(): this will start the application, and when you go to view it you will be presented with a screen where you can upload either a CSV or TSV file for data.
Once you’ve loaded a file it will take you directly to the standard data grid comprised of the data from the file you loaded. This might make it easier to use this as an on demand application within a container management system like kubernetes. You start and stop these on demand and you’ll be presented with a new instance to load any CSV or TSV file to!
Command-line
Base CLI options (run dtale --help to see all options available)
Prop | Description
---|---
--host | the name of the host you would like to use (most likely not needed since socket.gethostname() should figure this out)
--port | the port you would like to assign to your D-Tale instance
--name | an optional name you can assign to your D-Tale instance (this will be displayed in the <title> & Instances popup)
--debug | turn on Flask’s “debug” mode for your D-Tale instance
--no-reaper | flag to turn off auto-reaping of subprocesses (killing D-Tale instances after an hour of inactivity); good for long-running displays
--open-browser | flag to automatically open up your server’s default browser to your D-Tale instance
--force | flag to force D-Tale to try and kill any pre-existing process at the port you’ve specified so it can use it
Loading data from arctic (a high-performance datastore for pandas dataframes); this requires installing either arctic or dtale[arctic]:
dtale --arctic-host mongodb://localhost:27027 --arctic-library jdoe.my_lib --arctic-node my_node --arctic-start 20130101 --arctic-end 20161231
Loading data from CSV
dtale --csv-path /home/jdoe/my_csv.csv --csv-parse_dates date
Loading data from EXCEL
dtale --excel-path /home/jdoe/my_csv.xlsx --excel-parse_dates date
dtale --excel-path /home/jdoe/my_csv.xls --excel-parse_dates date
Loading data from JSON
dtale --json-path /home/jdoe/my_json.json --json-parse_dates date
or
dtale --json-path http://json-endpoint --json-parse_dates date
Loading data from R Datasets
dtale --r-path /home/jdoe/my_dataset.rda
Loading data from SQLite DB Files
dtale --sqlite-path /home/jdoe/test.sqlite3 --sqlite-table test_table
Custom Command-line Loaders
Loading data from a Custom loader:
- Using the DTALE_CLI_LOADERS environment variable, specify a path to a location containing some python modules
- Any python module containing the global variables LOADER_KEY & LOADER_PROPS will be picked up as a custom loader
  - LOADER_KEY: the key that will be associated with your loader. By default you are given arctic & csv (if you use one of these as your key it will override them)
  - LOADER_PROPS: the individual props available to be specified
    - For example, with arctic we have host, library, node, start & end
    - If you leave this property as an empty list your loader will be treated as a flag. For example, instead of using all the arctic properties we would simply specify --arctic (this wouldn’t work well in arctic’s case since it depends on all those properties)
- You will also need to specify a function with the signature def find_loader(kwargs) which returns a function that returns a dataframe, or None
- Here is an example of a custom loader:
from dtale.cli.clickutils import get_loader_options
'''
IMPORTANT!!! This global variable is required for building any customized CLI loader.
When D-Tale finds loaders on startup it will search for any modules containing the global variable LOADER_KEY.
'''
LOADER_KEY = 'testdata'
LOADER_PROPS = ['rows', 'columns']
def test_data(rows, columns):
import pandas as pd
import numpy as np
import random
from past.utils import old_div
from pandas.tseries.offsets import Day
from dtale.utils import dict_merge
import string
now = pd.Timestamp(pd.Timestamp('now').date())
dates = pd.date_range(now - Day(364), now)
num_of_securities = max(old_div(rows, len(dates)), 1) # always have at least one security
securities = [
dict(security_id=100000 + sec_id, int_val=random.randint(1, 100000000000),
str_val=random.choice(string.ascii_letters) * 5)
for sec_id in range(num_of_securities)
]
data = pd.concat([
pd.DataFrame([dict_merge(dict(date=date), sd) for sd in securities])
for date in dates
], ignore_index=True)[['date', 'security_id', 'int_val', 'str_val']]
col_names = ['Col{}'.format(c) for c in range(columns)]
return pd.concat([data, pd.DataFrame(np.random.randn(len(data), columns), columns=col_names)], axis=1)
# IMPORTANT!!! This function is required for building any customized CLI loader.
def find_loader(kwargs):
test_data_opts = get_loader_options(LOADER_KEY, LOADER_PROPS, kwargs)
if len([f for f in test_data_opts.values() if f]):
def _testdata_loader():
return test_data(int(test_data_opts.get('rows', 1000500)), int(test_data_opts.get('columns', 96)))
return _testdata_loader
return None
In this example we are simply building a dataframe with some dummy data based on dimensions specified on the command-line:
- --testdata-rows
- --testdata-columns
Here’s how you would use this loader:
DTALE_CLI_LOADERS=./path_to_loaders bash -c 'dtale --testdata-rows 10 --testdata-columns 5'
Authentication
You can choose to use optional authentication by adding the following to your D-Tale .ini file (directions here):
[auth]
active = True
username = johndoe
password = 1337h4xOr
Or you can call the following:
import dtale.global_state as global_state
global_state.set_auth_settings({'active': True, 'username': 'johndoe', 'password': '1337h4x0r'})
If you have done this before initially starting D-Tale it will have authentication applied. If you are adding this after starting D-Tale you will have to kill your service and start it over.
When opening your D-Tale session you will be presented with a screen like this:
From there you can enter the credentials you either set in your .ini file or in your call to dtale.global_state.set_auth_settings and you will be brought to the main grid as normal. You will now have an additional option in your main menu to logout:
Instance Settings
Users can set front-end properties on their instances programmatically in the dtale.show function or by calling the update_settings function on their instance. For example:
import dtale
import pandas as pd
df = pd.DataFrame(dict(
a=[1,2,3,4,5],
b=[6,7,8,9,10],
c=['a','b','c','d','e']
))
dtale.show(
df,
locked=['c'],
column_formats={'a': {'fmt': '0.0000'}},
nan_display='...',
background_mode='heatmap-col',
sort=[('a','DESC')],
vertical_headers=True,
)
or
import dtale
import pandas as pd
df = pd.DataFrame(dict(
a=[1,2,3,4,5],
b=[6,7,8,9,10],
c=['a','b','c','d','e']
))
d = dtale.show(
df
)
d.update_settings(
locked=['c'],
column_formats={'a': {'fmt': '0.0000'}},
nan_display='...',
background_mode='heatmap-col',
sort=[('a','DESC')],
vertical_headers=True,
)
d
Here’s a short description of each instance setting available:
show_columns
A list of column names you would like displayed in your grid. Anything else will be hidden.
hide_columns
A list of column names you would like initially hidden from the grid display.
column_formats
A dictionary of column name keys and their front-end display configuration. Here are examples of the different format configurations:
* Numeric: {'fmt': '0.00000'}
* String:
  * {'fmt': {'truncate': 10}} truncate string values to no more than 10 characters followed by an ellipsis
  * {'fmt': {'link': True}} if your strings are URLs convert them to clickable links
  * {'fmt': {'html': True}} if your strings are HTML fragments render them as HTML
* Date: {'fmt': 'MMMM Do YYYY, h:mm:ss a'} uses Moment.js formatting
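As a sketch, the formats above could be combined into a single column_formats argument; the column names here are hypothetical:

```python
# hypothetical column names, one entry per format style described above
column_formats = {
    'price': {'fmt': '0.00000'},                    # numeric format
    'description': {'fmt': {'truncate': 10}},       # truncate long strings
    'homepage': {'fmt': {'link': True}},            # render URLs as links
    'snippet': {'fmt': {'html': True}},             # render HTML fragments
    'created': {'fmt': 'MMMM Do YYYY, h:mm:ss a'},  # Moment.js date format
}
# then: dtale.show(df, column_formats=column_formats)
```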
nan_display
Converts any nan values in your dataframe to this when it is sent to the browser (doesn’t actually change the state of your dataframe)
sort
List of tuples which sort your dataframe (EX: [('a', 'ASC'), ('b', 'DESC')])
locked
List of column names which will be locked to the right side of your grid while you scroll to the left.
background_mode
A string denoting one of the many background displays available in D-Tale. Options are:
* heatmap-all: turn on the heatmap for all numeric columns, where the colors are determined by the range of values over all numeric columns combined
* heatmap-col: turn on the heatmap for all numeric columns, where the colors are determined by the range of values in each column
* heatmap-col-[column name]: turn on heatmap highlighting for a specific column
* dtypes: highlight columns based on their data type
* missing: highlight any missing values (np.nan, empty strings, strings of all spaces)
* outliers: highlight any outliers
* range: highlight values for any matchers entered in the “range_highlights” option
* lowVariance: highlight values with a low variance
range_highlights
Dictionary of column name keys and range configurations; if a value in that column matches one of the active ranges it will be shaded the specified color. Here is an example input:
'a': {
    'active': True,
    'equals': {'active': True, 'value': 3, 'color': {'r': 255, 'g': 245, 'b': 157, 'a': 1}},  # light yellow
    'greaterThan': {'active': True, 'value': 3, 'color': {'r': 80, 'g': 227, 'b': 194, 'a': 1}},  # mint green
    'lessThan': {'active': True, 'value': 3, 'color': {'r': 245, 'g': 166, 'b': 35, 'a': 1}},  # orange
}
vertical_headers
If set to True then the headers in your grid will be rotated 90 degrees vertically to conserve width.
Predefined Filters
Users can build their own custom filters which can be used from the front-end using the following code snippet:
import pandas as pd
import dtale
import dtale.predefined_filters as predefined_filters
import dtale.global_state as global_state
global_state.set_app_settings(dict(open_predefined_filters_on_startup=True))
predefined_filters.set_filters([
{
"name": "A and B > 2",
"column": "A",
"description": "Filter A with B greater than 2",
"handler": lambda df, val: df[(df["A"] == val) & (df["B"] > 2)],
"input_type": "input",
"default": 1,
"active": False,
},
{
"name": "A and (B % 2) == 0",
"column": "A",
"description": "Filter A with B mod 2 equals zero (is even)",
"handler": lambda df, val: df[(df["A"] == val) & (df["B"] % 2 == 0)],
"input_type": "select",
"default": 1,
"active": False,
},
{
"name": "A in values and (B % 2) == 0",
"column": "A",
"description": "A is within a group of values and B mod 2 equals zero (is even)",
"handler": lambda df, val: df[df["A"].isin(val) & (df["B"] % 2 == 0)],
"input_type": "multiselect",
"default": [1],
"active": True,
}
])
df = pd.DataFrame(
([[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12], [13, 14, 15, 16, 17, 18]]),
columns=['A', 'B', 'C', 'D', 'E', 'F']
)
dtale.show(df)
This code illustrates the types of inputs you can have on the front end:
* input: a simple text input box where users can enter any value they want (if the value specified for "column" is an int or float it will try to convert the string to that data type) and it will be passed to the handler
* select: creates a dropdown populated with the unique values of "column" (an asynchronous dropdown if the column has a large number of unique values)
* multiselect: same as “select” but it will allow you to choose multiple values (handy if you want to perform an isin operation in your filter)
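Since each handler is just a function of (df, val) that returns a filtered dataframe, you can exercise one directly outside of D-Tale; here is a small sketch using the first filter from the snippet above:

```python
import pandas as pd

# the "A and B > 2" handler from the predefined filters above
handler = lambda df, val: df[(df["A"] == val) & (df["B"] > 2)]

df = pd.DataFrame({'A': [1, 1, 2], 'B': [1, 3, 5]})
filtered = handler(df, 1)  # rows where A == 1 and B > 2
```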
Here is a demo of the functionality:
If there are any new types of inputs you would like available please don’t hesitate to submit a request on the “Issues” page of the repo.
Using Swifter
Swifter is a package which will increase performance of any apply() function on a pandas series or dataframe. Install the package in your virtual environment:
pip install swifter
# or
pip install dtale[swifter]
It will be used for the following operations:
- Standard dataframe formatting in the main grid & chart display
- Column Builders
- Type Conversions
  - string hex -> int or float
  - int or float -> hex
  - mixed -> boolean
  - int -> timestamp
  - date -> int
- Similarity Distance Calculation
- Handling of empty strings when calculating missing counts
- Building unique values by data type in the “Describe” popup
Accessing CLI Loaders in Notebook or Console
I am pleased to announce that all CLI loaders will be available within notebooks & consoles. Here are some examples (the last working if you’ve installed dtale[arctic]):
- dtale.show_csv(path='test.csv', parse_dates=['date'])
- dtale.show_csv(path='http://csv-endpoint', index_col=0)
- dtale.show_excel(path='test.xlsx', parse_dates=['date'])
- dtale.show_excel(path='test.xls', sheet=)
- dtale.show_excel(path='http://excel-endpoint', index_col=0)
- dtale.show_json(path='http://json-endpoint', parse_dates=['date'])
- dtale.show_json(path='test.json', parse_dates=['date'])
- dtale.show_r(path='text.rda')
- dtale.show_arctic(host='host', library='library', node='node', start_date='20200101', end_date='20200101')
UI
Once you have kicked off your D-Tale session, please copy & paste the link on the last line of output into your browser.
Header
The header gives users an idea of what operations have taken place on your data (sorts, filters, hidden columns). These values will be persisted across browser instances. So if you perform one of these operations and then send a link to one of your colleagues they will see the same thing :)
Notice the “X” icon on the right of each display. Clicking this will remove those operations.
When you perform several of the same operation, the description becomes too large to display, so it is truncated; clicking it presents a tooltip where you can remove individual operations. Here are some examples:
(Screenshots: Sorts, Filters & Hidden Columns)
Resize Columns
Currently there are two ways to resize columns:
* Dragging the right border of the column’s header cell.
* Altering the “Maximum Column Width” property from the ribbon menu.
Side Note: You can also set the max_column_width property ahead of time in your global configuration or programmatically using:
import dtale.global_state as global_state
global_state.set_app_settings(dict(max_column_width=100))
Editing Cells
You may edit any cells in your grid (with the exception of the row indexes or headers; the latter can be edited using the Rename column menu function).
In order to edit a cell simply double-click on it. This will convert it into a text-input field and you should see a blinking cursor. In addition to turning that cell into an input it will also display an input at the top of the screen for better viewing of long strings. It is assumed that the value you type in will match the data type of the column you are editing. For example:
integers -> should be a valid positive or negative integer
float -> should be a valid positive or negative float
string -> any valid string will do
category -> either a pre-existing category or this will create a new category for it (so beware!)
date, timestamp, timedelta -> should be valid string versions of each
boolean -> any string you input will be converted to lowercase and if it equals “true” then it will make the cell True, otherwise False
Users can make use of two protected values as well:
“nan” -> numpy.nan
“inf” -> numpy.inf
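As an illustration only (not D-Tale's actual parsing code), the coercion rules above might look something like:

```python
import numpy as np

def coerce_edit(raw, dtype):
    """Illustrative coercion of an entered cell string to a column's dtype."""
    # protected values first
    if raw == 'nan':
        return np.nan
    if raw == 'inf':
        return np.inf
    if dtype == 'int':
        return int(raw)
    if dtype == 'float':
        return float(raw)
    if dtype == 'bool':
        # lowercase the input; only "true" becomes True, everything else False
        return raw.lower() == 'true'
    return raw  # strings pass through unchanged
```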
To save your change simply press “Enter” or to cancel your changes press “Esc”.
If there is a conversion issue with the value you have entered it will display a popup with the specific exception in question.
Here’s a quick demo:
Here’s a demo of editing cells with long strings:
Copy Cells Into Clipboard
(Screenshots: Select, Copy & Paste)
One request that I have heard time and time again while working on D-Tale is “it would be great to be able to copy a range of cells into excel”. Well here is how that is accomplished:
1) Shift + Click on a cell
2) Shift + Click on another cell (this will trigger a popup)
3) Choose whether you want to include headers in your copy by clicking the checkbox
4) Click Yes
5) Go to your excel workbook and execute Ctrl + V or manually choose “Paste”
* You can also paste this into a standard text editor and what you’re left with is tab-delimited data
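The tab-delimited text you end up with matches what pandas produces with a tab separator; for example:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
# headers included (the checkbox in step 3), tab-delimited like the clipboard copy
clipboard_text = df.to_csv(sep='\t', index=False)
```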
OFFLINE CHARTS
Want to run D-Tale in a jupyter notebook and build a chart that will still be displayed even after your D-Tale process has shutdown? Now you can! Here’s an example code snippet showing how to use it:
import dtale

def test_data():
    import random
    import pandas as pd
    import numpy as np

    df = pd.DataFrame([dict(x=i, y=i % 2) for i in range(30)])
    rand_data = pd.DataFrame(
        np.random.randn(len(df), 5),
        columns=['z{}'.format(j) for j in range(5)]
    )
    return pd.concat([df, rand_data], axis=1)

d = dtale.show(test_data())
d.offline_chart(chart_type='bar', x='x', y='z3', agg='sum')
Pro Tip: If generating offline charts in jupyter notebooks and you run out of memory please add the following to your command-line when starting jupyter
--NotebookApp.iopub_data_rate_limit=1.0e10
Disclaimer: Long Running Chart Requests
If you choose to build a chart that requires a lot of computational resources then it will take some time to run. Based on the way Flask & plotly/dash interact this will block you from performing any other request until it completes. There are two courses of action in this situation:
Restart your jupyter notebook kernel or python console
Open a new D-Tale session on a different port than the current session. You can do that with the following command: dtale.show(df, port=[any open port], force=True)
If you miss the legacy (non-plotly/dash) charts, not to worry! They are still available from the link in the upper-right corner, but only for a limited time… Here is the documentation for those: Legacy Charts
Your Feedback is Valuable
This is a very powerful feature with many more features that could be offered (linked subplots, different statistical aggregations, etc…) so please submit issues :)
Network Viewer
This tool gives users the ability to visualize directed graphs. For the screenshots I’ll be showing for this functionality, we’ll be working off a dataframe with the following data:
Start by selecting columns containing the “To” and “From” values for the nodes in your network and then click “Load”:
You can also see instructions on how to interact with the network by expanding the directions section (click the header “Network Viewer” at the top). You can also view details about the network provided by the package networkx by clicking the header “Network Analysis”.
Select a column containing weighting for the edges of the nodes in the “Weight” column and click “Load”:
Select a column containing group information for each node in the “From” column by populating “Group” and then clicking “Load”:
Perform shortest path analysis by doing a Shift+Click on two nodes:
View direct descendants of each node by clicking on it:
You can zoom in on nodes by double-clicking and zoom back out by pressing “Esc”.
Correlations
Shows a Pearson correlation matrix of all numeric columns against all other numeric columns:
- By default, it will show a grid of Pearson correlations (filtering available via the drop-down; see the 2nd table of screenshots)
- If you have a date-type column, you can click an individual cell and see a timeseries of Pearson correlations for that column combination
- Currently, if you have multiple date-type columns you will have the ability to toggle between them by way of a drop-down
- Furthermore, you can click on individual points in the timeseries to view the scatter plot of the points going into that correlation
- Within the scatter plot section you can also view the details of the PPS for those data points in the chart by hovering over the number next to “PPS”
(Screenshots: Matrix, PPS, Timeseries & Scatter; Col1 Filtered, Col2 Filtered & Col1 & Col2 Filtered)
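The matrix itself is equivalent to pandas' built-in Pearson correlation, which only considers numeric columns; a minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({
    'x': [1.0, 2.0, 3.0, 4.0],
    'y': [2.0, 4.0, 6.0, 8.0],   # perfectly correlated with x
    'label': list('abcd'),       # non-numeric, excluded from the matrix
})
corr = df.corr(numeric_only=True)  # Pearson by default
```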
When the data being viewed in D-Tale has date or timestamp columns, but there is only one row of data per date/timestamp value, the behavior of the Correlations popup is a little different:
- Instead of a timeseries correlation chart, the user is given a rolling correlation chart whose window (default: 10) can be altered
- The scatter chart will be created when a user clicks on a point in the rolling correlation chart. The data displayed in the scatter will be for the range of dates involved in the rolling correlation for that date.
(Screenshots: Data & Correlations)
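The rolling correlation is along the lines of pandas' rolling corr; a sketch using the default window of 10:

```python
import pandas as pd
import numpy as np

dates = pd.date_range('2023-01-01', periods=30)
df = pd.DataFrame({
    'a': np.arange(30, dtype=float),
    'b': np.arange(30, dtype=float) * 2,  # perfectly correlated with a
}, index=dates)

# correlation over a sliding 10-observation window; the first 9 values are NaN
rolling_corr = df['a'].rolling(window=10).corr(df['b'])
```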
Predictive Power Score
Predictive Power Score (using the package ppscore) is an asymmetric, data-type-agnostic score that can detect linear or non-linear relationships between two columns. The score ranges from 0 (no predictive power) to 1 (perfect predictive power). It can be used as an alternative to the correlation (matrix). WARNING: This could take a while to load.
This page works similarly to the Correlations page but uses the PPS calculation to populate the grid, and by clicking on cells you can view the details of the PPS for the two columns in question.
Heat Map
This will hide any non-float or non-int columns (with the exception of the index on the right) and apply a color to the background of each cell.
* Each float is renormalized to be a value between 0 and 1.0.
* You have two options for the renormalization:
  * By Col: each value is calculated based on the min/max of its column
  * Overall: each value is calculated by the overall min/max of all the non-hidden float/int columns in the dataset
* Each renormalized value is passed to a color scale of red (0) - yellow (0.5) - green (1.0).
* Turn off the Heat Map by clicking the menu option you previously selected one more time.
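The two renormalizations can be sketched with plain pandas (an illustration, not D-Tale's internals):

```python
import pandas as pd

df = pd.DataFrame({'a': [0.0, 5.0, 10.0], 'b': [10.0, 15.0, 20.0]})

# "By Col": scale each column by its own min/max
by_col = (df - df.min()) / (df.max() - df.min())

# "Overall": scale by the min/max across all numeric columns combined
overall_min, overall_max = df.min().min(), df.max().max()
overall = (df - overall_min) / (overall_max - overall_min)
```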
Highlight Dtypes
This is a quick way to check and see if your data has been categorized correctly. By clicking this menu option it will assign a specific background color to each column of a specific data type.
category | timedelta | float | int | date | string | bool
---|---|---|---|---|---|---
purple | orange | green | light blue | pink | white | yellow
Highlight Missing
Any cells which contain nan values will be highlighted in yellow.
Any string column cells which are empty strings or strings consisting only of spaces will be highlighted in orange.
❗will be prepended to any column header which contains missing values.
Highlight Outliers
Highlight any cells for numeric columns which surpass the upper or lower bounds of a custom outlier computation.
* Lower bounds outliers will be on a red scale, where the darker reds will be near the maximum value for the column.
* Upper bounds outliers will be on a blue scale, where the darker blues will be closer to the minimum value for the column.
* ⭐ will be prepended to any column header which contains outliers.
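The exact outlier computation isn't specified here; one common choice is the IQR fence rule, sketched below as an assumption rather than D-Tale's actual formula:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, 100])  # 100 is an obvious outlier

# IQR fences -- an assumption for illustration, not necessarily the
# exact computation D-Tale uses internally
q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = s[(s < lower) | (s > upper)]
```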
Highlight Range
Highlight any range of numeric cells based on three different criteria:
* equals
* greater than
* less than
You can activate as many of these criteria as you’d like and they will be treated as an “or” expression. For example, (x == 0) or (x < -1) or (x > 1)
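The "or" combination above is just a boolean mask; in pandas terms:

```python
import pandas as pd

x = pd.Series([-2, -1, 0, 1, 2])

# equals 0, less than -1, or greater than 1 -- active criteria are OR'd together
mask = (x == 0) | (x < -1) | (x > 1)
highlighted = x[mask]
```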
| Selections | Output |
|---|---|
Low Variance Flag
Show flags on column headers where both of these conditions are true:
* Count of unique values / column size < 10%
* Count of most common value / count of second most common value > 20
Here’s an example of what this will look like when you apply it:
Code Exports
Code Exports are small snippets of code representing the current state of the grid you’re viewing, including things like:
- columns built
- filtering
- sorting
Other code exports available are:
- Describe (Column Analysis)
- Correlations (grid, timeseries chart & scatter chart)
- Charts built using the Chart Builder
| Type | Code Export |
|---|---|
| Main Grid | |
| Histogram | |
| Describe | |
| Correlation Grid | |
| Correlation Timeseries | |
| Correlation Scatter | |
| Charts | |
Export CSV
Export your current data to either a CSV or TSV file:
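This export is the equivalent of a plain pandas `to_csv` call with the appropriate separator (a sketch, not D-Tale's exact code):

```python
import io
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

csv_buf, tsv_buf = io.StringIO(), io.StringIO()
df.to_csv(csv_buf, index=False)            # comma-separated
df.to_csv(tsv_buf, sep="\t", index=False)  # tab-separated
```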
Load Data & Sample Datasets
Either when starting D-Tale with no pre-loaded data, or after you’ve already loaded some data, you have the ability to load data or choose from some sample datasets directly from the GUI:
Here are the options at your disposal:
* Load a CSV/TSV file by dragging a file to the dropzone at the top, or by clicking the dropzone to select a file
* Load a CSV/TSV or JSON file directly from the web by entering a URL (you can also specify a proxy if you are using one)
* Choose from one of our sample datasets:
  * US COVID-19 data from NY Times (updated daily)
  * Script breakdowns of the popular shows Seinfeld & The Simpsons
  * Movie dataset containing release date, director, actors, box office, reviews…
  * Video games and their sales
  * pandas.util.testing.makeTimeDataFrame
Instances
This will give you information about the other D-Tale instances running under your current Python process.
For example, if you ran the following script:
import pandas as pd
import dtale
dtale.show(pd.DataFrame([dict(foo=1, bar=2, biz=3, baz=4, snoopy_D_O_double_gizzle=5)]))
dtale.show(pd.DataFrame([
dict(a=1, b=2, c=3, d=4),
dict(a=2, b=3, c=4, d=5),
dict(a=3, b=4, c=5, d=6),
dict(a=4, b=5, c=6, d=7)
]))
dtale.show(pd.DataFrame([range(6), range(6), range(6), range(6), range(6), range(6)]), name="foo")
This will make the Instances button available in all 3 of these D-Tale instances. Clicking that button while in the first instance invoked above will give you this popup:
The grid above contains the following information:
- Process: timestamp when the process was started along with the name (if specified in dtale.show())
- Rows: number of rows
- Columns: number of columns
- Column Names: comma-separated string of column names (only the first 30 characters; hover for the full listing)
- Preview: this button is available on any of the non-current instances. Clicking it will bring up the left-most 5x5 grid of information for that instance
- The row highlighted in green signifies the current D-Tale instance
- Any other row can be clicked to switch to that D-Tale instance
Here is an example of clicking the “Preview” button:
About
This will give you information about which version of D-Tale you’re running, as well as whether it’s out of date compared to what’s on PyPI.
| Up To Date | Out Of Date |
|---|---|
Refresh Widths
Mostly a fail-safe in the event that your columns are no longer lining up. Clicking this should fix that.
Theme
Toggle between light & dark themes for your viewing pleasure (only affects grid, not popups or charts).
| Light | Dark |
|---|---|
Reload Data
Force a reload of the data from the server for the rows currently being viewed in the grid by clicking this button. This can be helpful when viewing the grid from within another application like jupyter or nested within another website.
Language
I am happy to announce that D-Tale now supports both English & Chinese (there is still more of the translation to be completed, but the infrastructure is there), and we are happy to add support for any other languages. Please see the instructions on how to do so here.
Shutdown
Pretty self-explanatory: this kills your D-Tale session (there is also an auto-kill process that will shut down D-Tale after an hour of inactivity).
Hotkeys
These are key combinations you can use in place of clicking actual buttons to save a little time:
| Keymap | Action |
|---|---|
| shift+m | Opens main menu* |
| shift+d | Opens “Describe” page* |
| shift+f | Opens “Custom Filter”* |
| shift+b | Opens “Build Column”* |
| shift+c | Opens “Charts” page* |
| shift+x | Opens “Code Export”* |
| esc | Closes any open modal window or side panel & exits cell editing |
* Does not fire if user is actively editing a cell.
For Developers
Cloning
Clone the code (git clone git@github.com:man-group/dtale.git), then start the backend server:
$ git clone git@github.com:man-group/dtale.git
# install the dependencies
$ python setup.py develop
# start the server
$ dtale --csv-path /home/jdoe/my_csv.csv --csv-parse_dates date
You can also run dtale from PyDev directly.
You will also want to import javascript dependencies and build the source (all javascript code resides in the frontend folder):
$ cd frontend
$ npm install
# 1) a persistent server that serves the latest JS:
$ npm run watch
# 2) or one-off build:
$ npm run build
Running tests
The usual npm test command works:
$ npm test
You can run individual test files:
$ npm run test -- static/__tests__/dtale/DataViewer-base-test.jsx
Linting
You can lint all the JS and CSS to confirm there’s nothing obviously wrong with it:
$ npm run lint
You can also lint individual JS files:
$ npm run lint-js-file -s -- static/dtale/DataViewer.jsx
Formatting JS
You can auto-format code as follows:
$ npm run format
Docker Development
You can build the Python 2.7 image & run D-Tale as follows:
$ yarn run build
$ docker-compose build dtale_2_7
$ docker run -it --network host dtale_2_7:latest
$ python
>>> import pandas as pd
>>> df = pd.DataFrame([dict(a=1,b=2,c=3)])
>>> import dtale
>>> dtale.show(df)
Then view your D-Tale instance in your browser using the link that gets printed
You can build the Python 3.6 image & run D-Tale as follows:
$ yarn run build
$ docker-compose build dtale_3_6
$ docker run -it --network host dtale_3_6:latest
$ python
>>> import pandas as pd
>>> df = pd.DataFrame([dict(a=1,b=2,c=3)])
>>> import dtale
>>> dtale.show(df)
Then view your D-Tale instance in your browser using the link that gets printed
Adding Language Support
Currently D-Tale supports both English & Chinese, but other languages will gladly be supported. To add another language, simply open a pull request with the following:
- make a copy of the English JSON files below, translate their values, and save them to the same locations as each original file
  - Back-End
  - Front-End
- please make the name of these files the name of the language you are adding (currently english -> en, chinese -> cn)
- be sure to keep the keys in English; that is important
Looking forward to what languages come next! :smile:
Global State/Data Storage
If D-Tale is running in an environment with multiple python processes (ex: on a web server running gunicorn) it will most likely encounter issues with inconsistent state. Developers can fix this by configuring the system D-Tale uses for storing data. Detailed documentation is available here: Data Storage and managing Global State
Startup Behavior
Here’s a little background on how the dtale.show() function works:
- by default it will look for ports between 40000 & 49000, but you can change that range by specifying the environment variables DTALE_MIN_PORT & DTALE_MAX_PORT
- think of sessions as python consoles or jupyter notebooks
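The port-scanning behavior described above can be sketched like this (illustrative; D-Tale's actual logic may differ):

```python
import socket

def find_free_port(min_port=40000, max_port=49000):
    """Return the first port in [min_port, max_port] that we can bind to."""
    for port in range(min_port, max_port + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind(("localhost", port))
                return port
            except OSError:
                continue  # port already in use, try the next one
    raise IOError("No available ports in range")
```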
Session 1 executes dtale.show(df); our state is:

| Session | Port | Active Data IDs | URL(s) |
|---|---|---|---|
| 1 | 40000 | 1 | |
Session 1 executes dtale.show(df) again; our state is:

| Session | Port | Active Data IDs | URL(s) |
|---|---|---|---|
| 1 | 40000 | 1,2 | |
Session 2 executes dtale.show(df); our state is:

| Session | Port | Active Data IDs | URL(s) |
|---|---|---|---|
| 1 | 40000 | 1,2 | |
| 2 | 40001 | 1 | |
Session 1 executes dtale.show(df, port=40001, force=True); our state is:

| Session | Port | Active Data IDs | URL(s) |
|---|---|---|---|
| 1 | 40001 | 1,2,3 | |
Session 3 executes dtale.show(df); our state is:

| Session | Port | Active Data IDs | URL(s) |
|---|---|---|---|
| 1 | 40001 | 1,2,3 | |
| 3 | 40000 | 1 | |
Session 2 executes dtale.show(df); our state is:

| Session | Port | Active Data IDs | URL(s) |
|---|---|---|---|
| 1 | 40001 | 1,2,3 | |
| 3 | 40000 | 1 | |
| 2 | 40002 | 1 | |
Session 4 executes dtale.show(df, port=8080); our state is:

| Session | Port | Active Data IDs | URL(s) |
|---|---|---|---|
| 1 | 40001 | 1,2,3 | |
| 3 | 40000 | 1 | |
| 2 | 40002 | 1 | |
| 4 | 8080 | 1 | |
Session 1 executes dtale.get_instance(1).kill(); our state is:

| Session | Port | Active Data IDs | URL(s) |
|---|---|---|---|
| 1 | 40001 | 2,3 | |
| 3 | 40000 | 1 | |
| 2 | 40002 | 1 | |
| 4 | 8080 | 1 | |
Session 5 sets DTALE_MIN_PORT to 30000 and DTALE_MAX_PORT to 39000 and executes dtale.show(df); our state is:

| Session | Port | Active Data ID(s) | URL(s) |
|---|---|---|---|
| 1 | 40001 | 2,3 | |
| 3 | 40000 | 1 | |
| 2 | 40002 | 1 | |
| 4 | 8080 | 1 | |
| 5 | 30000 | 1 | |
Documentation
Have a look at the detailed documentation.
Dependencies
Back-end
dash
dash_daq
Flask
Flask-Compress
flask-ngrok
Pandas
plotly
scikit-learn
scipy
xarray
arctic [extra]
redis [extra]
rpy2 [extra]
Front-end
react-virtualized
chart.js
Acknowledgements
D-Tale has been under active development at Man Numeric since 2019.
Original concept and implementation: Andrew Schonfeld
Contributors:
Mike Kelly
Youssef Habchi - title font
… and many others …
Contributions welcome!
License
D-Tale is licensed under the GNU LGPL v2.1, a copy of which is included in LICENSE.
Changelog
2.3.0 (2022-5-3)
Added the ability to export correlations or PPS heatmaps to PNG
#653: added “version” entry to dtale module
2.2.0 (2022-3-23)
2.1.2 (2022-3-15)
2.1.0 (2022-3-13)
#617: HTML Exports of data grid
Added option for JSONL line files in json-loader
#643: updated how selected columns get passed to correlation scatter charts
#642: updates for merge screen
#641: fixed histogram label precision
#614: display D-Tale by name
#612: fixed bug with replacing strings
#602: update any date columns to have naive timezone
#607: display of chinese characters in missingno plots
#606: stringify tuple column names
2.0.0 (2022-2-20)
Typescript conversion of frontend code
1.61.1 (2021-11-17)
1.61.0 (2021-11-15)
1.60.2 (2021-11-3)
#594: fix for editing cells while using redislite
1.60.1 (2021-10-31)
updates for “Time Series Analysis” with aggregation
1.60.0 (2021-10-31)
1.59.1 (2021-10-15)
#583: allow for “vertical_headers” to be set from dtale.show
1.59.0 (2021-10-15)
1.58.3 (2021-10-4)
updated dash-bio to an optional dependency
1.58.2 (2021-10-3)
fix for background_mode in dtale.show
1.58.1 (2021-10-2)
re-pinned dash to 2.0.0
1.58.0 (2021-10-2)
1.57.0 (2021-9-22)
#565: allow “chart per group” display in scatter charts
#564: geometric mean aggregation in “Summarize Data”
#559: lock columns from config, highlight rows, move filters to custom filter, nan display
#560: Add “Gage R&R” computation
#558: added “Filtered” toggle to “Variance Report”
#561: Modify behaviour for finding free port
1.56.0 (2021-8-31)
1.55.0 (2021-8-17)
1.54.1 (2021-8-11)
#549: fix for grouping charts by multiple columns
1.54.0 (2021-8-6)
1.53.0 (2021-7-28)
1.52.0 (2021-7-10)
1.51.0 (2021-7-5)
1.50.1 (2021-6-24)
#520: additional code export update
1.50.0 (2021-6-23)
#515: adding dataframe.index() to chart axis
#520: wrong indent in chart code export
#519: display raw HTML
#518: cumulative sum builder
#517: keep less correlated columns
#514: targeted histogram fixes
#493: Correlations grid sorting
#505: Filtering enhancements
#484: renamed “Percentage Count” to “Count (Percentage)”
#503: add separate option for “Clean Column” to main menu
1.49.0 (2021-6-9)
bump css-what from 5.0.0 to 5.0.1
added the ability to toggle the display of all columns when heatmap is turned on
#491: overlapping histogram chart
bump ws from 7.4.5 to 7.4.6
Updates
1.48.0 (2021-5-28)
#504: fix for toggling between unique row & word values
#502: updated “cleaning” column builder to allow for inplace updates
#501: updates to describe window labels
#500: cleaning “Remove Numbers” code snippet fix
#488: string encoding for correlations
#484: fixed for percentage count chart aggregation
Correlation Scatter Updates:
#480: flexible branding
#485: Adjustable height on Correlation grid
1.47.0 (2021-5-21)
#477: Excel-style cell editing at top of screen
UI input for “Maximum Column Width”
JS package upgrades
refactored how sphinx documentation is built
1.46.0 (2021-5-11)
1.45.0 (2021-5-5)
1.44.1 (2021-4-27)
#470: editing cells for column names with special characters
1.44.0 (2021-4-26)
1.43.0 (2021-4-18)
1.42.1 (2021-4-12)
added ESC button handler for closing the side panel
1.42.0 (2021-4-11)
Added missingno chart display
added new side panel for viewing describe data
updated how requirements files are loaded in setup.py
added cleanup function to instance object
added animation for display of hidden/filter/sort info row
#306: ribbon menu
1.41.1 (2021-3-30)
1.41.0 (2021-3-26)
1.40.2 (2021-3-21)
#454: fixed issue with parenthesis & percent symbols in column names
1.40.1 (2021-3-16)
hotfix for chart code exports of category column analysis
1.40.0 (2021-3-16)
moved “Open In New Tab” button
#135: refactored column analysis code and updated code exports to include plotly charts
1.39.0 (2021-3-14)
resizable columns
updated how click loader options are found
Added loader for r datasets (*.rda files)
updating the language menu option to list options dynamically
1.38.0 (2021-3-10)
#452: handling of column names with periods & spaces as well as long names
updated styling of windows to match that of Charts
#448: set default value of “ignore_duplicate” to True
#442: Dash Updates
Split charts by y-axis values if there are multiple
Saving charts off and building new ones
Toggling which piece of data you’re viewing
Toggling language nav menu
Instances popup changes
updated preview to use DataPreview
updated display of “memory usage” to numeral.js
1.37.1 (2021-3-6)
Updated MANIFEST.in to include requirements.txt
1.37.0 (2021-3-5)
#445: updated URL paths to handle when D-Tale is running with jupyter server proxy
#315: Internationalization (supports english & chinese currently)
#441: Add option to ‘pin’ the menu to the screen as a fixed side panel
-
updated scatter plot date header to be generated server-side
updated scatter plot generation in correlations to use date index rather than date value for filtering
update setup.py to load dependencies from requirements.txt
#437: optional memory usage optimization and show mem usage
1.36.0 (2021-2-18)
1.35.0 (2021-2-14)
#261: Merging & Stacking UI
1.34.0 (2021-2-7)
1.33.1 (2021-2-1)
1.33.0 (2021-1-31)
Excel Uploads
Removed python2.7 support from code
CI Updates:
updated JS workflow to use latest node image
dropped support for python 2.7 and added support for python 3.9
Jest test refactoring
#415: single column heatmap
#414: exporting charts using “top_bars”
#413: Q-Q Plot
#411: updates for column analysis warnings
#412: histogram for date columns
#404: fixes for group input display on floats and data frequencies
1.32.1 (2021-1-25)
1.32.0 (2021-1-24)
#396: added kurtosis to date column descriptions and fixed issue with sequential diffs hanging around for previous columns
#397: group type & bin type (frequency/width) options for charts
Updated pandas query building to use backticks for extreme column names
Node tooltips and URL history building for Network Viewer
#399: better titles for groups in charts
#393: rolling & exponential smoothing column builders
#401: option to show top values in bar charts
1.31.0 (2021-1-16)
#387: calculate skew on date columns converted to millisecond integers
#386: bugfixes with “Rows w/ numeric” & “Rows w/ hidden”
#389: added more precision to KDE values
update Network Viewer to allow for URL parameter passing of to, from, group & weight
#343: buttons to load sequential diffs for different sorts
#376: added bins option to charts for float column groupings
#345: geolocation analysis
#370: toggle to turn off auto-loading of charts
#330: data slope column builder
additional documentation
1.30.0 (2021-1-3)
1.29.1 (2020-12-24)
#228: additional documentation on how to run in docker
#344: Updates to sorting of unique values as well as display of word value count raw values
#374: fixed issue displaying “NaN” string values in chart group options
#373: only use group values in mapbox if mapbox group column(s) has been specified
#367: rows with hidden characters
#372: updated labels for First/Last aggregations and added “Remove Duplicates” option
#368: updated “No Aggregation” to be the default aggregation for charts
#369: x-axis count wordclouds
#366: additional hyphen added to “Replace Hyphens w/ Space” cleaner
#365: fixed display issues with KDE
1.29.0 (2020-12-22)
#363: show/hide columns on load
#348: sub-date map animation fix
#347: display items loaded in “Load” slider
#349: additional duplicates handling in chart builders
node-notifier depdabot alert
#351: added KDE to histograms in column analysis
package upgrades
#350: x-axis column selection no longer required for charts
if there is no selection then the default index of (1, 2, …, N) will be used in its place
#356: “replace hyphens” cleaner and cleaners added to “Value Counts” analysis
#358: addition skew/kurtosis display
#357: cleaner for hidden characters
#359: repositioned skew/kurt in describe
#359: moved “Variance Report” option up in column menu
#360: updates to string describe labels
fixed issues with draggable/resizable modals
1.28.1 (2020-12-16)
updated modals to be resizable (re-resizable)
1.28.0 (2020-12-14)
#354: fix for building data ids greater than 10
#343: remove nan & nat values from sequential diff analysis
#342: column cleaner descriptions
#340: add column cleaners to “Word Value Counts” analysis chart
#341: NLTK stopword cleaner updates
#338: removing nan values from string metrics
#334: skew/kurtosis summary
Updated modals to be movable (react-draggable)
build(deps): bump ini from 1.3.5 to 1.3.7
Notify iframe parent of updates
1.27.0 (2020-12-9)
1.26.0 (2020-12-5)
1.25.0 (2020-11-30)
1.24.0 (2020-11-23)
#295: check for swifter when executing apply functions
Reworked the display of the “Instances” popup
fixed issue with serving static assets when using “app_root”
1.23.0 (2020-11-21)
Added better handling for open_browser
#319: fix for loading xarray dimensions
Added support for embedding D-Tale within Streamlit
1.22.1 (2020-11-15)
additional updates to how int/float hex conversions work
1.22.0 (2020-11-14)
1.21.1 (2020-11-8)
Additional fixes for #313 & #302
Handling for partial .ini files
Handling for dictionary inputs w/ non-iterable values
1.21.0 (2020-11-6)
#313: support for numpy.array, lists & dictionaries
#302: configuration file for default options
Removal of legacy charting code & updating flask route to plotly dash charts from /charts to /dtale/charts
Update to how routes are overridden so it will work with gunicorn
Documentation
running within gunicorn
embedding in another Flask or Django app
configuration settings
1.20.0 (2020-11-1)
#311: png chart exports and fix for trandlines in exports
Added the option to switch grid to “Dark Mode”
1.19.2 (2020-10-25)
Documentation updates & better formatting of sample dataset buttons
bugfixes for loading empty dtale in a notebook and chart display in embedded app demo
1.19.1 (2020-10-24)
Load CSV/TSV/JSON from the web as well as some sample datasets
#310: handling for nan in ordinal & label encoders
1.19.0 (2020-10-23)
1.18.2 (2020-10-17)
1.18.1 (2020-10-16)
1.18.0 (2020-10-15)
#282: additional exception handling for overriding routes
#271: standardized column builder
#282: better support for using D-Tale within another Flask application
#270: filter outliers from column menu
allow users to start D-Tale without loading data
#264: similarity column builder
#286: column description lag after type conversion
1.17.0 (2020-10-10)
1.16.0 (2020-10-4)
1.15.2 (2020-9-4)
hotfix to move HIDE_SHUTDOWN & GITHUB_FORK to dtale module
1.15.1 (2020-9-3)
hotfix to expose HIDE_SHUTDOWN & GITHUB_FORK from dtale.global_state
1.15.0 (2020-9-3)
1.14.1 (2020-8-20)
#252: Describe shows proper values, but repeats ‘Total Rows:’ heading instead of proper headings
1.14.0 (2020-8-19)
1.13.0 (2020-8-13)
#231: “Lock Zoom” button on 3D Scatter & Surface charts for locking camera on animations
global & instance-level flag to turn off cell editing
added the ability to upload CSVs
upgraded prismjs
#234: update to line animations so that you can lock axes and highlight last point
#233: add candlestick charts
#241: total counts vs. count (non-nan) in describe
#240: force convert to float
#239: converting mixed columns
#237: updated “Pivot” reshaper to always using pivot_table
#236: “inplace” & “drop_index” parameters for memory optimization and parquet loader
#229: added histogram sample chart to bins column builder
1.12.1 (2020-8-5)
better axis display on heatmaps
handling for column filter data on “mixed” type columns
“title” parameter added for offline charts
heatmap drilldowns on animations
bugfix for refreshing custom geojson charts
1.12.0 (2020-8-1)
added better notification for when users view Category breakdowns in “Column Analysis” & “Describe”
fixed code snippets in “Numeric” column builder when no operation is selected
fixed code exports for transform, winsorize & z-score normalize column builders
added colorscale option to 3D Scatter charts
added “Animate By” to Heatmaps
initial chart drilldown functionality (histogram, bar)
fixed bug with code exports on transform, winsorize & z-score normalize column builders
updated labeling & tooltips on histogram charts
npm package upgrades
1.11.0 (2020-7-23)
updated column filters so that columns with more than 500 unique values are loaded asynchronously with AsyncSelect
added code export to Variance report
added z-score normalize column builder
1.10.0 (2020-7-21)
1.9.2 (2020-7-12)
1.9.1 (2020-7-3)
1.9.0 (2020-7-3)
added the ability to build columns using transform
added USE_COLAB for accessing D-Tale within google colab using their proxy
#211: Code export doesn’t work on google colab
1.8.19 (2020-6-28)
backwards compatibility of ‘colorscale’ URL parameters in charts
dropping of NaN locations/groups in choropleth maps
1.8.18 (2020-6-28)
1.8.17 (2020-6-18)
#151: allow users to load custom topojson into choropleth maps
1.8.16 (2020-6-7)
#200: support for xarray
1.8.15 (2020-5-31)
#202: maximum recursion errors when using Pyzo IDE
1.8.14 (2020-5-31)
1.8.13 (2020-5-20)
#193: Support for JupyterHub Proxy
1.8.12 (2020-5-15)
#196: dataframes that have datatime indexes without a name
Added the ability to apply formats to all columns of same dtype
1.8.11 (2020-5-3)
#196: improving outlier filter suggestions
#190: hide “Animate” inputs when “Percentage Sum” or “Percentage Count” aggregations are used
#189: hide “Barsort” when grouping is being applied
#187: missing & outlier tooltip descriptions on column headers
#186: close “Describe” tab after clicking “Update Grid”
#122: editable cells
npm package upgrades
circleci build script refactoring
1.8.10 (2020-4-26)
#184: “nan” not showing up for numeric columns
#181: percentage sum/count charts
#179: confirmation for column deletion
#176: highlight background of outliers/missing values
#175: column renaming
#174: moved “Describe” popup to new browser tab
#173: wider column input box for GroupBy in “Summarize Data” popup
#172: allowing groups to be specified in 3D scatter
#170: filter “Value” dropdown for maps to only int or float columns
#164: show information about missing data in “Describe” popup
1.8.9 (2020-4-18)
updated correlations & “Open Popup” to create new tabs instead
test fixes for dash 1.11.0
added python 3.7 & 3.8 support
1.8.8 (2020-4-9)
#144: Changing data type
1.8.7 (2020-4-8)
1.8.6 [hotfix] (2020-4-5)
updates to setup.py to include images
1.8.5 [hotfix] (2020-4-5)
fixed bug with column calculation for map inputs
#149: Icon for Map charts
1.8.4 [hotfix] (2020-4-5)
update to setup.py to include missing static topojson files
#145: Choropleth Map
1.8.3 (2020-4-4)
#143: scattergeo map chart UI changes
updated offline chart generation of maps to work without loading topojson from the web
fix to allow correlations timeseries to handle when date columns jump between rolling & non-rolling
added slider to animation and added animation to maps
fixes for IE 11 compatibility issues
labeling changes for “Reshape” popup
added grouping to maps
1.8.2 (2020-4-1)
#129: show dtype when hovering over header in “Highlight Dtypes” mode and description tooltips added to main menu
made “No Aggregation” the default aggregation in charts
bugfix for line charts with more than 15000 points
updated “Value Counts” & “Category Breakdown” to return top on initial load
#118: added scattergeo & choropleth maps
#121: added “not equal” toggle to filters
#132: updated resize button to “Refresh Widths”
added “Animate” toggle to scatter, line & bar charts
#131: changes to “Reshape Data” window
#130: updates to pivot reshaper
#128: additional hover display of code snippets for column creation
#112: updated “Group” selection to give users the ability to select group values
1.8.1 (2020-3-29)
#92: column builders for random data
#84: highlight columns based on dtype
#111: fix for syntax error in charts code export
#113: updates to “Value Counts” chart in “Column Analysis” for number of values and ordinal entry
#114: export data to CSV/TSV
#116: updated styling for github fork link so “Code Export” is partially clickable
#119: fixed bug with queries not being passed to functions
#120: fix to allow duplicate x-axis entries in bar charts
added “category breakdown” in column analysis popup for float columns
fixed bug where previous “show missing only” selection was not being recognized
1.8.0 (2020-3-22)
#102: interactive column filtering for string, date, int, float & bool
better handling for y-axis management in charts. Now able to toggle between default, single & multi axis
increased maximum groups to 30 in charts and updated error messaging when it surpasses that for easier filter creation
bugfix for date string width calculation
updated sort/filter/hidden header so that you can now click values which will trigger a tooltip for removing individual values
updated Filter popup to be opened as separate window when needed
1.7.15 (2020-3-9)
1.7.14 (2020-3-7)
Hotfix for “Reshape” popup when forwarding browser to new data instances
1.7.13 (2020-3-7)
New data storage mechanisms available: Redis, Shelve
#100: turned off data limits on charts by using WebGL
#99: graceful handling of issue calculating min/max information for Describe popup
#91: reshaping of data through usage of aggregations, pivots or transposes
Export chart to HTML
Export chart data to CSV
Offline chart display for use within notebooks
Removal of data from the Instances popup
Updated styling of charts to fit full window dimensions
1.7.12 (2020-3-1)
added syntax highlighting to code exports with react-syntax-highlighting
added arctic integration test
updated Histogram popup to “Column Analysis” which allows for the following
Histograms -> integers and floats
Value Counts -> integers, strings & dates
1.7.11 (2020-2-27)
hotfix for dash custom.js file missing from production webpack build script
1.7.10 (2020-2-27)
#75: added code snippet functionality to the following:
main grid, histogram, correlations, column building & charts
exposed CLI loaders through the following functions dtale.show_csv, dtale.show_json, dtale.show_arctic
build in such a way that it is easy for custom loaders to be exposed as well
#82: pinned future package to be >= 0.14.0
1.7.9 (2020-2-24)
1.7.8 (2020-2-22)
#77: removal of multiprocessed timeouts
1.7.7 (2020-2-22)
centralized global state
1.7.6 (2020-2-21)
allowing the usage of context variables within filters
#64: handling for loading duplicate data to dtale.show
updated dtale.instances() to print urls rather than show all instances
removal of Dash “Export to png” function
passing data grid queries to chart page as default
added sys.exit() to the thread that manages the reaper
1.7.5 (2020-2-20)
hotfix for KeyError loading metadata for columns with min/max information
1.7.4 (2020-2-20)
1.7.3 (2020-2-13)
added the ability to move columns left or right as well as to the front
added formatting capabilities for strings & dates
persist formatting settings to popup on reopening
bugfix for width-calculation on formatting change
1.7.2 (2020-2-12)
60-second timeout handling around chart requests
pre-loaded charts through URL search strings
pandas query examples in Filter popup
1.7.1 (2020-2-7)
added pie, 3D scatter & surface charts
updated popups to be displayed when the browser dimensions are too small to host a modal
removed Swagger due to its lack of support for updated dependencies
1.7.0 (2020-1-28)
1.6.10 (2020-1-12)
better front-end handling of dates for charting as to avoid timezone issues
the ability to switch between sorting any axis in bar charts
1.6.9 (2020-1-9)
bugfix for timezone issue around passing date filters to server for scatter charts in correlations popup
1.6.8 (2020-1-9)
additional information about how to use Correlations popup
handling of all-nan data in charts popup
styling issues on popups (especially Histogram)
removed auto-filtering on correlation popup
scatter point color change
added chart icon to cell that has been selected in correlation popup
responsiveness to scatter charts
handling of links to ‘main’,‘iframe’ & ‘popup’ missing data_id
handling of ‘inf’ values when getting min/max & describe data
added header to window popups (correlations, charts, …) and a link back to the grid
added egg building to circleci script
correlation timeseries chart hover line
1.6.7 (2020-1-3)
#50: updates to rolling correlation functionality
1.6.6 (2020-1-2)
#47: selection of multiple columns for y-axis
updated histogram bin selection to be an input box for full customization
better display of timestamps in axis ticks for charts
sorting of bar charts by y-axis
#48: scatter charts in chart builder
“nunique” added to list of aggregations
turned on “threaded=True” for app.run to avoid hanging popups
#45: rolling computations as aggregations
Y-Axis editor
1.6.5 (2019-12-29)
test whether filters entered will return no data and block the user from applying them
allow for group values of type int or float to be displayed in charts popup
timeseries correlation values which return ‘nan’ will be replaced by zero for chart purposes
update ‘distribution’ to ‘series’ on charts so that missing dates will not show up as ticks
added “fork on github” flag for demo version & links to github/docs on “About” popup
limited lz4 to <= 2.2.1 in python 27-3 since latest version is no longer supported
1.6.4 (2019-12-26)
testing of hostname returned by socket.gethostname, use ‘localhost’ if it fails
removal of flask dev server banner when running in production environments
better handling of long strings in wordclouds
#43: only show timeseries correlations if datetime columns exist with multiple values per date
1.6.3 (2019-12-23)
updated versions of packages in yarn.lock due to issue with chart.js box & whisker plots
1.6.2 (2019-12-23)
#40: loading initial chart as non-line in chart builder
#41: double clicking cells in correlation grid for scatter will cause chart not to display
“Open Popup” button for ipython iframes
column width resizing on sorting
additional int/float descriptors (sum, median, mode, var, sem, skew, kurt)
wordcloud chart type
1.6.1 (2019-12-19)
bugfix for url display when running from command-line
1.6.0 (2019-12-19)
charts integration
the ability to look at data in line, bar, stacked bar & pie charts
the ability to group & aggregate data within the charts
direct ipython iframes to correlations & charts pages with pre-selected inputs
the ability to access instances from code by data id via dtale.get_instance(data_id)
view all active data instances dtale.instances()
1.5.1 (2019-12-12)
instead of spawning a new flask instance for each dtale.show call, all data associated with one parent process is now served from the same flask instance unless the user specifies otherwise (via the force parameter)
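The shared-instance behavior above, together with the 1.6.0 instance-access API (dtale.instances and dtale.get_instance), can be sketched as follows. This is a hedged example: the function names and the force parameter come from the notes above, but exact signatures may vary by version, the _data_id attribute is an assumption, and the snippet degrades gracefully when D-Tale is not installed:

```python
import pandas as pd

# sample frame to serve through D-Tale
df = pd.DataFrame({"x": range(5), "y": [v * 2 for v in range(5)]})

try:
    import dtale

    # repeated show() calls from one parent process reuse a single flask
    # instance; per the 1.5.1 note, force opts out of that shared behavior
    d = dtale.show(df)

    # list all active data instances, then fetch one back by its data id
    print(dtale.instances())
    d2 = dtale.get_instance(d._data_id)  # _data_id attribute is an assumption
except ImportError:
    # D-Tale is not installed; only the pandas setup above runs
    pass
```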
1.5.0 (2019-12-02)
ipython integration
ipython output cell adjustment
column-wise menu support
browser window popups for: Correlations, Coverage, Describe, Histogram & Instances
1.4.1 (2019-11-20)
#32: unpin jsonschema by moving flasgger to extras_require
1.4.0 (2019-11-19)
Correlations Pearson Matrix filters
“name” display in title tab
“Heat Map” toggle
dropped unused “Flask-Caching” requirement
1.3.7 (2019-11-12)
1.3.6 (2019-11-08)
Bug fixes for:
choose between pandas.corr & numpy.corrcoef depending on presence of NaNs
hide timeseries correlations when date columns only contain one day
1.3.5 (2019-11-07)
Bug fixes for:
duplicate loading of histogram data
string serialization failing when mixing future.str & str in scatter function
1.3.4 (2019-11-07)
updated correlation calculation to use numpy.corrcoef for performance purposes
github rebranding from manahl -> man-group
1.3.3 (2019-11-05)
hotfix for failing test under certain versions of future package
1.3.2 (2019-11-05)
Bug fixes for:
display of histogram column information
reload of hidden “processes” input when loading instances data
correlations json failures on string conversion
1.3.1 (2019-10-29)
fix for incompatible str types when directly altering state of data in running D-Tale instance
1.3.0 (2019-10-29)
webbrowser integration (the ability to automatically open a webbrowser upon calling dtale.show())
flag for hiding the “Shutdown” button for long-running demos
“Instances” navigator popup for viewing all active D-Tale instances for the current python process
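The webbrowser integration introduced in 1.3.0 amounts to pointing the user's default browser at the instance URL once the flask server is up. A minimal sketch using the standard-library webbrowser module; the URL below is a hypothetical example of the form D-Tale serves, not a live endpoint:

```python
import webbrowser

# hypothetical instance URL of the shape D-Tale serves (host:port/dtale/main/<data_id>)
url = "http://localhost:40000/dtale/main/1"

# webbrowser.open returns True if a browser could be launched,
# False on headless hosts with no browser available
opened = webbrowser.open(url)
```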
1.2.0 (2019-10-24)
1.1.1 (2019-10-23)
#13: fix for auto-detection of column widths for strings and floats
1.1.0 (2019-10-08)
IE support
Describe & About popups
Custom CLI support
1.0.0 (2019-09-06)
Initial public release
Project details
Release history
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distributions
File details
Details for the file dtale-2.3.0.tar.gz
File metadata
- Download URL: dtale-2.3.0.tar.gz
- Upload date:
- Size: 12.5 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.6.1 pkginfo/1.7.1 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.2 CPython/3.8.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | 684be3c665beb43fc4fba4ae90ee17439c4f7d08aa878af52ef6479392089395
MD5 | 8f82f74d442fb2e008f8f12f4a1eb7b6
BLAKE2b-256 | ce2e053ca40e8b654d6160abf87a4606cd94c9c077ca9a040b396030d0b166f6
File details
Details for the file dtale-2.3.0-py3.9.egg
File metadata
- Download URL: dtale-2.3.0-py3.9.egg
- Upload date:
- Size: 13.1 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.6.1 pkginfo/1.7.1 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.2 CPython/3.8.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | 4ac105a87c2b908e63a95c27895b5618c99c577bda24ac054c991180b869b25c
MD5 | 616c9c2831703902bf9eb94bee4deb09
BLAKE2b-256 | ea382ce1c27f3f6917237ede8bbafece5f01ce52f24b9d71a9ed6b307d08c4ff
File details
Details for the file dtale-2.3.0-py3.8.egg
File metadata
- Download URL: dtale-2.3.0-py3.8.egg
- Upload date:
- Size: 13.1 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.6.1 pkginfo/1.7.1 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.2 CPython/3.8.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | d37ddcfa55720b1bc6a9e9a9708604c612ea852b43f876bcbbb273e9b802455f
MD5 | d65f1c2fb94639cb93df97390a1bc179
BLAKE2b-256 | 6ef5ad03bcb264197a775d7a22754c75cebf08c34fcf67e60e4234a475fc26a5
File details
Details for the file dtale-2.3.0-py3.7.egg
File metadata
- Download URL: dtale-2.3.0-py3.7.egg
- Upload date:
- Size: 13.1 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.6.1 pkginfo/1.7.1 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.2 CPython/3.8.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | 91b70e33c4f217a9d7917af562d20a20391a30f4853a12438d65dc4b8ac2b7fd
MD5 | dd7bc3cda3e6250ec016c696b52b5ef1
BLAKE2b-256 | 8b282835e4527530aa42d81b0f426c225fc1a03c0c585f1bc7de30e5086a3d9a
File details
Details for the file dtale-2.3.0-py3.6.egg
File metadata
- Download URL: dtale-2.3.0-py3.6.egg
- Upload date:
- Size: 13.1 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.6.1 pkginfo/1.7.1 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.2 CPython/3.8.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | 82bfc599a8f00875378b1d4d440e0c20f0b6157aadb0dadcd8147c2277acebe8
MD5 | 1b2e51ea1e970b501d0134a28c4313af
BLAKE2b-256 | f24e002c825008271c779a5fe8bca2ab285c7649d75742b93231d40742fc7217
File details
Details for the file dtale-2.3.0-py2.py3-none-any.whl
File metadata
- Download URL: dtale-2.3.0-py2.py3-none-any.whl
- Upload date:
- Size: 12.9 MB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.6.1 pkginfo/1.7.1 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.2 CPython/3.8.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | 3c04469393ed907d83bac56aab55d66893d4e76474798bcb8a593378664ed1f0
MD5 | 139de373dc7d030ecd2a1a75aa85f8c7
BLAKE2b-256 | f599bfe1c43d95fcb99b1955887dd4738af232956693e19eeaadad3812d39f3b
File details
Details for the file dtale-2.3.0-py2.7.egg
File metadata
- Download URL: dtale-2.3.0-py2.7.egg
- Upload date:
- Size: 13.1 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.6.1 pkginfo/1.7.1 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.2 CPython/3.8.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | 6f524dd5635a72e335e2a121a4559a4382b7bef00ee0b4864e48706d9a2cb1d7
MD5 | 4e841ac8c78acc6d45665d9b126cab60
BLAKE2b-256 | 436552a3c3bd54cb5539cf016045b981048f4209a36b6853545b6d8d4084ea15