High Level Expressions for Dask
Project description
Dask Expressions
Dask DataFrames with query optimization.
This is a proof-of-concept rewrite of Dask DataFrame that includes query optimization and generally improved organization.
More in our blog posts:
Example
import dask_expr as dx
df = dx.datasets.timeseries()
df.head()
df.groupby("name").x.mean().compute()
Query Representation
Dask-expr encodes user code in an expression tree:
>>> df.x.mean().pprint()
Mean:
Projection: columns='x'
Timeseries: seed=1896674884
This expression tree will be optimized and modified before execution:
>>> df.x.mean().optimize().pprint()
Div:
Sum:
Fused(375f9):
| Projection: columns='x'
| Timeseries: dtypes={'x': <class 'float'>} seed=1896674884
Count:
Fused(375f9):
| Projection: columns='x'
| Timeseries: dtypes={'x': <class 'float'>} seed=1896674884
Stability
This project is a work in progress and will be changed without notice or deprecation warning. Please provide feedback, but it's best to avoid use in production settings.
API Coverage
dask_expr.DataFrame
abs
add
add_prefix
add_sufix
align
all
any
apply
assign
astype
bfill
clip
combine_first
copy
count
cummax
cummin
cumprod
cumsum
dask
div
divide
drop
drop_duplicates
dropna
dtypes
eval
explode
ffill
fillna
floordiv
groupby
head
idxmax
idxmin
ìloc
index
isin
isna
join
map
map_overlap
map_partitions
mask
max
mean
memory_usage
memory_usage_per_partition
merge
min
min
mod
mode
mul
nlargest
nsmallest
nunique_approx
partitions
pivot_table
pow
prod
query
radd
rdiv
rename
rename_axis
repartition
replace
reset_index
rfloordiv
rmod
rmul
round
rpow
rsub
rtruediv
sample
select_dtypes
set_index
shift
shuffle
sort_values
std
sub
sum
tail
to_parquet
to_timestamp
truediv
var
visualize
where
dask_expr.Series
abs
add
align
all
any
apply
astype
between
bfill
clip
combine_first
copy
count
cummax
cummin
cumprod
cumsum
dask
div
divide
drop_duplicates
dropna
dtype
explode
ffill
fillna
floordiv
groupby
head
idxmax
idxmin
index
isin
isna
map
map_partitions
mask
max
mean
memory_usage
memory_usage_per_partition
min
min
mod
mode
mul
nlargest
nsmallest
nunique_approx
partitions
pow
prod
product
radd
rdiv
rename
rename_axis
repartition
replace
reset_index
rfloordiv
rmod
rmul
round
rpow
rsub
rtruediv
shift
shuffle
std
sub
sum
tail
to_frame
to_timestamp
truediv
unique
value_counts
var
visualize
where
dask_expr.Index
abs
align
all
any
apply
astype
clip
combine_first
copy
count
dask
dtype
fillna
groupby
head
idxmax
idxmin
index
isin
isna
map_partitions
max
memory_usage
min
min
mode
nunique_approx
partitions
prod
rename
rename_axis
repartition
replace
reset_index
round
shuffle
std
sum
tail
to_frame
to_timestamp
var
visualize
dask_expr._groupby.GroupBy
agg
aggregate
apply
- `bfill
count
ffill
first
last
max
mean
median
min
nunique
prod
shift
size
std
sum
transform
value_counts
var
Support for SeriesGroupBy
and DataFrameGroupBy
.
dask_expr._resample.Resampler
agg
count
first
last
max
mean
median
min
nunique
ohlc
prod
quantile
sem
size
std
sum
var
dask_expr._rolling.Rolling
agg
apply
count
max
mean
median
min
quantile
std
sum
var
skew
kurt
Binary operators (DataFrame
, Series
, and Index
):
__add__
__radd__
__sub__
__rsub__
__mul__
__pow__
__rmul__
__truediv__
__rtruediv__
__lt__
__rlt__
__gt__
__rgt__
__le__
__rle__
__ge__
__rge__
__eq__
__ne__
__and__
__rand__
__or__
__ror__
__xor__
__rxor__
Unary operators (DataFrame
, Series
, and Index
):
__invert__
__neg__
__pos__
Accessors:
CategoricalAccessor
DatetimeAccessor
StringAccessor
Function
concat
from_pandas
merge
pivot_table
read_csv
read_parquet
repartition
to_datetime
to_numeric
to_timedelta
to_parquet
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file dask-expr-0.3.0.tar.gz
.
File metadata
- Download URL: dask-expr-0.3.0.tar.gz
- Upload date:
- Size: 116.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.12.0
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | fe5d62b076d92d6741ba3532fa9199ef1f815946573423171f36c0f02ea9e8cf |
|
MD5 | 6521f6a130390de12520f843337120c9 |
|
BLAKE2b-256 | 297d1eccf099448a722d054d3714c13e72a9fe846801a4cf617446221739db5c |
Provenance
File details
Details for the file dask_expr-0.3.0-py3-none-any.whl
.
File metadata
- Download URL: dask_expr-0.3.0-py3-none-any.whl
- Upload date:
- Size: 108.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.12.0
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | ad8ff3260b623246cd44bc12c05f025fbe3bf68543bf442cc1da95785293dd06 |
|
MD5 | 066e60d80b44b6169de01eb03dc79fdc |
|
BLAKE2b-256 | 73fe92dd2d5dfb99e235003709e87c6cbffc6c4ea954a8b2486242d9bb918356 |