High Level Expressions for Dask
Dask Expressions
Dask DataFrames with query optimization.
This is a proof-of-concept rewrite of Dask DataFrame that includes query optimization and generally improved organization.
More in our blog posts:
Example
import dask_expr as dx
df = dx.datasets.timeseries()
df.head()
df.groupby("name").x.mean().compute()
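The grouped mean above is computed over many partitions. A common strategy, and the one the optimized tree further down reflects for a plain mean, is for each partition to contribute per-group sums and counts, which are then added together and divided once at the end. The pure-Python sketch below models that idea. It assumes partitions are plain lists of (name, x) tuples and uses hypothetical helper names; it is a conceptual illustration, not dask_expr's implementation.

```python
from collections import defaultdict

# Hypothetical partitioned data: each partition is a list of (name, x) rows.
partitions = [
    [("Alice", 1.0), ("Bob", 2.0), ("Alice", 3.0)],
    [("Bob", 4.0), ("Charlie", 5.0)],
    [("Alice", 5.0), ("Charlie", 7.0)],
]


def partial_sums_counts(rows):
    """Per-partition step: sum and count of x for each name."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for name, x in rows:
        sums[name] += x
        counts[name] += 1
    return sums, counts


def groupby_mean(parts):
    """Combine step: add the partial sums and counts, then divide once."""
    total_sums = defaultdict(float)
    total_counts = defaultdict(int)
    for sums, counts in map(partial_sums_counts, parts):
        for name in sums:
            total_sums[name] += sums[name]
            total_counts[name] += counts[name]
    return {name: total_sums[name] / total_counts[name] for name in total_sums}


print(groupby_mean(partitions))
# {'Alice': 3.0, 'Bob': 3.0, 'Charlie': 6.0}
```

Carrying sums and counts instead of per-partition means matters: averaging Alice's partition means (2.0 and 5.0) would give 3.5, which is wrong because the partitions hold different numbers of her rows.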
Query Representation
Dask-expr encodes user code in an expression tree:
>>> df.x.mean().pprint()
Mean:
  Projection: columns='x'
    Timeseries: seed=1896674884
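A tree like the one printed above can be modeled with a few node classes that store their operands and print themselves with indentation. The classes below (Expr, Timeseries, Projection, Mean) are a minimal hypothetical model that mirrors the printed node names; they are not dask_expr's actual classes, and building the tree performs no computation.

```python
class Expr:
    """Minimal expression node: stores operands and pretty-prints a tree."""

    def __init__(self, *operands):
        self.operands = operands

    def label(self):
        return type(self).__name__ + ":"

    def children(self):
        # Only operands that are themselves expressions are printed as children.
        return [op for op in self.operands if isinstance(op, Expr)]

    def tree_lines(self, depth=0):
        lines = ["  " * depth + self.label()]
        for child in self.children():
            lines.extend(child.tree_lines(depth + 1))
        return lines

    def pprint(self):
        print("\n".join(self.tree_lines()))


class Timeseries(Expr):
    def __init__(self, seed):
        super().__init__(seed)

    def label(self):
        return f"Timeseries: seed={self.operands[0]}"


class Projection(Expr):
    def __init__(self, frame, columns):
        super().__init__(frame, columns)

    def label(self):
        return f"Projection: columns={self.operands[1]!r}"


class Mean(Expr):
    pass


# Building the tree only records the requested operations.
expr = Mean(Projection(Timeseries(seed=1896674884), "x"))
expr.pprint()
```

Because the user's code is captured as data rather than executed immediately, an optimizer can inspect and rewrite the whole tree before any partition is computed.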
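Before execution, the df.x.mean() expression tree is optimized and rewritten. In the result, the mean has been split into a sum divided by a count, the column projection has been pushed into the Timeseries source (its dtypes now list only x), and each projection has been fused with its source: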
>>> df.x.mean().optimize().pprint()
Div:
  Sum:
    Fused(375f9):
    | Projection: columns='x'
    |   Timeseries: dtypes={'x': <class 'float'>} seed=1896674884
  Count:
    Fused(375f9):
    | Projection: columns='x'
    |   Timeseries: dtypes={'x': <class 'float'>} seed=1896674884
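The two rewrites visible in the optimized tree, pushing a column selection into the data source and lowering a mean into a sum divided by a count, can be sketched as recursive rewrite rules over immutable nodes. This is a toy optimizer under stated assumptions: the node classes and the optimize function are hypothetical, the source's column names are illustrative, and fusion is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Timeseries:
    columns: tuple  # columns the source will generate


@dataclass(frozen=True)
class Projection:
    frame: object
    column: str


@dataclass(frozen=True)
class Mean:
    frame: object


@dataclass(frozen=True)
class Sum:
    frame: object


@dataclass(frozen=True)
class Count:
    frame: object


@dataclass(frozen=True)
class Div:
    left: object
    right: object


def optimize(expr):
    """Apply two toy rewrite rules, optimizing children first."""
    if isinstance(expr, Projection):
        frame = optimize(expr.frame)
        # Rule 1: projection pushdown. Ask the source to generate only this column.
        if isinstance(frame, Timeseries):
            frame = Timeseries(columns=(expr.column,))
        return Projection(frame, expr.column)
    if isinstance(expr, Mean):
        # Rule 2: lower the mean into a sum and a count, which combine across partitions.
        frame = optimize(expr.frame)
        return Div(Sum(frame), Count(frame))
    return expr


raw = Mean(Projection(Timeseries(columns=("id", "name", "x", "y")), "x"))
print(optimize(raw))
```

Both branches of the rewritten tree share the same projected source, which is what lets a real optimizer fuse the projection with the source, as the Fused nodes in the output above appear to reflect.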
Stability
This project is a work in progress and may change without notice or deprecation warnings. Feedback is welcome, but we recommend avoiding its use in production settings.
API Coverage
dask_expr.DataFrame
abs
add_prefix
add_suffix
align
all
any
apply
assign
astype
clip
combine_first
copy
count
dask
drop
drop_duplicates
dropna
dtypes
eval
explode
fillna
groupby
head
idxmax
idxmin
iloc
index
isin
isna
join
map
map_overlap
map_partitions
mask
max
mean
memory_usage
memory_usage_per_partition
merge
min
mode
nlargest
nsmallest
nunique_approx
partitions
pivot_table
prod
query
rename
rename_axis
repartition
replace
reset_index
round
sample
shift
sort_values
select_dtypes
set_index
shuffle
std
sum
tail
to_parquet
to_timestamp
var
visualize
where
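map_partitions, listed above, applies a function to each partition independently, without moving data between partitions. The sketch below models this with partitions as plain lists of row dicts; the free function map_partitions and the helper add_column are hypothetical stand-ins, not the dask_expr method itself.

```python
def map_partitions(func, partitions, *args, **kwargs):
    """Apply func to each partition independently; no data moves between partitions."""
    return [func(part, *args, **kwargs) for part in partitions]


def add_column(rows, name, value):
    # Return new row dicts rather than mutating the input partition.
    return [{**row, name: value} for row in rows]


partitions = [
    [{"x": 1}, {"x": 2}],
    [{"x": 3}],
]

result = map_partitions(add_column, partitions, "flag", True)
print(result)
# [[{'x': 1, 'flag': True}, {'x': 2, 'flag': True}], [{'x': 3, 'flag': True}]]

# A per-partition reduction uses the same pattern, one result per partition.
print(map_partitions(len, partitions))
# [2, 1]
```

Because each call sees only its own partition, operations like this parallelize trivially; anything that needs neighboring rows or a global view requires a different pattern.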
dask_expr.Series
abs
align
all
any
apply
astype
between
clip
combine_first
copy
count
dask
drop_duplicates
dropna
dtype
explode
fillna
groupby
head
idxmax
idxmin
index
isin
isna
map
map_partitions
mask
max
mean
memory_usage
memory_usage_per_partition
min
mode
nlargest
nsmallest
nunique_approx
partitions
prod
rename
rename_axis
repartition
replace
reset_index
round
shift
shuffle
std
sum
tail
to_frame
to_timestamp
unique
value_counts
var
visualize
where
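Series.value_counts, listed above, fits the same split-and-combine shape as other reductions: count values within each partition, then add the partial counts. The sketch below assumes partitions are plain Python lists and uses collections.Counter for the partial counts; value_counts here is a hypothetical helper, not dask_expr's method.

```python
from collections import Counter

# Hypothetical partitions of a single column.
partitions = [["a", "b", "a"], ["b", "c"], ["a"]]


def value_counts(parts):
    """Count within each partition, then combine; result is sorted by descending count."""
    total = Counter()
    for part in parts:
        total += Counter(part)  # merge this partition's partial counts
    return total.most_common()


print(value_counts(partitions))
# [('a', 3), ('b', 2), ('c', 1)]
```

Sorting by descending count matches pandas' default for value_counts; the partial results are small (one entry per distinct value), so combining them is cheap compared with scanning the data.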
dask_expr.Index
abs
align
all
any
apply
astype
clip
combine_first
copy
count
dask
dtype
fillna
groupby
head
idxmax
idxmin
index
isin
isna
map_partitions
max
memory_usage
min
mode
nunique_approx
partitions
prod
rename
rename_axis
repartition
replace
reset_index
round
shuffle
std
sum
tail
to_frame
to_timestamp
var
visualize
dask_expr._groupby.GroupBy
agg
aggregate
apply
count
first
last
max
mean
median
min
prod
shift
size
std
sum
transform
value_counts
var
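Grouped var (and std) can be assembled from per-partition counts, sums, and sums of squares, using var = (Σx² − (Σx)²/n) / (n − ddof). The sketch below assumes partitions are lists of (key, x) tuples and uses hypothetical helper names; it illustrates the decomposition, not how dask_expr implements it.

```python
from collections import defaultdict

# Hypothetical partitioned data: (key, x) rows.
partitions = [
    [("a", 1.0), ("a", 2.0), ("b", 10.0)],
    [("a", 3.0), ("b", 14.0)],
]


def partial_moments(rows):
    """Per-partition [count, sum, sum of squares] for each key."""
    stats = defaultdict(lambda: [0, 0.0, 0.0])
    for key, x in rows:
        s = stats[key]
        s[0] += 1
        s[1] += x
        s[2] += x * x
    return stats


def groupby_var(parts, ddof=1):
    """Combine the partial moments, then apply the variance formula per key."""
    totals = defaultdict(lambda: [0, 0.0, 0.0])
    for stats in map(partial_moments, parts):
        for key, (n, s, ss) in stats.items():
            t = totals[key]
            t[0] += n
            t[1] += s
            t[2] += ss
    # Groups with n <= ddof have no defined variance; return NaN as pandas does.
    return {
        key: (ss - s * s / n) / (n - ddof) if n > ddof else float("nan")
        for key, (n, s, ss) in totals.items()
    }


print(groupby_var(partitions))
# {'a': 1.0, 'b': 8.0}
```

This one-pass formula is easy to combine but can lose precision when values are large relative to their spread; production code often uses more numerically stable ways of merging partial results.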
dask_expr._resample.Resampler
agg
count
first
last
max
mean
median
min
nunique
ohlc
prod
quantile
sem
size
std
sum
var
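The Resampler's ohlc method summarizes each time bin by its first (open), highest, lowest, and last (close) value. The sketch below computes hourly OHLC bars in pure Python; it assumes the input is already sorted by time, and unlike pandas it omits empty bins. The data and the function name are hypothetical.

```python
from datetime import datetime

# Hypothetical (timestamp, price) observations, already sorted by time.
ticks = [
    (datetime(2024, 1, 1, 9, 5), 10.0),
    (datetime(2024, 1, 1, 9, 40), 12.0),
    (datetime(2024, 1, 1, 9, 55), 9.0),
    (datetime(2024, 1, 1, 10, 10), 11.0),
    (datetime(2024, 1, 1, 10, 50), 13.0),
]


def resample_ohlc_hourly(rows):
    """Bucket rows by the hour they fall in and compute open/high/low/close."""
    bins = {}
    for ts, price in rows:
        key = ts.replace(minute=0, second=0, microsecond=0)  # floor to the hour
        if key not in bins:
            bins[key] = {"open": price, "high": price, "low": price, "close": price}
        else:
            b = bins[key]
            b["high"] = max(b["high"], price)
            b["low"] = min(b["low"], price)
            b["close"] = price  # rows are time-ordered, so the latest value wins
    return bins


for start, bar in resample_ohlc_hourly(ticks).items():
    print(start.isoformat(), bar)
```

The 09:00 bin opens at 10.0, peaks at 12.0, and closes at its low of 9.0; the 10:00 bin opens at 11.0 and closes at its high of 13.0.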
dask_expr._rolling.Rolling
agg
apply
count
max
mean
median
min
quantile
std
sum
var
skew
kurt
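A rolling window near the start of a partition needs rows from the end of the previous partition. This boundary exchange is the kind of problem map_overlap in the DataFrame list addresses. The sketch below models a trailing rolling mean over partitions by prepending the previous partition's last window − 1 rows and dropping them from the result. It assumes each borrowed-from partition holds at least window − 1 rows, and its helper names are hypothetical.

```python
def rolling_mean(values, window):
    """Trailing rolling mean; positions without a full window yield None."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window : i + 1]) / window)
    return out


def partitioned_rolling_mean(partitions, window):
    """Borrow window-1 trailing rows from the previous partition, then trim them off."""
    results = []
    for i, part in enumerate(partitions):
        before = partitions[i - 1][-(window - 1):] if i > 0 and window > 1 else []
        extended = before + part
        results.append(rolling_mean(extended, window)[len(before):])
    return results


parts = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
print(partitioned_rolling_mean(parts, window=3))
# [[None, None, 2.0], [3.0, 4.0], [5.0]]
```

Concatenating the per-partition results gives the same values as a single rolling mean over all six numbers, while each partition only needed a small overlap from its neighbor.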
Binary operators (DataFrame, Series, and Index):
__add__
__radd__
__sub__
__rsub__
__mul__
__rmul__
__truediv__
__rtruediv__
__lt__
__rlt__
__gt__
__rgt__
__le__
__rle__
__ge__
__rge__
__eq__
__ne__
__and__
__rand__
__or__
__ror__
__xor__
__rxor__
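The reflected ("r") methods above let expressions like 1 + df.x work: when the left operand is a plain number that does not know how to handle the collection, Python calls the right operand's reflected method. The toy Lazy class below is a hypothetical stand-in that records operations as nested tuples instead of computing them, to show the dispatch and why operand order must be preserved.

```python
class Lazy:
    """Toy lazy value: arithmetic builds a nested tuple instead of computing."""

    def __init__(self, node):
        self.node = node

    @staticmethod
    def _unwrap(value):
        return value.node if isinstance(value, Lazy) else value

    def _wrap(self, op, left, right):
        return Lazy((op, self._unwrap(left), self._unwrap(right)))

    def __add__(self, other):
        return self._wrap("add", self, other)

    def __radd__(self, other):
        # Called for `other + self` when `other` returns NotImplemented.
        return self._wrap("add", other, self)

    def __sub__(self, other):
        return self._wrap("sub", self, other)

    def __rsub__(self, other):
        # Operand order matters: `10 - x` is not `x - 10`.
        return self._wrap("sub", other, self)

    def __gt__(self, other):
        return self._wrap("gt", self, other)

    def __lt__(self, other):
        return self._wrap("lt", self, other)


x = Lazy("x")
print((x + 1).node)   # ('add', 'x', 1)
print((10 - x).node)  # ('sub', 10, 'x')
print((5 < x).node)   # ('gt', 'x', 5)
```

For comparisons, Python's own fallback swaps the operator: 5 < x first tries int's method, gets NotImplemented, and then calls x.__gt__(5).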
Unary operators (DataFrame, Series, and Index):
__invert__
__neg__
__pos__
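The unary operators map to -col (__neg__), +col (__pos__), and ~mask (__invert__). The eager Column class below is a hypothetical pure-Python stand-in that applies them elementwise. It also shows one subtlety: plain Python ~True is -2 because bool subclasses int, whereas pandas treats ~ on boolean data as logical NOT, so the sketch handles booleans explicitly.

```python
import operator


def _invert_value(v):
    # Logical NOT for booleans (pandas semantics), bitwise NOT for integers.
    return (not v) if isinstance(v, bool) else ~v


class Column:
    """Toy eager column: unary operators apply elementwise."""

    def __init__(self, values):
        self.values = list(values)

    def _unary(self, op):
        return Column(op(v) for v in self.values)

    def __neg__(self):
        return self._unary(operator.neg)

    def __pos__(self):
        return self._unary(operator.pos)

    def __invert__(self):
        return self._unary(_invert_value)


nums = Column([1, -2, 3])
flags = Column([True, False])
print((-nums).values)   # [-1, 2, -3]
print((+nums).values)   # [1, -2, 3]
print((~flags).values)  # [False, True]
```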
Accessors:
CategoricalAccessor
DatetimeAccessor
StringAccessor
Functions
concat
from_pandas
merge
pivot_table
read_csv
read_parquet
repartition
to_parquet
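repartition changes how rows are divided among partitions without changing their order. The sketch below redistributes rows into a requested number of contiguous, near-equal chunks. It flattens all rows into one list for clarity, which a real distributed implementation would avoid, and the function is a hypothetical stand-in rather than the dask_expr API.

```python
def repartition(partitions, npartitions):
    """Redistribute rows into npartitions contiguous chunks of near-equal size, keeping order."""
    rows = [row for part in partitions for row in part]
    size, extra = divmod(len(rows), npartitions)
    result, start = [], 0
    for i in range(npartitions):
        # The first `extra` partitions each take one additional row.
        end = start + size + (1 if i < extra else 0)
        result.append(rows[start:end])
        start = end
    return result


parts = [[1, 2, 3, 4, 5], [6], [7, 8]]
print(repartition(parts, 3))
# [[1, 2, 3], [4, 5, 6], [7, 8]]
```

Balancing partition sizes this way helps when earlier steps (filters, for example) leave some partitions much smaller than others.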