High Level Expressions for Dask

These details have been verified by PyPI

Maintainers

fjetter jrbourbeau phofl rjzamora

These details have not been verified by PyPI

Project links

Source code

Project description

Dask Expressions

Dask DataFrames with query optimization.

This is a proof-of-concept rewrite of Dask DataFrame that includes query optimization and generally improved organization.

Example

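import dask_expr as dx

# Build a lazy DataFrame from the built-in timeseries dataset
df = dx.datasets.timeseries()
df.head()  # preview the first rows

# Group by name and compute the mean of column x
df.groupby("name").x.mean().compute()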

Query Representation

Dask-expr encodes user code in an expression tree:

>>> df.x.mean().pprint()

Mean:
  Projection: columns='x'
    Timeseries: seed=1896674884
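
To make the idea of an expression tree concrete, here is a minimal sketch in plain Python. It is a toy model, not dask-expr's implementation: the `Expr` and `LazyColumns` classes and the `tree_lines` helper are hypothetical names invented for illustration. Each method call only records a node, and nothing is computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expr:
    """Toy expression node: an operation name, its parameters, and its inputs."""

    op: str
    params: tuple = ()    # pairs of (name, value)
    children: tuple = ()  # input expressions

    def tree_lines(self, indent=0):
        # Render this node, then its inputs indented by two more spaces.
        args = " ".join(f"{name}={value!r}" for name, value in self.params)
        header = f"{self.op}: {args}" if args else f"{self.op}:"
        lines = [" " * indent + header]
        for child in self.children:
            lines.extend(child.tree_lines(indent + 2))
        return lines


class LazyColumns:
    """Hypothetical user-facing wrapper: calls record nodes, nothing runs."""

    def __init__(self, expr):
        self.expr = expr

    def __getitem__(self, column):
        # Selecting a column wraps the current tree in a Projection node.
        return LazyColumns(Expr("Projection", (("columns", column),), (self.expr,)))

    def mean(self):
        # A reduction wraps the current tree in a Mean node.
        return LazyColumns(Expr("Mean", (), (self.expr,)))

    def pprint(self):
        print("\n".join(self.expr.tree_lines()))


df = LazyColumns(Expr("Timeseries", (("seed", 1896674884),)))
df["x"].mean().pprint()
```

Because the user-facing object only holds a tree, the whole query is available for inspection and rewriting before any data is touched.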

This expression tree will be optimized and modified before execution:

>>> df.x.mean().optimize().pprint()

Div:
  Sum:
    Fused(375f9):
    | Projection: columns='x'
    |   Timeseries: dtypes={'x': <class 'float'>} seed=1896674884
  Count:
    Fused(375f9):
    | Projection: columns='x'
    |   Timeseries: dtypes={'x': <class 'float'>} seed=1896674884
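
The rewrite of `Mean` into `Div` over `Sum` and `Count` shown above can be sketched as a small tree transformation. This is a toy illustration under stated assumptions, not dask-expr's optimizer: `Node`, `lower_mean`, and `evaluate` are hypothetical names, and the data is a plain dict of lists. A likely motivation for this form is that sums and counts from separate partitions can be combined directly, while per-partition means cannot.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Toy expression node with an optional argument and input nodes."""

    op: str
    arg: object = None
    children: tuple = ()


def lower_mean(node):
    """Rewrite every Mean(x) into Div(Sum(x), Count(x)), bottom-up."""
    children = tuple(lower_mean(child) for child in node.children)
    if node.op == "Mean":
        (child,) = children
        # Sum and Count share the same input subtree object.
        return Node("Div", None, (Node("Sum", None, (child,)), Node("Count", None, (child,))))
    return Node(node.op, node.arg, children)


def evaluate(node, data):
    """Evaluate a toy tree against a dict that maps column names to lists."""
    if node.op == "Source":
        return data
    inputs = [evaluate(child, data) for child in node.children]
    if node.op == "Projection":
        return inputs[0][node.arg]
    if node.op == "Sum":
        return sum(inputs[0])
    if node.op == "Count":
        return len(inputs[0])
    if node.op == "Div":
        return inputs[0] / inputs[1]
    if node.op == "Mean":
        return sum(inputs[0]) / len(inputs[0])
    raise ValueError(f"unknown operation: {node.op}")


data = {"x": [1.0, 2.0, 3.0]}
query = Node("Mean", None, (Node("Projection", "x", (Node("Source"),)),))
optimized = lower_mean(query)

print(optimized.op)
print(evaluate(query, data), evaluate(optimized, data))
```

The rewritten tree evaluates to the same value as the original. The shared input of `Sum` and `Count` is loosely analogous to the repeated `Fused(375f9)` block in the optimized output above, which appears under both branches.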

Stability

This project is a work in progress and may change without notice or deprecation warnings. Feedback is welcome, but avoid using it in production settings.

API Coverage

dask_expr.DataFrame

abs
add_prefix
add_suffix
align
all
any
apply
assign
astype
clip
combine_first
copy
count
dask
drop
drop_duplicates
dropna
dtypes
eval
explode
fillna
groupby
head
idxmax
idxmin
iloc
index
isin
isna
join
map
map_overlap
map_partitions
max
mean
memory_usage
merge
min
mode
nlargest
nsmallest
nunique_approx
partitions
pivot_table
prod
query
rename
rename_axis
repartition
replace
reset_index
round
sample
select_dtypes
set_index
shuffle
sort_values
std
sum
tail
to_parquet
to_timestamp
var
visualize

dask_expr.Series

abs
align
all
any
apply
astype
between
clip
combine_first
copy
count
dask
drop_duplicates
dropna
dtype
explode
fillna
groupby
head
idxmax
idxmin
index
isin
isna
map
map_partitions
max
mean
memory_usage
min
mode
nlargest
nsmallest
nunique_approx
partitions
prod
rename_axis
repartition
replace
reset_index
round
shuffle
std
sum
tail
to_frame
to_timestamp
unique
value_counts
var
visualize

dask_expr.Index

abs
align
all
any
apply
astype
clip
combine_first
copy
count
dask
dtype
fillna
groupby
head
idxmax
idxmin
index
isin
isna
map_partitions
max
memory_usage
min
mode
nunique_approx
partitions
prod
rename_axis
repartition
replace
reset_index
round
shuffle
std
sum
tail
to_frame
to_timestamp
var
visualize

dask_expr._groupby.GroupBy

agg
aggregate
apply
count
first
last
max
mean
min
prod
shift
size
std
sum
transform
value_counts
var

dask_expr._resample.Resampler

agg
count
first
last
max
mean
median
min
nunique
ohlc
prod
quantile
sem
size
std
sum
var

dask_expr._rolling.Rolling

agg
apply
count
max
mean
median
min
quantile
std
sum
var
skew
kurt

Binary operators (DataFrame, Series, and Index):

__add__
__radd__
__sub__
__rsub__
__mul__
__rmul__
__truediv__
__rtruediv__
__lt__
__rlt__
__gt__
__rgt__
__le__
__rle__
__ge__
__rge__
__eq__
__ne__
__and__
__rand__
__or__
__ror__
__xor__
__rxor__

Unary operators (DataFrame, Series, and Index):

__invert__
__neg__
__pos__

Accessors:

CategoricalAccessor
DatetimeAccessor
StringAccessor

Functions

concat
from_pandas
merge
pivot_table
read_csv
read_parquet
repartition
to_parquet

Release history

1.1.18

Nov 11, 2024

1.1.17 yanked

Nov 8, 2024

Reason this release was yanked:

Critical performance regression

1.1.16

Oct 17, 2024

1.1.15

Sep 28, 2024

1.1.14

Sep 13, 2024

1.1.13

Sep 2, 2024

1.1.12

Aug 30, 2024

1.1.11

Aug 16, 2024

1.1.10

Aug 6, 2024

1.1.9

Jul 20, 2024

1.1.8

Jul 19, 2024

1.1.7

Jul 5, 2024

1.1.6

Jun 21, 2024

1.1.5

Jun 20, 2024

1.1.4

Jun 19, 2024

1.1.3

Jun 14, 2024

1.1.2

May 31, 2024

1.1.1

May 17, 2024

1.1.0

May 3, 2024

1.0.14

Apr 30, 2024

1.0.13

Apr 25, 2024

1.0.12

Apr 19, 2024

1.0.11

Apr 9, 2024

1.0.10

Apr 4, 2024

1.0.9

Apr 2, 2024

1.0.7

Apr 2, 2024

1.0.6

Apr 1, 2024

1.0.5

Mar 22, 2024

1.0.4

Mar 18, 2024

1.0.3

Mar 15, 2024

1.0.2

Mar 14, 2024

1.0.1

Mar 12, 2024

1.0

Mar 12, 2024

0.5.3

Feb 28, 2024

0.5.2

Feb 26, 2024

0.5.1

Feb 23, 2024

0.5.0 yanked

Feb 23, 2024

Reason this release was yanked:

Wrong Dask Version Pin

0.4.2

Feb 12, 2024

0.4.1

Feb 10, 2024

0.4.0

Feb 1, 2024

0.3.5

Jan 18, 2024

0.3.4

Jan 12, 2024

0.3.3

Jan 10, 2024

0.3.2

Jan 5, 2024

0.3.1

Dec 19, 2023

0.3.0

Dec 15, 2023

0.2.9

Dec 12, 2023

0.2.8

Dec 8, 2023

0.2.7

Dec 5, 2023

This version

0.2.6

Dec 1, 2023

0.2.5

Nov 29, 2023

0.2.4

Nov 28, 2023

0.2.3

Nov 22, 2023

0.2.2

Nov 21, 2023

0.2.1

Nov 20, 2023

0.2.0

Nov 20, 2023

0.1.12

Nov 2, 2023

0.1.11

Oct 20, 2023

0.1.10

Oct 17, 2023

0.1.9

Oct 12, 2023

0.1.8

Oct 4, 2023

0.1.7

Sep 25, 2023

0.1.6

Sep 20, 2023

0.1.5

Aug 18, 2023

0.1.4

Aug 12, 2023

0.1.3

Aug 4, 2023

0.1.2

Jul 28, 2023

0.1.1

Jul 21, 2023

0.1.0

Jul 12, 2023

Download files

Source Distribution

dask-expr-0.2.6.tar.gz (103.8 kB)

Uploaded Dec 1, 2023 Source

Built Distribution

dask_expr-0.2.6-py3-none-any.whl (93.8 kB)

Uploaded Dec 1, 2023 Python 3

Hashes for dask-expr-0.2.6.tar.gz
Algorithm	Hash digest
SHA256	`0ef2124cd69369bcbb8db8b75ae5b006f477eb6b7d4860c2d5b09cca56311a19`
MD5	`54f39fccba7ae1a86d1e629328cf98f6`
BLAKE2b-256	`adbdf5cb6e10d1421bdc9f7fa5a40786bb5453ee1a5d8b9b654faa74fc9feb18`

Hashes for dask_expr-0.2.6-py3-none-any.whl
Algorithm	Hash digest
SHA256	`416a3ee990d7f5c9b0a8a50f89a1270c45bfaab34f54f3102ac01b704d30c837`
MD5	`2cfb0d35d2cad392024c98fb01050b12`
BLAKE2b-256	`b31dc71c358d8ce75232c093abaf025b5855bc75d22abb5c92c2331672c82328`
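
The published digests can be used to check a downloaded file. The following is a minimal sketch using Python's standard `hashlib` module. The helper names `sha256_of_file` and `matches_published_hash` are hypothetical, and the local file path in the usage note is an assumption about where the archive was saved.

```python
import hashlib

# SHA256 digest listed above for dask-expr-0.2.6.tar.gz
PUBLISHED_SHA256 = "0ef2124cd69369bcbb8db8b75ae5b006f477eb6b7d4860c2d5b09cca56311a19"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

Assuming the source distribution was saved in the current directory, `matches_published_hash("dask-expr-0.2.6.tar.gz", PUBLISHED_SHA256)` should return `True` for an intact download.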