polars-upgrade

Automatically upgrade your Polars code so it's compatible with future versions.

Installation

Easy:

pip install -U polars-upgrade

Usage (command-line)

Run

polars-upgrade my_project --target-version=0.20.31

from the command line. Replace 0.20.31 with your Polars version and my_project with the name of your directory.

NOTE: this tool will modify your code! You're advised to stage your files before running it.
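
For example, if you use git, one cautious workflow is to stage your changes first, run the tool, and then review what it changed:

git add -u
polars-upgrade my_project --target-version=0.20.31
git diff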

Usage (pre-commit hook)

Add the following to your .pre-commit-config.yaml file:

-   repo: https://github.com/MarcoGorelli/polars-upgrade
    rev: 0.3.5  # polars-upgrade version goes here
    hooks:
    -   id: polars-upgrade
        args: [--target-version=0.20.31]  # Polars version goes here
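
Then, to run the hook against every file in your repository, you can use pre-commit's standard invocation (the hook id matches the config above):

pre-commit run polars-upgrade --all-files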

Usage (Jupyter Notebooks)

Install nbqa and then run

nbqa polars_upgrade my_project --target-version=0.20.31

Usage (library)

In a Python script:

from polars_upgrade import rewrite, Settings

src = """\
import polars as pl
df.select(pl.count())
"""
settings = Settings(target_version=(0, 20, 4))
output = rewrite(src, settings=settings)
print(output)

Output:

import polars as pl
df.select(pl.len())
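
rewrite returns the modified source as a string, so applying it to a file takes only a couple of lines. A minimal sketch (my_script.py is a placeholder for any Python file that uses Polars):

from pathlib import Path

from polars_upgrade import rewrite, Settings

path = Path('my_script.py')  # placeholder path
settings = Settings(target_version=(0, 20, 31))
path.write_text(rewrite(path.read_text(), settings=settings))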

If your snippet does not include import polars or import polars as pl, then you will also need to pass 'pl' and/or 'polars' via the aliases argument, otherwise polars-upgrade will not perform the rewrite. Example:

from polars_upgrade import rewrite, Settings

src = """\
df.select(pl.count())
"""
settings = Settings(target_version=(0, 20, 4))
output = rewrite(src, settings=settings, aliases={'pl'})
print(output)

Output:

df.select(pl.len())

Supported rewrites

Version 0.18.12+

- pl.avg
+ pl.mean

Version 0.19.0+

- df.groupby_dynamic
+ df.group_by_dynamic
- df.groupby_rolling
+ df.rolling
- df.rolling('ts', period='3d').apply
+ df.rolling('ts', period='3d').map_groups
- pl.col('a').rolling_apply
+ pl.col('a').rolling_map
- pl.col('a').apply
+ pl.col('a').map_elements
- pl.col('a').map
+ pl.col('a').map_batches
- pl.map
+ pl.map_batches
- pl.apply
+ pl.map_groups
- pl.col('a').any(drop_nulls=True)
+ pl.col('a').any(ignore_nulls=True)
- pl.col('a').all(drop_nulls=True)
+ pl.col('a').all(ignore_nulls=True)
- pl.col('a').value_counts(multithreaded=True)
+ pl.col('a').value_counts(parallel=True)

Version 0.19.2+

- pl.col('a').is_not
+ pl.col('a').not_

Version 0.19.3+

- pl.enable_string_cache(True)
+ pl.enable_string_cache()
- pl.enable_string_cache(False)
+ pl.disable_string_cache()
- pl.col('a').list.count_match
+ pl.col('a').list.count_matches
- pl.col('a').is_last
+ pl.col('a').is_last_distinct
- pl.col('a').is_first
+ pl.col('a').is_first_distinct
- pl.col('a').str.strip
+ pl.col('a').str.strip_chars
- pl.col('a').str.lstrip
+ pl.col('a').str.strip_chars_start
- pl.col('a').str.rstrip
+ pl.col('a').str.strip_chars_end
- pl.col('a').str.count_match
+ pl.col('a').str.count_matches
- pl.col("dt").dt.offset_by("1mo_saturating")
+ pl.col("dt").dt.offset_by("1mo")

Version 0.19.4+

- df.group_by_dynamic('ts', every='3d', truncate=True)
+ df.group_by_dynamic('ts', every='3d', label='left')
- df.group_by_dynamic('ts', every='3d', truncate=False)
+ df.group_by_dynamic('ts', every='3d', label='datapoint')

Version 0.19.8+

- pl.col('a').list.lengths
+ pl.col('a').list.len
- pl.col('a').str.lengths
+ pl.col('a').str.len_bytes
- pl.col('a').str.n_chars
+ pl.col('a').str.len_chars

Version 0.19.11+

- pl.col('a').shift(periods=4)
+ pl.col('a').shift(n=4)
- pl.col('a').shift_and_fill(periods=4)
+ pl.col('a').shift_and_fill(n=4)
- pl.col('a').list.shift(periods=4)
+ pl.col('a').list.shift(n=4)
- pl.col('a').map_dict(remapping={1: 2})
+ pl.col('a').map_dict(mapping={1: 2})

Version 0.19.12+

- pl.col('a').keep_name
+ pl.col('a').name.keep
- pl.col('a').suffix
+ pl.col('a').name.suffix
- pl.col('a').prefix
+ pl.col('a').name.prefix
- pl.col('a').map_alias
+ pl.col('a').name.map
- pl.col('a').str.ljust
+ pl.col('a').str.pad_end
- pl.col('a').str.rjust
+ pl.col('a').str.pad_start
- pl.col('a').zfill(alignment=3)
+ pl.col('a').zfill(length=3)
- pl.col('a').ljust(width=3)
+ pl.col('a').ljust(length=3)
- pl.col('a').rjust(width=3)
+ pl.col('a').rjust(length=3)

Version 0.19.13+

- pl.col('a').dt.milliseconds
+ pl.col('a').dt.total_milliseconds
- pl.col('a').dt.microseconds
+ pl.col('a').dt.total_microseconds
- pl.col('a').dt.nanoseconds
+ pl.col('a').dt.total_nanoseconds

(and so on for other units)

Version 0.19.14+

- pl.col('a').list.take
+ pl.col('a').list.gather
- pl.col('a').cumcount
+ pl.col('a').cum_count
- pl.col('a').cummax
+ pl.col('a').cum_max
- pl.col('a').cummin
+ pl.col('a').cum_min
- pl.col('a').cumprod
+ pl.col('a').cum_prod
- pl.col('a').cumsum
+ pl.col('a').cum_sum
- pl.col('a').take
+ pl.col('a').gather
- pl.col('a').take_every
+ pl.col('a').gather_every
- pl.cumsum
+ pl.cum_sum
- pl.cumfold
+ pl.cum_fold
- pl.cumreduce
+ pl.cum_reduce
- pl.cumsum_horizontal
+ pl.cum_sum_horizontal
- pl.col('a').list.take(index=[1, 2])
+ pl.col('a').list.take(indices=[1, 2])
- pl.col('a').str.parse_int(radix=1)
+ pl.col('a').str.parse_int(base=1)

Version 0.19.15+

- pl.col('a').str.json_extract
+ pl.col('a').str.json_decode

Version 0.19.16+

- pl.col('a').map_dict({'a': 'b'})
+ pl.col('a').replace({'a': 'b'}, default=None)
- pl.col('a').map_dict({'a': 'b'}, default='c')
+ pl.col('a').replace({'a': 'b'}, default='c')
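
The default=None added in the first rewrite is deliberate: map_dict mapped unmatched values to null, whereas replace keeps them unchanged unless a default is given, so the extra argument preserves the old behaviour. A sketch of the difference (assuming the 0.19.16-era Polars API):

import polars as pl

df = pl.DataFrame({'a': ['a', 'x']})
# replace with no default keeps unmatched values:   ['b', 'x']
print(df.select(pl.col('a').replace({'a': 'b'})))
# default=None nulls them, matching old map_dict:   ['b', null]
print(df.select(pl.col('a').replace({'a': 'b'}, default=None)))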

Version 0.20.0+

- df.write_database(table_name='foo', if_exists="append")
+ df.write_database(table_name='foo', if_table_exists="append")

Version 0.20.4+

- pl.col('a').where
+ pl.col('a').filter
- pl.count()
+ pl.len()
- df.with_row_count('row_number')
+ df.with_row_index('row_number')
- pl.scan_ndjson(source, row_count_name='foo', row_count_offset=3)
+ pl.scan_ndjson(source, row_index_name='foo', row_index_offset=3)
[...and similarly for `read_csv`, `read_csv_batched`, `scan_csv`, `read_ipc`, `read_ipc_stream`, `scan_ipc`, `read_parquet`, `scan_parquet`]

Version 0.20.5+

- df.pivot(index=index, values=values, columns=columns, aggregate_function='count')
+ df.pivot(index=index, values=values, columns=columns, aggregate_function='len')

Version 0.20.6+

- pl.read_excel(source, xlsx2csv_options=options, read_csv_options=read_options)
+ pl.read_excel(source, engine_options=options, read_options=read_options)

Version 0.20.7+

- pl.threadpool_size
+ pl.thread_pool_size

Version 0.20.8+

- df.pivot(a, b, c)
+ df.pivot(values=a, index=b, columns=c)

Version 0.20.11+

- pl.col('a').meta.write_json
+ pl.col('a').meta.serialize

Version 0.20.14+

- df.group_by_dynamic('time', every='2d', by='symbol')
+ df.group_by_dynamic('time', every='2d', group_by='symbol')
- df.rolling('time', period='2d', by='symbol')
+ df.rolling('time', period='2d', group_by='symbol')
- df.upsample('time', every='2d', by='symbol')
+ df.upsample('time', every='2d', group_by='symbol')

Version 0.20.17+

- pl.from_repr(tbl=data)
+ pl.from_repr(data=data)

Version 0.20.24+

- pl.col('a').rolling_min('2d', by='time')
+ pl.col('a').rolling_min_by(window_size='2d', by='time')
- pl.col('a').rolling_max('2d', by='time')
+ pl.col('a').rolling_max_by(window_size='2d', by='time')
- pl.col('a').rolling_mean('2d', by='time')
+ pl.col('a').rolling_mean_by(window_size='2d', by='time')
- pl.col('a').rolling_std('2d', by='time')
+ pl.col('a').rolling_std_by(window_size='2d', by='time')
- pl.col('a').rolling_var('2d', by='time')
+ pl.col('a').rolling_var_by(window_size='2d', by='time')
- pl.col('a').rolling_prod('2d', by='time')
+ pl.col('a').rolling_prod_by(window_size='2d', by='time')
- pl.col('a').rolling_sum('2d', by='time')
+ pl.col('a').rolling_sum_by(window_size='2d', by='time')

Version 0.20.29+

- df.join(df_right, how='outer')
+ df.join(df_right, how='full')
- df.join(df_right, how='outer_coalesce')
+ df.join(df_right, how='full', coalesce=True)

Version 0.20.31+

- pl.read_csv(file, dtypes=schema)
+ pl.read_csv(file, schema=schema)
- pl.SQLContext(eager_execution=True)
+ pl.SQLContext(eager=True)
- pl.col('a').top_k(k=2, maintain_order=True)
+ pl.col('a').top_k(k=2)

Notes

This work is derivative of pyupgrade; many parts have been lifted verbatim. As required, I've included pyupgrade's license.
