polars-upgrade

Automatically upgrade your Polars code so it's compatible with future versions.

Installation

Easy:

pip install -U polars-upgrade

Usage (command-line)

Run

polars-upgrade my_project --target-version=0.20.31

from the command line, replacing 0.20.31 with your Polars version and my_project with the name of your directory.

NOTE: this tool will modify your code! You're advised to stage your files before running it.
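
If you're not sure which Polars version you're running, you can check it from Python (Polars exposes its version as polars.__version__) and pass that to --target-version:

import polars as pl

print(pl.__version__)  # e.g. '0.20.31'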

Usage (pre-commit hook)

-   repo: https://github.com/MarcoGorelli/polars-upgrade
    rev: 0.3.6  # polars-upgrade version goes here
    hooks:
    -   id: polars-upgrade
        args: [--target-version=0.20.31]  # Polars version goes here

Usage (Jupyter Notebooks)

Install nbqa (for example with pip install -U nbqa) and then run

nbqa polars_upgrade my_project --target-version=0.20.31

Usage (library)

In a Python script:

from polars_upgrade import rewrite, Settings

src = """\
import polars as pl
df.select(pl.count())
"""
settings = Settings(target_version=(0, 20, 4))
output = rewrite(src, settings=settings)
print(output)

Output:

import polars as pl
df.select(pl.len())

If your snippet does not include import polars or import polars as pl, then you will also need to pass 'pl' and/or 'polars' via the aliases argument, else polars-upgrade will not perform the rewrite. Example:

from polars_upgrade import rewrite, Settings

src = """\
df.select(pl.count())
"""
settings = Settings(target_version=(0, 20, 4))
output = rewrite(src, settings=settings, aliases={'pl'})
print(output)

Output:

df.select(pl.len())
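
To upgrade a whole directory from Python rather than from the command line, you can loop over the files yourself. A minimal sketch (my_project is a placeholder; in practice the command-line interface shown above already handles this for you):

from pathlib import Path

from polars_upgrade import rewrite, Settings

settings = Settings(target_version=(0, 20, 31))

for path in Path('my_project').rglob('*.py'):
    src = path.read_text()
    # files without an explicit polars import would also
    # need aliases={'pl'}, as described above
    output = rewrite(src, settings=settings)
    if output != src:  # only write back files that actually changed
        path.write_text(output)
        print(f'rewrote {path}')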

Supported rewrites

Version 0.18.12+

- pl.avg
+ pl.mean

Version 0.19.0+

- df.groupby_dynamic
+ df.group_by_dynamic
- df.groupby_rolling
+ df.rolling
- df.rolling('ts', period='3d').apply
+ df.rolling('ts', period='3d').map_groups
- pl.col('a').rolling_apply
+ pl.col('a').rolling_map
- pl.col('a').apply
+ pl.col('a').map_elements
- pl.col('a').map
+ pl.col('a').map_batches
- pl.map
+ pl.map_batches
- pl.apply
+ pl.map_groups
- pl.col('a').any(drop_nulls=True)
+ pl.col('a').any(ignore_nulls=True)
- pl.col('a').all(drop_nulls=True)
+ pl.col('a').all(ignore_nulls=True)
- pl.col('a').value_counts(multithreaded=True)
+ pl.col('a').value_counts(parallel=True)

Version 0.19.2+

- pl.col('a').is_not
+ pl.col('a').not_

Version 0.19.3+

- pl.enable_string_cache(True)
+ pl.enable_string_cache()
- pl.enable_string_cache(False)
+ pl.disable_string_cache()
- pl.col('a').list.count_match
+ pl.col('a').list.count_matches
- pl.col('a').is_last
+ pl.col('a').is_last_distinct
- pl.col('a').is_first
+ pl.col('a').is_first_distinct
- pl.col('a').str.strip
+ pl.col('a').str.strip_chars
- pl.col('a').str.lstrip
+ pl.col('a').str.strip_chars_start
- pl.col('a').str.rstrip
+ pl.col('a').str.strip_chars_end
- pl.col('a').str.count_match
+ pl.col('a').str.count_matches
- pl.col("dt").dt.offset_by("1mo_saturating")
+ pl.col("dt").dt.offset_by("1mo")

Version 0.19.4+

- df.group_by_dynamic('ts', every='3d', truncate=True)
+ df.group_by_dynamic('ts', every='3d', label='left')
- df.group_by_dynamic('ts', every='3d', truncate=False)
+ df.group_by_dynamic('ts', every='3d', label='datapoint')

Version 0.19.8+

- pl.col('a').list.lengths
+ pl.col('a').list.len
- pl.col('a').str.lengths
+ pl.col('a').str.len_bytes
- pl.col('a').str.n_chars
+ pl.col('a').str.len_chars

Version 0.19.11+

- pl.col('a').shift(periods=4)
+ pl.col('a').shift(n=4)
- pl.col('a').shift_and_fill(periods=4)
+ pl.col('a').shift_and_fill(n=4)
- pl.col('a').list.shift(periods=4)
+ pl.col('a').list.shift(n=4)
- pl.col('a').map_dict(remapping={1: 2})
+ pl.col('a').map_dict(mapping={1: 2})

Version 0.19.12+

- pl.col('a').keep_name
+ pl.col('a').name.keep
- pl.col('a').suffix
+ pl.col('a').name.suffix
- pl.col('a').prefix
+ pl.col('a').name.prefix
- pl.col('a').map_alias
+ pl.col('a').name.map
- pl.col('a').str.ljust
+ pl.col('a').str.pad_end
- pl.col('a').str.rjust
+ pl.col('a').str.pad_start
- pl.col('a').zfill(alignment=3)
+ pl.col('a').zfill(length=3)
- pl.col('a').ljust(width=3)
+ pl.col('a').ljust(length=3)
- pl.col('a').rjust(width=3)
+ pl.col('a').rjust(length=3)

Version 0.19.13+

- pl.col('a').dt.milliseconds
+ pl.col('a').dt.total_milliseconds
- pl.col('a').dt.microseconds
+ pl.col('a').dt.total_microseconds
- pl.col('a').dt.nanoseconds
+ pl.col('a').dt.total_nanoseconds

(and so on for other units)

Version 0.19.14+

- pl.col('a').list.take
+ pl.col('a').list.gather
- pl.col('a').cumcount
+ pl.col('a').cum_count
- pl.col('a').cummax
+ pl.col('a').cum_max
- pl.col('a').cummin
+ pl.col('a').cum_min
- pl.col('a').cumprod
+ pl.col('a').cum_prod
- pl.col('a').cumsum
+ pl.col('a').cum_sum
- pl.col('a').take
+ pl.col('a').gather
- pl.col('a').take_every
+ pl.col('a').gather_every
- pl.cumsum
+ pl.cum_sum
- pl.cumfold
+ pl.cum_fold
- pl.cumreduce
+ pl.cum_reduce
- pl.cumsum_horizontal
+ pl.cum_sum_horizontal
- pl.col('a').list.take(index=[1, 2])
+ pl.col('a').list.take(indices=[1, 2])
- pl.col('a').str.parse_int(radix=1)
+ pl.col('a').str.parse_int(base=1)

Version 0.19.15+

- pl.col('a').str.json_extract
+ pl.col('a').str.json_decode

Version 0.19.16+

- pl.col('a').map_dict({'a': 'b'})
+ pl.col('a').replace({'a': 'b'}, default=None)
- pl.col('a').map_dict({'a': 'b'}, default='c')
+ pl.col('a').replace({'a': 'b'}, default='c')

Version 0.20.0+

- df.write_database(table_name='foo', if_exists="append")
+ df.write_database(table_name='foo', if_table_exists="append")

Version 0.20.4+

- pl.col('a').where
+ pl.col('a').filter
- pl.count()
+ pl.len()
- df.with_row_count('row_number')
+ df.with_row_index('row_number')
- pl.scan_ndjson(source, row_count_name='foo', row_count_offset=3)
+ pl.scan_ndjson(source, row_index_name='foo', row_index_offset=3)
[...and similarly for `read_csv`, `read_csv_batched`, `scan_csv`, `read_ipc`, `read_ipc_stream`, `scan_ipc`, `read_parquet`, `scan_parquet`]

Version 0.20.5+

- df.pivot(index=index, values=values, columns=columns, aggregate_function='count')
+ df.pivot(index=index, values=values, columns=columns, aggregate_function='len')

Version 0.20.6+

- pl.read_excel(source, xlsx2csv_options=options, read_csv_options=read_options)
+ pl.read_excel(source, engine_options=options, read_options=read_options)

Version 0.20.7+

- pl.threadpool_size
+ pl.thread_pool_size

Version 0.20.8+

- df.pivot(a, b, c)
+ df.pivot(values=a, index=b, columns=c)

Version 0.20.11+

- pl.col('a').meta.write_json
+ pl.col('a').meta.serialize

Version 0.20.14+

- df.group_by_dynamic('time', every='2d', by='symbol')
+ df.group_by_dynamic('time', every='2d', group_by='symbol')
- df.rolling('time', period='2d', by='symbol')
+ df.rolling('time', period='2d', group_by='symbol')
- df.upsample('time', every='2d', by='symbol')
+ df.upsample('time', every='2d', group_by='symbol')

Version 0.20.17+

- pl.from_repr(tbl=data)
+ pl.from_repr(data=data)

Version 0.20.24+

- pl.col('a').rolling_min('2d', by='time')
+ pl.col('a').rolling_min_by(window_size='2d', by='time')
- pl.col('a').rolling_max('2d', by='time')
+ pl.col('a').rolling_max_by(window_size='2d', by='time')
- pl.col('a').rolling_mean('2d', by='time')
+ pl.col('a').rolling_mean_by(window_size='2d', by='time')
- pl.col('a').rolling_std('2d', by='time')
+ pl.col('a').rolling_std_by(window_size='2d', by='time')
- pl.col('a').rolling_var('2d', by='time')
+ pl.col('a').rolling_var_by(window_size='2d', by='time')
- pl.col('a').rolling_prod('2d', by='time')
+ pl.col('a').rolling_prod_by(window_size='2d', by='time')
- pl.col('a').rolling_sum('2d', by='time')
+ pl.col('a').rolling_sum_by(window_size='2d', by='time')

Version 0.20.29+

- df.join(df_right, how='outer')
+ df.join(df_right, how='full')
- df.join(df_right, how='outer_coalesce')
+ df.join(df_right, how='full', coalesce=True)

Version 0.20.31+

- pl.read_csv(file, dtypes=schema)
+ pl.read_csv(file, schema=schema)
- pl.SQLContext(eager_execution=True)
+ pl.SQLContext(eager=True)
- pl.col('a').top_k(k=2, maintain_order=True)
+ pl.col('a').top_k(k=2)

Notes

This work is derivative of pyupgrade - many parts have been lifted verbatim. As required, I've included pyupgrade's license.
