Narwhals

Extremely lightweight compatibility layer between Polars, pandas, cuDF, and Modin.

Seamlessly support all four, without depending on any of them!

  • Just use a subset of the Polars API; there is nothing new to learn
  • No dependencies (not even Polars), so your library stays lightweight
  • ✅ Separate lazy and eager APIs
  • ✅ Use the Polars expressions API

Note: this is a work in progress and a bit of an experiment, so don't take it too seriously.

Installation

pip install narwhals

Or just vendor it: it's only a bunch of pure-Python files.

Usage

There are three steps to writing dataframe-agnostic code using Narwhals:

  1. use narwhals.to_polars_api to wrap a pandas, Polars, cuDF, or Modin dataframe in the Polars API

  2. use the subset of the Polars API defined in https://github.com/MarcoGorelli/narwhals/blob/main/narwhals/spec/__init__.py.

  3. use narwhals.to_original_object to return an object to the user in their original dataframe flavour. For example:

    • if you started with pandas, you'll get pandas back
    • if you started with Polars, you'll get Polars back
    • if you started with Modin, you'll get Modin back
    • if you started with cuDF, you'll get cuDF back (and computation will happen natively on the GPU!)
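
The three steps can be pictured with a toy sketch that uses only the standard library. Nothing below is Narwhals code: `ToyFrame`, `wrap`, `unwrap`, and `heavy_parts` are hypothetical names standing in for the wrapper object, `to_polars_api`, `to_original_object`, and a library function, and the two "flavours" are just a dict of columns and a list of row dicts.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class ToyFrame:
    """Hypothetical wrapper: one common API over two native layouts."""

    columns: dict[str, list[Any]]  # internal, flavour-independent layout
    flavour: str                   # remembers what the caller passed in

    def filter_gt(self, column: str, threshold: float) -> "ToyFrame":
        # Keep only the rows where `column` exceeds `threshold`.
        keep = [value > threshold for value in self.columns[column]]
        filtered = {
            name: [v for v, k in zip(values, keep) if k]
            for name, values in self.columns.items()
        }
        return ToyFrame(filtered, self.flavour)


def wrap(native: Any) -> ToyFrame:
    # Step 1: accept either flavour and expose the common API.
    if isinstance(native, dict):
        return ToyFrame({k: list(v) for k, v in native.items()}, "columns")
    if isinstance(native, list):
        names = list(native[0]) if native else []
        return ToyFrame({n: [row[n] for row in native] for n in names}, "rows")
    raise TypeError(f"unsupported flavour: {type(native).__name__}")


def unwrap(frame: ToyFrame) -> Any:
    # Step 3: hand back the flavour the caller started with.
    if frame.flavour == "columns":
        return frame.columns
    names = list(frame.columns)
    n_rows = len(frame.columns[names[0]]) if names else 0
    return [{n: frame.columns[n][i] for n in names} for i in range(n_rows)]


def heavy_parts(native: Any) -> Any:
    # Step 2 happens in the middle: only the common API is used.
    return unwrap(wrap(native).filter_gt("weight", 14))


as_columns = {"p": ["P1", "P2"], "weight": [12.0, 17.0]}
as_rows = [{"p": "P1", "weight": 12.0}, {"p": "P2", "weight": 17.0}]
print(heavy_parts(as_columns))  # a dict of columns comes back
print(heavy_parts(as_rows))     # a list of row dicts comes back
```

The point of the sketch is that `heavy_parts` never inspects which flavour it was given; only `wrap` and `unwrap` do.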

Example

Here's an example of a dataframe-agnostic function:

from typing import TypeVar
import pandas as pd
import polars as pl

from narwhals import to_polars_api, to_original_object

AnyDataFrame = TypeVar("AnyDataFrame")


def my_agnostic_function(
    suppliers_native: AnyDataFrame,
    parts_native: AnyDataFrame,
) -> AnyDataFrame:
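    # Wrap each native dataframe in the Polars API. The second return value
    # is the namespace used for expressions; naming it `pl` shadows the
    # module-level `polars` import inside this function only.
    suppliers, pl = to_polars_api(suppliers_native, version="0.20")
    parts, _ = to_polars_api(parts_native, version="0.20")
    result = (
        suppliers.join(parts, left_on="city", right_on="city")
        .filter(
            pl.col("color").is_in(["Red", "Green"]),
            pl.col("weight") > 14,
        )
        .group_by("s", "p")
        .agg(
            weight_mean=pl.col("weight").mean(),
            weight_max=pl.col("weight").max(),
        )
    )
    # Collect the result and convert it back to the caller's flavour.
    return to_original_object(result.collect())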

You can pass in a pandas, Polars, cuDF, or Modin dataframe, and the result will contain the same rows (row order may differ between backends, as the outputs further down show). Let's try it out:

suppliers = {
    "s": ["S1", "S2", "S3", "S4", "S5"],
    "sname": ["Smith", "Jones", "Blake", "Clark", "Adams"],
    "status": [20, 10, 30, 20, 30],
    "city": ["London", "Paris", "Paris", "London", "Athens"],
}
parts = {
    "p": ["P1", "P2", "P3", "P4", "P5", "P6"],
    "pname": ["Nut", "Bolt", "Screw", "Screw", "Cam", "Cog"],
    "color": ["Red", "Green", "Blue", "Red", "Blue", "Red"],
    "weight": [12.0, 17.0, 17.0, 14.0, 12.0, 19.0],
    "city": ["London", "Paris", "Oslo", "London", "Paris", "London"],
}

print("pandas output:")
print(
    my_agnostic_function(
        pd.DataFrame(suppliers),
        pd.DataFrame(parts),
    )
)
print("\nPolars output:")
print(
    my_agnostic_function(
        pl.LazyFrame(suppliers),
        pl.LazyFrame(parts),
    )
)

pandas output:
    s   p  weight_mean  weight_max
0  S1  P6         19.0        19.0
1  S2  P2         17.0        17.0
2  S3  P2         17.0        17.0
3  S4  P6         19.0        19.0

Polars output:
shape: (4, 4)
┌─────┬─────┬─────────────┬────────────┐
│ s   ┆ p   ┆ weight_mean ┆ weight_max │
│ --- ┆ --- ┆ ---         ┆ ---        │
│ str ┆ str ┆ f64         ┆ f64        │
╞═════╪═════╪═════════════╪════════════╡
│ S1  ┆ P6  ┆ 19.0        ┆ 19.0       │
│ S3  ┆ P2  ┆ 17.0        ┆ 17.0       │
│ S4  ┆ P6  ┆ 19.0        ┆ 19.0       │
│ S2  ┆ P2  ┆ 17.0        ┆ 17.0       │
└─────┴─────┴─────────────┴────────────┘

Magic! 🪄
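
The expected rows can be checked by hand with a few lines of standard-library Python. This is only a cross-check of the result above, not how Narwhals or any backend computes it; `expected_rows` is a hypothetical helper, and the data is trimmed to the columns the query touches.

```python
from collections import defaultdict
from statistics import mean

suppliers = {
    "s": ["S1", "S2", "S3", "S4", "S5"],
    "city": ["London", "Paris", "Paris", "London", "Athens"],
}
parts = {
    "p": ["P1", "P2", "P3", "P4", "P5", "P6"],
    "color": ["Red", "Green", "Blue", "Red", "Blue", "Red"],
    "weight": [12.0, 17.0, 17.0, 14.0, 12.0, 19.0],
    "city": ["London", "Paris", "Oslo", "London", "Paris", "London"],
}


def expected_rows(suppliers, parts):
    groups = defaultdict(list)
    for s, s_city in zip(suppliers["s"], suppliers["city"]):
        for p, color, weight, p_city in zip(
            parts["p"], parts["color"], parts["weight"], parts["city"]
        ):
            # Inner join on city, then the two filter conditions.
            if s_city == p_city and color in ("Red", "Green") and weight > 14:
                groups[(s, p)].append(weight)
    # One row per (s, p) group; sorted so the comparison ignores row order.
    return sorted((s, p, mean(w), max(w)) for (s, p), w in groups.items())


for row in expected_rows(suppliers, parts):
    print(row)
```

Only P2 (Green, 17.0, Paris) and P6 (Red, 19.0, London) pass the filter, so the London suppliers S1 and S4 pair with P6 and the Paris suppliers S2 and S3 pair with P2. The rows are sorted before comparison because the pandas and Polars outputs list the same four rows in different orders.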

Scope

If you maintain a dataframe-consuming library, then any function from the Polars API which you'd like to be able to use is in-scope, so long as it can be supported without too much difficulty for at least pandas, cuDF, and Modin.

Feature requests are more than welcome!

Related Projects

  • This is not Ibis. Narwhals lets each backend do its own optimisations, and only provides a lightweight (~30 kilobytes) compatibility layer with the Polars API. Ibis applies its own optimisations to different backends, is a heavyweight dependency (~400 MB), and defines its own API.

  • This is not intended as a DataFrame Standard. See the Consortium for Python Data API Standards for a more general and more ambitious project. Please only consider using Narwhals if you only need to support Polars and pandas-like dataframes, and specifically want to tap into Polars' lazy and expressions features (which are out of scope for the Consortium's Standard).

Why "Narwhals"?

Because they are so awesome.

Download files

Source Distribution

narwhals-0.1.7.tar.gz (18.6 kB view details)

Uploaded Source

Built Distribution

narwhals-0.1.7-py3-none-any.whl (17.9 kB view details)

Uploaded Python 3

File details

Details for the file narwhals-0.1.7.tar.gz.

File metadata

  • Download URL: narwhals-0.1.7.tar.gz
  • Size: 18.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/4.0.2 CPython/3.11.8

File hashes

Hashes for narwhals-0.1.7.tar.gz
Algorithm Hash digest
SHA256 7cc8e207b1a1211d77b8468a20379309ba511628efa5762f6bbbe9118af1738f
MD5 85671eaa56c687248955f3f4309cdeae
BLAKE2b-256 19329c6d9858416c23127da3f08ac70c1d2688dec01d2226ce029971bb6e4bfb

File details

Details for the file narwhals-0.1.7-py3-none-any.whl.

File metadata

  • Download URL: narwhals-0.1.7-py3-none-any.whl
  • Size: 17.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/4.0.2 CPython/3.11.8

File hashes

Hashes for narwhals-0.1.7-py3-none-any.whl
Algorithm Hash digest
SHA256 9faac506ef695ad046795351514554c36e043b1c2463afc1dd854947d3ff7f9d
MD5 d2b7442001206f57029e8a06755cf454
BLAKE2b-256 ce97935d2f3e578590be8bd3613c906b1e6616ce0ac1ed8cb22f0def6486d834
