Extremely lightweight compatibility layer between pandas, Polars, cuDF, and Modin

Narwhals

Extremely lightweight compatibility layer between Polars, pandas, cuDF, and Modin.

Seamlessly support all four, without depending on any of them!

  • Just use a subset of the Polars API, no need to learn anything new
  • No dependencies (not even Polars), keep your library lightweight
  • ✅ Separate Lazy and Eager APIs
  • ✅ Use Polars Expressions API

Note: this is work-in-progress, and a bit of an experiment, don't take it too seriously.

Installation

pip install narwhals

Or just vendor it: it's only a bunch of pure-Python files.

Usage

There are three steps to writing dataframe-agnostic code using Narwhals:

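  1. use narwhals.to_polars_api to wrap a pandas, Polars, cuDF, or Modin dataframe in the Polars API. In the example below it returns two values: the wrapped dataframe and a namespace that is used like polars (for pl.col and friends)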

  2. use the subset of the Polars API defined in narwhals/spec/__init__.py: https://github.com/MarcoGorelli/narwhals/blob/main/narwhals/spec/__init__.py

  3. use narwhals.to_original_object to return an object to the user in their original dataframe flavour. For example:

    • if you started with pandas, you'll get pandas back
    • if you started with Polars, you'll get Polars back
    • if you started with Modin, you'll get Modin back
    • if you started with cuDF, you'll get cuDF back (and computation will happen natively on the GPU!)

Example

Here's an example of a dataframe-agnostic function:

from typing import TypeVar
import pandas as pd
import polars as pl

from narwhals import to_polars_api, to_original_object

AnyDataFrame = TypeVar("AnyDataFrame")


def my_agnostic_function(
    suppliers_native: AnyDataFrame,
    parts_native: AnyDataFrame,
) -> AnyDataFrame:
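    # Wrap both inputs. The second return value is used like the polars
    # namespace; inside this function it shadows the module-level `pl`.
    suppliers, pl = to_polars_api(suppliers_native, version="0.20")
    parts, _ = to_polars_api(parts_native, version="0.20")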
    result = (
        suppliers.join(parts, left_on="city", right_on="city")
        .filter(
            pl.col("color").is_in(["Red", "Green"]),
            pl.col("weight") > 14,
        )
        .group_by("s", "p")
        .agg(
            weight_mean=pl.col("weight").mean(),
            weight_max=pl.col("weight").max(),
        )
    )
    # Collect the result, then hand back the caller's original dataframe type.
    return to_original_object(result.collect())

You can pass in a pandas, Polars, cuDF, or Modin dataframe, and the output will contain the same rows (though, as shown in the output further down, not necessarily in the same order). Let's try it out:

suppliers = {
    "s": ["S1", "S2", "S3", "S4", "S5"],
    "sname": ["Smith", "Jones", "Blake", "Clark", "Adams"],
    "status": [20, 10, 30, 20, 30],
    "city": ["London", "Paris", "Paris", "London", "Athens"],
}
parts = {
    "p": ["P1", "P2", "P3", "P4", "P5", "P6"],
    "pname": ["Nut", "Bolt", "Screw", "Screw", "Cam", "Cog"],
    "color": ["Red", "Green", "Blue", "Red", "Blue", "Red"],
    "weight": [12.0, 17.0, 17.0, 14.0, 12.0, 19.0],
    "city": ["London", "Paris", "Oslo", "London", "Paris", "London"],
}

print("pandas output:")
print(
    my_agnostic_function(
        pd.DataFrame(suppliers),
        pd.DataFrame(parts),
    )
)
print("\nPolars output:")
print(
    my_agnostic_function(
        pl.LazyFrame(suppliers),
        pl.LazyFrame(parts),
    )
)

This prints:

pandas output:
    s   p  weight_mean  weight_max
0  S1  P6         19.0        19.0
1  S2  P2         17.0        17.0
2  S3  P2         17.0        17.0
3  S4  P6         19.0        19.0

Polars output:
shape: (4, 4)
┌─────┬─────┬─────────────┬────────────┐
│ s   ┆ p   ┆ weight_mean ┆ weight_max │
│ --- ┆ --- ┆ ---         ┆ ---        │
│ str ┆ str ┆ f64         ┆ f64        │
╞═════╪═════╪═════════════╪════════════╡
│ S1  ┆ P6  ┆ 19.0        ┆ 19.0       │
│ S3  ┆ P2  ┆ 17.0        ┆ 17.0       │
│ S4  ┆ P6  ┆ 19.0        ┆ 19.0       │
│ S2  ┆ P2  ┆ 17.0        ┆ 17.0       │
└─────┴─────┴─────────────┴────────────┘

Magic! 🪄
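To see where those four rows come from, the same join, filter, and aggregation can be traced by hand. The sketch below uses only the standard library and the example data above (trimmed to the columns the query touches); it is a cross-check of the expected result, not a description of what Narwhals does internally. The rows are sorted at the end so the order is deterministic.

```python
from collections import defaultdict
from statistics import mean

suppliers = {
    "s": ["S1", "S2", "S3", "S4", "S5"],
    "city": ["London", "Paris", "Paris", "London", "Athens"],
}
parts = {
    "p": ["P1", "P2", "P3", "P4", "P5", "P6"],
    "color": ["Red", "Green", "Blue", "Red", "Blue", "Red"],
    "weight": [12.0, 17.0, 17.0, 14.0, 12.0, 19.0],
    "city": ["London", "Paris", "Oslo", "London", "Paris", "London"],
}

# Inner join on "city": pair every supplier with every part in the same city.
joined = [
    (s, p, color, weight)
    for s, s_city in zip(suppliers["s"], suppliers["city"])
    for p, color, weight, p_city in zip(
        parts["p"], parts["color"], parts["weight"], parts["city"]
    )
    if s_city == p_city
]

# Filter: both conditions must hold, as in the .filter(...) call.
kept = [
    (s, p, weight)
    for s, p, color, weight in joined
    if color in ("Red", "Green") and weight > 14
]

# Group by ("s", "p") and aggregate the weights.
groups = defaultdict(list)
for s, p, weight in kept:
    groups[(s, p)].append(weight)

result = sorted((s, p, mean(ws), max(ws)) for (s, p), ws in groups.items())
for row in result:
    print(row)
# ('S1', 'P6', 19.0, 19.0)
# ('S2', 'P2', 17.0, 17.0)
# ('S3', 'P2', 17.0, 17.0)
# ('S4', 'P6', 19.0, 19.0)
```

Each group ends up with a single weight, which is why weight_mean and weight_max agree in every row.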

Scope

  • Do you maintain a dataframe-consuming library?
  • Is there a Polars function which you'd like Narwhals to have, which would make your job easier?

If so, I'd love to hear from you!

Note: this is not a "Dataframe Standard" project. It just translates a subset of the Polars API to pandas-like libraries.

Why "Narwhals"?

Because they are so awesome.
