Extremely lightweight compatibility layer between pandas, Polars, cuDF, and Modin
Narwhals
Extremely lightweight compatibility layer between Polars, pandas, and more.
Seamlessly support both, without depending on either!
- ✅ Just use a subset of the Polars API, no need to learn anything new
- ✅ No dependencies (not even Polars), keep your library lightweight
- ✅ Separate lazy and eager APIs
- ✅ Use Polars Expressions
Note: this is a work in progress and a bit of an experiment, so don't take it too seriously.
Installation
pip install narwhals
Or just vendor it: it's only a bunch of pure-Python files.
Usage
There are three steps to writing dataframe-agnostic code using Narwhals:
- use narwhals.LazyFrame or narwhals.DataFrame to wrap a pandas or Polars DataFrame/LazyFrame in a Narwhals class
- use the subset of the Polars API supported by Narwhals. Just like in Polars, some methods (e.g. to_numpy) are only available for DataFrame, not LazyFrame
- use narwhals.to_native to return an object to the user in its original dataframe flavour. For example:
  - if you started with pandas, you'll get pandas back
  - if you started with Polars, you'll get Polars back
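The wrap-and-unwrap round trip in the first and third steps can be pictured with a toy wrapper. This is only a sketch of the idea, not how Narwhals is implemented: ToyFrame, filter_gt and toy_to_native are hypothetical names, and the "native" objects are plain Python containers (a dict of columns or a list of row dicts) so that it runs without pandas or Polars installed.

```python
from typing import Any


class ToyFrame:
    """Hypothetical wrapper that remembers which flavour it was built from."""

    def __init__(self, native: Any) -> None:
        if isinstance(native, dict):
            # Column-oriented input, e.g. {"weight": [5, 12]}
            self._columns = {k: list(v) for k, v in native.items()}
            self._flavour = "columns"
        elif isinstance(native, list):
            # Row-oriented input, e.g. [{"weight": 5}, {"weight": 12}]
            keys = list(native[0]) if native else []
            self._columns = {k: [row[k] for row in native] for k in keys}
            self._flavour = "rows"
        else:
            raise TypeError(f"unsupported native type: {type(native).__name__}")

    def filter_gt(self, column: str, threshold: float) -> "ToyFrame":
        """Keep rows where `column` is greater than `threshold`."""
        keep = [value > threshold for value in self._columns[column]]
        result = ToyFrame.__new__(ToyFrame)  # skip __init__, copy state by hand
        result._columns = {
            k: [v for v, flag in zip(values, keep) if flag]
            for k, values in self._columns.items()
        }
        result._flavour = self._flavour
        return result


def toy_to_native(frame: ToyFrame) -> Any:
    """Hand back the same kind of object the caller started with."""
    if frame._flavour == "columns":
        return dict(frame._columns)
    n_rows = len(next(iter(frame._columns.values()), []))
    return [{k: v[i] for k, v in frame._columns.items()} for i in range(n_rows)]


# Same operation, two input flavours; each comes back in its own flavour.
from_columns = toy_to_native(ToyFrame({"weight": [5, 12]}).filter_gt("weight", 10))
from_rows = toy_to_native(ToyFrame([{"weight": 5}, {"weight": 12}]).filter_gt("weight", 10))
print(from_columns)
print(from_rows)
```

The library author writes the filtering logic once against the wrapper, and the caller never has to know a wrapper was involved.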
Example
Here's an example of a dataframe-agnostic function:
from typing import Any

import pandas as pd
import polars as pl

import narwhals as nw


def my_agnostic_function(
    suppliers_native,
    parts_native,
):
    # Step 1: wrap the native objects in Narwhals classes
    suppliers = nw.LazyFrame(suppliers_native)
    parts = nw.LazyFrame(parts_native)

    # Step 2: use the supported subset of the Polars API
    result = (
        suppliers.join(parts, left_on="city", right_on="city")
        .filter(nw.col("weight") > 10)
        .group_by("s")
        .agg(
            weight_mean=nw.col("weight").mean(),
            weight_max=nw.col("weight").max(),
        )
    )

    # Step 3: hand back an object in the caller's original flavour
    return nw.to_native(result)
You can pass in either a pandas or a Polars dataframe, and the output will be the same! Let's try it out:
suppliers = {
    "s": ["S1", "S2", "S3", "S4", "S5"],
    "sname": ["Smith", "Jones", "Blake", "Clark", "Adams"],
    "status": [20, 10, 30, 20, 30],
    "city": ["London", "Paris", "Paris", "London", "Athens"],
}
parts = {
    "p": ["P1", "P2", "P3", "P4", "P5", "P6"],
    "pname": ["Nut", "Bolt", "Screw", "Screw", "Cam", "Cog"],
    "color": ["Red", "Green", "Blue", "Red", "Blue", "Red"],
    "weight": [12.0, 17.0, 17.0, 14.0, 12.0, 19.0],
    "city": ["London", "Paris", "Oslo", "London", "Paris", "London"],
}

print("pandas output:")
print(
    my_agnostic_function(
        pd.DataFrame(suppliers),
        pd.DataFrame(parts),
    )
)
print("\nPolars output:")
print(
    my_agnostic_function(
        pl.LazyFrame(suppliers),
        pl.LazyFrame(parts),
    ).collect()
)
pandas output:
    s  weight_mean  weight_max
0  S1         15.0        19.0
1  S2         14.5        17.0
2  S3         14.5        17.0
3  S4         15.0        19.0

Polars output:
shape: (4, 3)
┌─────┬─────────────┬────────────┐
│ s   ┆ weight_mean ┆ weight_max │
│ --- ┆ ---         ┆ ---        │
│ str ┆ f64         ┆ f64        │
╞═════╪═════════════╪════════════╡
│ S2  ┆ 14.5        ┆ 17.0       │
│ S3  ┆ 14.5        ┆ 17.0       │
│ S4  ┆ 15.0        ┆ 19.0       │
│ S1  ┆ 15.0        ┆ 19.0       │
└─────┴─────────────┴────────────┘
Magic! 🪄
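To see where those numbers come from, the same join, filter and aggregation can be traced by hand. The sketch below uses only the Python standard library and the "s", "city" and "weight" columns from the example data. It is a plain-Python restatement of the query for checking the arithmetic, not something Narwhals does internally.

```python
from collections import defaultdict
from statistics import mean

suppliers = {
    "s": ["S1", "S2", "S3", "S4", "S5"],
    "city": ["London", "Paris", "Paris", "London", "Athens"],
}
parts = {
    "weight": [12.0, 17.0, 17.0, 14.0, 12.0, 19.0],
    "city": ["London", "Paris", "Oslo", "London", "Paris", "London"],
}

# Inner join on "city": pair each supplier with every part in the same city.
joined = [
    (s, weight)
    for s, s_city in zip(suppliers["s"], suppliers["city"])
    for weight, p_city in zip(parts["weight"], parts["city"])
    if s_city == p_city
]

# Filter: keep only rows with weight > 10.
filtered = [(s, weight) for s, weight in joined if weight > 10]

# Group by supplier and aggregate mean and max weight.
groups = defaultdict(list)
for s, weight in filtered:
    groups[s].append(weight)

result = {s: (mean(ws), max(ws)) for s, ws in sorted(groups.items())}
print(result)
```

S5 is missing from the result because no part is located in Athens, so the inner join drops it. The dictionary here is sorted by supplier only for readability. The two outputs above list the same four rows in a different order, so row order should not be relied on.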
Scope
- Do you maintain a dataframe-consuming library?
- Is there a Polars function which you'd like Narwhals to have, which would make your job easier?
If so, I'd love to hear from you!
Note: You might suspect that this is a secret ploy to infiltrate the Polars API everywhere. Indeed, you may suspect that.
Why "Narwhals"?
Because they are so awesome.