Polars solution to normalise groups by per-group reference value
I'm trying to use Polars to normalise the values of groups of entries by a single reference value per group.

Sample data:

    df = pl.from_repr("""
    ┌──────────┬─────────────────┬───────┐
    │ group_id ┆ reference_state ┆ value │
    │ ---      ┆ ---             ┆ ---   │
    │ i64      ┆ str             ┆ i64   │
    ╞══════════╪═════════════════╪═══════╡
    │ 1        ┆ ref             ┆ 5     │
    │ 1        ┆ a               ┆ 3     │
    │ 1        ┆ b               ┆ 1     │
    │ 2        ┆ ref             ┆ 4     │
    │ 2        ┆ a               ┆ 8     │
    │ 2        ┆ b               ┆ 2     │
    └──────────┴─────────────────┴───────┘
    """)

I'm trying to generate the column normalised, which contains value divided by the value of the "ref" reference state in the same group. This is straightforward in Pandas:

    df = df.to_pandas()
    for (i, x) in df.groupby("group_id):
        ref_val = x.loc[x["reference_state"] == "ref"]["value"]
        df.loc[df["group_id"] == i, "normalised"] = x["value"] / ref_val.to_list()[0]
    pl.from_pandas(df)

    shape: (6, 4)
    ┌──────────┬─────────────────┬───────┬────────────┐
    │ group_id ┆ reference_state ┆ value ┆ normalised │
    │ ---      ┆ ---             ┆ ---   ┆ ---        │
    │ i64      ┆ str             ┆ i64   ┆ f64        │
    ╞══════════╪═════════════════╪═══════╪════════════╡
    │ 1        ┆ ref             ┆ 5     ┆ 1.0        │
    │ 1        ┆ a               ┆ 3     ┆ 0.6        │
    │ 1        ┆ b               ┆ 1     ┆ 0.2        │
    │ 2        ┆ ref             ┆ 4     ┆ 1.0        │
    │ 2        ┆ a               ┆ 8     ┆ 2.0        │
    │ 2        ┆ b               ┆ 2     ┆ 0.5        │
    └──────────┴─────────────────┴───────┴────────────┘

Is there a way to do this in Polars? Thanks in advance!
Best answer
You can use a window function to make an expression operate on each group separately, via:

    .over("group_id")

Then you can write the logic that divides each value by the value in the row where reference_state equals "ref":

    pl.col("value") / pl.col("value").filter(pl.col("reference_state") == "ref").first()

Putting it all together:

    import polars as pl

    df = pl.DataFrame({
        "group_id": [1, 1, 1, 2, 2, 2],
        "reference_state": ["ref", "a", "b", "ref", "a", "b"],
        "value": [5, 3, 1, 4, 8, 2],
    })

    (df.with_columns(
        (
            pl.col("value") /
            pl.col("value").filter(pl.col("reference_state") == "ref").first()
        ).over("group_id").alias("normalised")
    ))

    shape: (6, 4)
    ┌──────────┬─────────────────┬───────┬────────────┐
    │ group_id ┆ reference_state ┆ value ┆ normalised │
    │ ---      ┆ ---             ┆ ---   ┆ ---        │
    │ i64      ┆ str             ┆ i64   ┆ f64        │
    ╞══════════╪═════════════════╪═══════╪════════════╡
    │ 1        ┆ ref             ┆ 5     ┆ 1.0        │
    │ 1        ┆ a               ┆ 3     ┆ 0.6        │
    │ 1        ┆ b               ┆ 1     ┆ 0.2        │
    │ 2        ┆ ref             ┆ 4     ┆ 1.0        │
    │ 2        ┆ a               ┆ 8     ┆ 2.0        │
    │ 2        ┆ b               ┆ 2     ┆ 0.5        │
    └──────────┴─────────────────┴───────┴────────────┘
Answer
Here's one way to do it:

- create a temporary dataframe which, for each group_id, tells you the value where reference_state is "ref"
- join with that temporary dataframe

    (
        df.join(
            df.filter(pl.col("reference_state") == "ref").select("group_id", "value"),
            on="group_id",
        )
        .with_columns((pl.col("value") / pl.col("value_right")).alias("normalised"))
        .drop("value_right")
    )

This gives you:

    shape: (6, 4)
    ┌──────────┬─────────────────┬───────┬────────────┐
    │ group_id ┆ reference_state ┆ value ┆ normalised │
    │ ---      ┆ ---             ┆ ---   ┆ ---        │
    │ i64      ┆ str             ┆ i64   ┆ f64        │
    ╞══════════╪═════════════════╪═══════╪════════════╡
    │ 1        ┆ ref             ┆ 5     ┆ 1.0        │
    │ 1        ┆ a               ┆ 3     ┆ 0.6        │
    │ 1        ┆ b               ┆ 1     ┆ 0.2        │
    │ 2        ┆ ref             ┆ 4     ┆ 1.0        │
    │ 2        ┆ a               ┆ 8     ┆ 2.0        │
    │ 2        ┆ b               ┆ 2     ┆ 0.5        │
    └──────────┴─────────────────┴───────┴────────────┘



