I am doing a bit of data munging on a polars.DataFrame, and I could repeat the same expression for each column, but I would like to cut down on the repetition. So I was thinking I could create a user-defined function that just plugs in the column names.
I know that polars is reluctant to let people bring in user-defined functions (and for good reasons), but it is tedious to write out the same expression over and over with only the column name changing.
So let's say that I have a polars dataframe like this:
import polars as pl

df = pl.DataFrame({
    "a": ["Strongly Disagree", "Disagree", "Agree", "Strongly Agree"],
    "b": ["Strongly Agree", "Agree", "Disagree", "Strongly Disagree"],
    "c": ["Agree", "Strongly Agree", "Strongly Disagree", "Disagree"],
})
I could use a when-then-otherwise expression to convert each of these columns to numeric. For column a, that looks like this:
df_clean = df.with_columns(
    pl.when(
        pl.col("a") == "Strongly Disagree"
    ).then(
        pl.lit(1)
    ).when(
        pl.col("a") == "Disagree"
    ).then(
        pl.lit(2)
    ).when(
        pl.col("a") == "Agree"
    ).then(
        pl.lit(3)
    ).when(
        pl.col("a") == "Strongly Agree"
    ).then(
        pl.lit(4)
    )
)
But I don't want to write this out two more times.
So I was thinking I could write a function and then map it over a, b, and c, but this seems like it wouldn't work.
Anyone have any advice for the most efficient way to do this?