The goal of the code below is to scale all columns (except customer_id) inside df_filtered to the 0-1 range and save the output to df_scaled, while preserving all customer_id values.
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Separate customer_id column and feature columns
customer_ids = df_filtered['customer_id']
features = df_filtered.drop('customer_id', axis=1)
# Transform the feature columns
scaler = MinMaxScaler()
scaled_features = scaler.fit_transform(features)
# Create a new DataFrame with the scaled features and customer_id column
df_scaled = pd.DataFrame(scaled_features, columns=features.columns)
df_scaled['customer_id'] = customer_ids
# Reorder the columns (optional, to match the original DataFrame)
df_scaled = df_scaled[['customer_id', 'trx_cnt', 'gtv', 'service_cnt', 'active_day_cnt', 'recency']]
However, I noticed that about 10% of the customer_id values inside df_scaled become NaN, so I'm losing the identifier. The rows are still there, but their customer_id value is NaN.
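In case it helps, here is a minimal sketch that shows the same symptom on toy data. It rests on an assumption I have not verified on the real data: that df_filtered was produced by filtering a larger DataFrame and kept its original, non-contiguous row labels. The values, the index [0, 2, 5], and the reduced column set are made up for illustration, and the scaling is done by hand as a stand-in for MinMaxScaler so that the sketch only needs pandas.

```python
import pandas as pd

# Toy stand-in for df_filtered (made-up values). The index 0, 2, 5 mimics
# rows that survived an earlier filter without a reset_index() call.
df_filtered = pd.DataFrame(
    {
        "customer_id": ["a", "b", "c"],
        "trx_cnt": [1, 5, 9],
        "gtv": [10.0, 20.0, 40.0],
    },
    index=[0, 2, 5],
)

customer_ids = df_filtered["customer_id"]
features = df_filtered.drop("customer_id", axis=1)

# Hand-written min-max scaling, converted to a plain NumPy array to mimic
# what MinMaxScaler.fit_transform returns (an array without row labels).
scaled_features = (
    (features - features.min()) / (features.max() - features.min())
).to_numpy()

# Same steps as in my code: the new DataFrame gets a fresh 0..n-1 index.
df_scaled = pd.DataFrame(scaled_features, columns=features.columns)
df_scaled["customer_id"] = customer_ids

print(df_filtered.index.tolist())  # [0, 2, 5]
print(df_scaled.index.tolist())    # [0, 1, 2]
print(df_scaled)
```

In this toy case the row labelled 1 in df_scaled ends up with a NaN customer_id, while the row count stays at three, which looks like what I see on the real data.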
Why is this happening? Can you point out the way to fix this?
Thanks!