How to create a new dataframe from the products of columns of an existing dataframe?

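I have a dataframe with n columns. I want to create a new dataframe that contains all the existing columns plus all possible products of any 2 (ab, ac, ...), 3 (abc, abd, ...), 4 (abcd, abce, ...), 5 (abcde, abcdf, ...), and so on up to n (abcdef) columns. The only criterion is that no product uses the same column more than once. Here is an example for n=3: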

Input:

col1: a
col2: b
col3: c

Output:

col1: a
col2: b
col3: c
col4: a*b
col5: a*c
col6: b*c
col7: a*b*c

What is the most efficient way to create such a dataframe?

Answers

I suggest using itertools.combinations() to get every combination of columns of a given length, then multiplying the columns in each combination together.

Example:

import numpy as np
import pandas as pd
import itertools


df = pd.DataFrame({col: np.random.rand(10) for col in 'abcdef'})


def all_combined_product_cols(df):
    cols = list(df.columns)
    product_cols = []
    for length in range(1, len(cols) + 1):
        for combination in itertools.combinations(cols, r=length):
            combined_col = None
            for col in combination:
                if combined_col is None:
                    combined_col = df[col].copy()
                else:
                    combined_col *= df[col]
            combined_col.name = '_'.join(combination)
            product_cols.append(combined_col)
    return pd.concat(product_cols, axis=1)


print(all_combined_product_cols(df))
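
A more compact variant of the same idea is sketched below. It is not a different algorithm, only a restatement that uses DataFrame.prod(axis=1) for the row-wise product and collects the results in a dict. The function name all_product_cols and the small three-column frame are made up for illustration, and the sketch assumes the column names are strings. skipna=False is passed so that NaN values propagate as they do with the *= loop above.

```python
import itertools

import pandas as pd


def all_product_cols(df):
    """Return every column of df plus the product of every subset of 2+ columns."""
    cols = list(df.columns)  # assumed to be strings, since they are joined with "_"
    products = {}
    for length in range(1, len(cols) + 1):
        for combo in itertools.combinations(cols, length):
            # Row-wise product over the selected columns; keep NaN as NaN.
            products["_".join(combo)] = df[list(combo)].prod(axis=1, skipna=False)
    # Build the result in one step instead of inserting columns one by one.
    return pd.DataFrame(products)


# Hypothetical n=3 input matching the example in the question.
df = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0], "c": [5.0, 6.0]})
result = all_product_cols(df)
print(result)
```

Either way, the result has 2**n - 1 columns, so the output grows exponentially with n: the six-column frame above already yields 63 columns.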



