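I have a dataframe that looks similar to the following, with x number of persons (1,000+), x number of transactions per person, and x number of variables (1,000+ variables):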
Person_ID | transaction_ID | variable_1 | variable_2 | variable_3 | variable_X |
---|---|---|---|---|---|
person1 | transaction1 | 123 | 0 | 1 | abc |
person1 | transaction2 | 456 | 1 | 0 | def |
person1 | transaction3 | 123 | 0 | 1 | abc |
personx | transaction1 | 123 | 0 | 1 | abc |
personx | transaction2 | 456 | 0 | 1 | def |
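For anyone who wants to reproduce this, the sample above can be rebuilt roughly as follows. This is only a sketch: the column names are taken from the table header, the name `newdf` is borrowed from the code further down, and "personx" stands in for any later person.

```python
import pandas as pd

# Sample rows copied from the table above (a stand-in for the real 1,000+ column data).
newdf = pd.DataFrame({
    "Person_ID": ["person1", "person1", "person1", "personx", "personx"],
    "transaction_ID": ["transaction1", "transaction2", "transaction3",
                       "transaction1", "transaction2"],
    "variable_1": [123, 456, 123, 123, 456],
    "variable_2": [0, 1, 0, 0, 0],
    "variable_3": [1, 0, 1, 1, 1],
    "variable_X": ["abc", "def", "abc", "abc", "def"],
})

# Rows per person: this is the count that needs topping up to 6.
rows_per_person = newdf.groupby("Person_ID", sort=False).size()
print(rows_per_person)
```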
I want to pad it with rows containing -10 at the beginning of every person id group so that the total number of rows per person id group is 6, like the following:
Person_ID | transaction_ID | variable_1 | variable_2 | variable_3 | variable_X |
---|---|---|---|---|---|
person1 | -10 | -10 | -10 | -10 | -10 |
person1 | -10 | -10 | -10 | -10 | -10 |
person1 | -10 | -10 | -10 | -10 | -10 |
person1 | transaction1 | 123 | 0 | 1 | abc |
person1 | transaction2 | 456 | 1 | 0 | def |
person1 | transaction3 | 123 | 0 | 1 | abc |
personx | -10 | -10 | -10 | -10 | -10 |
personx | -10 | -10 | -10 | -10 | -10 |
personx | -10 | -10 | -10 | -10 | -10 |
personx | -10 | -10 | -10 | -10 | -10 |
personx | transaction1 | 123 | 0 | 1 | abc |
personx | transaction2 | 456 | 0 | 1 | def |
Here is the code I have tried (updated) and the resulting error below.
df2 = pd.DataFrame([[ ] * len(newdf.columns)], columns=newdf.columns)
df2
for row in newdf.groupby('person_id')['transaction_id']:
    x=newdf.groupby('person_id')['person_id'].nunique()
    if x.any() < 6:
        newdf=pd.concat([newdf, df2*(6-x)], ignore_index=True)
RuntimeWarning: '<' not supported between instances of 'int' and 'tuple', sort order is undefined for incomparable objects.
  newdf=pd.concat([newdf, df2*(6-x)], ignore_index=True)
It appended several NaN rows to the bottom of the dataframe, but not in between groups as needed. Thank you in advance, as I am a beginner.
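For reference, one possible way to get the layout described above is to build the padding rows per group and concatenate each padding block in front of its group. The following is a minimal sketch, not a definitive solution: the helper name `pad_groups`, its parameters, and the small sample frame are all assumptions made for illustration, and the column names follow the table headers rather than the `person_id` spelling used in the attempt.

```python
import pandas as pd

def pad_groups(df, key="Person_ID", target=6, fill=-10):
    """Prepend rows of `fill` to each `key` group until it has `target` rows.

    Hypothetical helper: groups that already have `target` or more rows
    are left unchanged.
    """
    pieces = []
    # sort=False keeps the groups in their order of first appearance.
    for person, group in df.groupby(key, sort=False):
        n_missing = max(target - len(group), 0)
        if n_missing:
            # A frame where every cell is `fill`, one row per missing row.
            pad = pd.DataFrame(fill, index=range(n_missing), columns=df.columns)
            pad[key] = person  # the padding rows keep the person id
            pieces.append(pad)
        pieces.append(group)
    if not pieces:  # empty input: nothing to pad
        return df.copy()
    return pd.concat(pieces, ignore_index=True)

# Small sample mirroring the question's table (only two variable columns shown).
df = pd.DataFrame({
    "Person_ID": ["person1"] * 3 + ["personx"] * 2,
    "transaction_ID": ["transaction1", "transaction2", "transaction3",
                       "transaction1", "transaction2"],
    "variable_1": [123, 456, 123, 123, 456],
    "variable_X": ["abc", "def", "abc", "abc", "def"],
})

padded = pad_groups(df)
print(padded)
```

Collecting the pieces in a list and calling `pd.concat` once avoids growing the dataframe inside the loop, which matters with 1,000+ persons and 1,000+ columns. Note that columns mixing -10 with strings (such as `transaction_ID`) end up with object dtype.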