Transform and replace values in an existing column if multiple matching values exist using pandas

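I have two datasets: A and B.

If the `Year`, `ID`, `delivery`, `type`, and `vendor` columns of dataset A all match the `Tags`, `ID`, `qtr`, `TYPE`, and `msc` columns of dataset B, I want to replace the `project_id` in the matching rows of A with the `Project Name` from the corresponding row of B. Otherwise, I do not want to change the `project_id` in A.

Dataset A: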

Year    ID  delivery Gen type    vendor  project_id
2022    BR  Q2 2022 L   aa      d       BR2 aa1 Q2 2022 - L
2022    BR  Q2 2022 L   dd      d       BR2 dd1 Q2 2022 - L
2022    BR  Q2 2022 L   dd      d       BR2 dd2 Q2 2022 - L
2022    BR  Q3 2022 L   bb      d       BR2 bb1 Q3 2022 - L
2022    BR  Q4 2022 L   aa      d       BR2 aa1 Q4 2022 - L
2022    BR  Q4 2022 L   dd      nd      BR2 dd1 Q4 2022 - L

Dataset B:

Project Name          Tags  ID      qtr     TYPE    msc NUM
BB H_AA01 Q4 2022     2022  BOLOL   Q4 2022 aa      d   01
BR2 H_DD_nd02 Q4 2022 2022  BR      Q4 2022 dd      nd  02
BR2 BB01 Q3.2022      2022  BR      Q3 2022 bb      d   01
BR2 H_DD01 Q2 2022    2022  BR      Q2 2022 dd      d   01
BR2 H_DD02 Q2 2022    2022  BR      Q2 2022 dd      d   02
BR2 H_AA01 Q2 2022    2022  BR      Q2 2022 aa      d   01
    

Desired result:

Year    ID  delivery    Gen type    vendor  project_id
2022    BR  Q2 2022     L   aa      d       BR2 H_AA01 Q2 2022
2022    BR  Q2 2022     L   dd      d       BR2 H_DD01 Q2 2022
2022    BR  Q2 2022     L   dd      d       BR2 H_DD02 Q2 2022
2022    BR  Q3 2022     L   bb      d       BR2 BB01 Q3.2022
2022    BR  Q4 2022     L   aa      d       BR2 aa1 Q4 2022 - L
2022    BR  Q4 2022     L   dd      nd      BR2 H_DD_nd02 Q4 2022

What I have tried so far:

import pandas as pd

df_merged = pd.merge(df_A, df_B[["Project Name", "Tags", "ID", "qtr", "TYPE", "msc"]],
                     how="left",
                     left_on=["Year", "ID", "delivery", "type", "vendor"],
                     right_on=["Tags", "ID", "qtr", "TYPE", "msc"])

# Replace project_id in A with Project Name from B where there is a match
df_merged["project_id"] = df_merged["Project Name"].combine_first(df_merged["project_id"])

# Drop the helper columns brought in by the merge
df_final = df_merged.drop(["Project Name", "Tags", "qtr", "TYPE", "msc"], axis=1)

However, the code above blows up my dataset, producing unwanted extra rows and columns.

The final output should have the same shape as the original dataset A. The only difference is that the `project_id` column is updated. How do I do this properly?

Best answer

IIUC, you can try:

lcols = ["Year", "ID", "delivery", "type", "vendor"]
rcols = ["Tags", "ID", "qtr", "TYPE", "msc"]

out = (
    pd.merge(
        df_A, df_B,
        left_on=[*lcols, df_A.groupby(lcols).cumcount()],
        right_on=[*rcols, df_B.groupby(rcols).cumcount()],
        how="left"
    )
    .assign(project_id=lambda x: x["Project Name"].fillna(x["project_id"]))
    [df_A.columns]
)

Output:

print(out)

   Year  ID delivery Gen type vendor             project_id
0  2022  BR  Q2 2022   L   aa      d     BR2 H_AA01 Q2 2022
1  2022  BR  Q2 2022   L   dd      d     BR2 H_DD01 Q2 2022
2  2022  BR  Q2 2022   L   dd      d     BR2 H_DD02 Q2 2022
3  2022  BR  Q3 2022   L   bb      d       BR2 BB01 Q3.2022
4  2022  BR  Q4 2022   L   aa      d    BR2 aa1 Q4 2022 - L
5  2022  BR  Q4 2022   L   dd     nd  BR2 H_DD_nd02 Q4 2022

[6 rows x 7 columns]
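
For anyone who wants to run this end to end, here is a minimal, self-contained sketch of the same idea. It rebuilds both frames from the tables in the question (assuming `Year` and `Tags` are integers and everything else is a string) and uses a slightly different but equivalent formulation: the per-group counter is stored in a helper column, named `_n` here purely for illustration, and B's key columns are renamed to A's so a plain `on=` merge can be used.

```python
import pandas as pd

# Data copied from the question's tables.
df_A = pd.DataFrame({
    "Year": [2022] * 6,
    "ID": ["BR"] * 6,
    "delivery": ["Q2 2022", "Q2 2022", "Q2 2022", "Q3 2022", "Q4 2022", "Q4 2022"],
    "Gen": ["L"] * 6,
    "type": ["aa", "dd", "dd", "bb", "aa", "dd"],
    "vendor": ["d", "d", "d", "d", "d", "nd"],
    "project_id": [
        "BR2 aa1 Q2 2022 - L", "BR2 dd1 Q2 2022 - L", "BR2 dd2 Q2 2022 - L",
        "BR2 bb1 Q3 2022 - L", "BR2 aa1 Q4 2022 - L", "BR2 dd1 Q4 2022 - L",
    ],
})
df_B = pd.DataFrame({
    "Project Name": [
        "BB H_AA01 Q4 2022", "BR2 H_DD_nd02 Q4 2022", "BR2 BB01 Q3.2022",
        "BR2 H_DD01 Q2 2022", "BR2 H_DD02 Q2 2022", "BR2 H_AA01 Q2 2022",
    ],
    "Tags": [2022] * 6,
    "ID": ["BOLOL", "BR", "BR", "BR", "BR", "BR"],
    "qtr": ["Q4 2022", "Q4 2022", "Q3 2022", "Q2 2022", "Q2 2022", "Q2 2022"],
    "TYPE": ["aa", "dd", "bb", "dd", "dd", "aa"],
    "msc": ["d", "nd", "d", "d", "d", "d"],
    "NUM": ["01", "02", "01", "01", "02", "01"],
})

lcols = ["Year", "ID", "delivery", "type", "vendor"]
rcols = ["Tags", "ID", "qtr", "TYPE", "msc"]

# cumcount numbers the rows inside each key group: 0, 1, 2, ...
# Adding it to the key pairs the n-th duplicate in A with the n-th duplicate in B.
a = df_A.assign(_n=df_A.groupby(lcols).cumcount())
b = (
    df_B.assign(_n=df_B.groupby(rcols).cumcount())
    .rename(columns=dict(zip(rcols, lcols)))  # align B's key names with A's
    [[*lcols, "_n", "Project Name"]]
)

# With the counter in the key, every key is unique on both sides, so the
# left merge keeps exactly one row per row of A; validate= enforces that.
merged = a.merge(b, on=[*lcols, "_n"], how="left", validate="one_to_one")

out = df_A.copy()
# Take Project Name where a match exists, otherwise keep the old project_id.
out["project_id"] = merged["Project Name"].fillna(merged["project_id"]).to_numpy()
```

The `validate="one_to_one"` argument is an optional guard: if the helper counter ever failed to make the keys unique, pandas would raise instead of silently adding rows.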
Other answers

Your problem arises from the following two rows in df_A:

2022    BR  Q2 2022 L   dd      d       BR2 dd1 Q2 2022 - L
2022    BR  Q2 2022 L   dd      d       BR2 dd2 Q2 2022 - L

df_B:

2022    BR  Q2 2022     L   dd      d       BR2 H_DD01 Q2 2022
2022    BR  Q2 2022     L   dd      d       BR2 H_DD02 Q2 2022

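In your merge, these rows all have exactly the same merge key, so every row on one side matches every row on the other, and you get 4 rows in your output instead of the 2 you want: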

1  2022  BR  Q2 2022   L   dd      d     BR2 H_DD02 Q2 2022
2  2022  BR  Q2 2022   L   dd      d     BR2 H_DD01 Q2 2022
3  2022  BR  Q2 2022   L   dd      d     BR2 H_DD02 Q2 2022
4  2022  BR  Q2 2022   L   dd      d     BR2 H_DD01 Q2 2022
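
The row multiplication can be reproduced in isolation. The sketch below uses two tiny frames with a single shortened key column (`key` and the shortened values are illustrative stand-ins, not the real columns) and also shows that the `validate` argument of `merge` turns the silent duplication into an error.

```python
import pandas as pd

# Two rows on each side share the same key, as with the "dd" rows above.
left = pd.DataFrame({"key": ["dd", "dd"], "project_id": ["dd1", "dd2"]})
right = pd.DataFrame({"key": ["dd", "dd"], "Project Name": ["H_DD01", "H_DD02"]})

# Each left row matches both right rows: 2 x 2 = 4 rows after the merge.
blown_up = left.merge(right, on="key", how="left")
n_rows = len(blown_up)

# validate="many_to_one" requires the right-hand keys to be unique,
# so pandas raises MergeError here instead of multiplying rows.
try:
    left.merge(right, on="key", how="left", validate="many_to_one")
    raised = False
except pd.errors.MergeError:
    raised = True
```

Adding `validate=` to the original merge would have surfaced the duplicate keys immediately rather than after inspecting the result.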

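Without something further to determine which `Project Name` in `df_B` corresponds to which `project_id` in `df_A`, you will always have this problem. You can use the `cumcount` approach from the answer above as a workaround, but the result will depend on the row order of `df_A` and `df_B`. For example, if rows 4 & 5 in `df_B` were swapped, you would get this output: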

   Year  ID delivery Gen type vendor             project_id
0  2022  BR  Q2 2022   L   aa      d     BR2 H_AA01 Q2 2022
1  2022  BR  Q2 2022   L   dd      d     BR2 H_DD02 Q2 2022
2  2022  BR  Q2 2022   L   dd      d     BR2 H_DD01 Q2 2022
3  2022  BR  Q3 2022   L   bb      d       BR2 BB01 Q3.2022
4  2022  BR  Q4 2022   L   aa      d    BR2 aa1 Q4 2022 - L
5  2022  BR  Q4 2022   L   dd     nd  BR2 H_DD_nd02 Q4 2022

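This may be fine for your purposes, but it is something to be aware of. Ideally, you should have a unique identifier that identifies each project and exists in both `df_A` and `df_B`. Otherwise, you will keep running into this problem.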
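
As a rough illustration of that ideal case, the sketch below assumes both frames carry a shared unique identifier. The column name `project_key` and its values are hypothetical and do not exist in the question's data. With such a key, the update becomes a simple lookup that no longer depends on row order.

```python
import pandas as pd

# Hypothetical: project_key is a unique identifier present in both frames.
df_A = pd.DataFrame({
    "project_key": ["k1", "k2", "k3"],
    "project_id": ["BR2 dd1 Q2 2022 - L", "BR2 dd2 Q2 2022 - L", "BR2 aa1 Q4 2022 - L"],
})
df_B = pd.DataFrame({
    "project_key": ["k2", "k1"],  # deliberately in a different order than df_A
    "Project Name": ["BR2 H_DD02 Q2 2022", "BR2 H_DD01 Q2 2022"],
})

# Build a key -> Project Name lookup; a duplicated key would be ambiguous.
lookup = df_B.set_index("project_key")["Project Name"]
if not lookup.index.is_unique:
    raise ValueError("project_key must be unique in df_B")

# Map each row of A to its Project Name; keep the old project_id when
# the key has no counterpart in B (map returns NaN for those rows).
out = df_A.assign(
    project_id=df_A["project_key"].map(lookup).fillna(df_A["project_id"])
)
```

Because the match is made on the identifier alone, reordering either frame leaves the result unchanged.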




