Using Python to match records across 2 files with many-to-many relationship

I am new to Python. I have searched extensively for a solution to this problem without any success.

I have 2 files, File A and File B, with a sample of each shown below.

For each PO/ItemCode combination in File A, I want to find a matching combination in File B and output a record to a new CSV file, File C. If there are no matching records in File B, output a single record to File C with none/00 as the Ref/Line fields. PO 1002 and 1005 are examples.

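If the quantity available in the matching File B record is less than the quantity required by File A, write a File C record for the available quantity only, then read File B again to find the next matching record. If there are no more matching records, write a record to File C for the remaining (unmatched) File A quantity. PO 1006 is an example.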

The total quantity for each PO/ItemCode combination in File C will be the same as the combination total in File A.

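File A may contain duplicate PO/ItemCode combinations. These records are treated as a single File A record with the cumulative total quantity. PO 1003 is an example.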

File A

PO        ItemCode   Invoice   QtyA
1001      ITEMA      2001      2
1001      ITEMB      2001      1
1002      ITEMB      2002      4
1003      ITEMA      2003      4
1003      ITEMA      2003      5
1004      ITEMA      2004      1
1005      ITEMB      2005      3
1006      ITEMA      2006      5

File B

PO        ItemCode   QtyB   Ref       Line
1000      ITEMA      2      8232      12
1001      ITEMA      2      8986      15
1001      ITEMB      2      8986      16
1003      ITEMA      7      8987      08
1004      ITEMA      3      8415      19
1006      ITEMA      2      8469      01
1006      ITEMA      1      8253      12
1008      ITEMB      3      8745      03

File C (output)

PO        ItemCode   Invoice   QtyC     Ref       Line
1001      ITEMA      2001      2        8986      15
1001      ITEMB      2001      1        8986      16
1002      ITEMB      2002      4        none      00
1003      ITEMA      2003      7        8987      08
1003      ITEMA      2003      2        none      00
1004      ITEMA      2004      1        8415      19
1005      ITEMB      2005      3        none      00
1006      ITEMA      2006      2        8469      01
1006      ITEMA      2006      1        8253      12
1006      ITEMA      2006      2        none      00

I've tried achieving this by reading File A records in a for loop and using PO/ItemCode as an index into File B (read in as a DataFrame), but have been unable to match File B records beyond the first matched record. I've also been unable to find a method to recognize a File A record having the same PO/ItemCode as the previous record.

Answers

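I hope I've understood your question correctly: you can group DataFrame A by PO/ItemCode and sum QtyA, then subtract the QtyB of each matching row in DataFrame B from that total: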

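import pandas as pd

# df_A and df_B are assumed to be DataFrames already loaded from File A and File B.
# Collapse duplicate PO/ItemCode rows in File A into one row with the summed quantity.
df_A = df_A.groupby(["PO", "ItemCode"], as_index=False).agg(
    {"Invoice": "first", "QtyA": "sum"}
)
# Index File B by PO/ItemCode so matching rows can be looked up per File A row.
df_B = df_B.set_index(["PO", "ItemCode"])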

out = []
for rowA in df_A.itertuples():
    if (rowA.PO, rowA.ItemCode) not in df_B.index:
        out.append(
            {
                "PO": rowA.PO,
                "ItemCode": rowA.ItemCode,
                "Invoice": rowA.Invoice,
                "QtyC": rowA.QtyA,
                "Ref": None,
                "Line": 0,
            }
        )
        continue

    qty_remaining = rowA.QtyA
    # Passing a list of keys makes .loc return a DataFrame even if only one row matches.
    for rowB in df_B.loc[[(rowA.PO, rowA.ItemCode)]].itertuples():
        if qty_remaining - rowB.QtyB >= 0:
            n = rowB.QtyB
        else:
            n = qty_remaining

        out.append(
            {
                "PO": rowA.PO,
                "ItemCode": rowA.ItemCode,
                "Invoice": rowA.Invoice,
                "QtyC": n,
                "Ref": rowB.Ref,
                "Line": rowB.Line,
            }
        )

        qty_remaining -= n
        if qty_remaining == 0:
            break

    if qty_remaining > 0:
        out.append(
            {
                "PO": rowA.PO,
                "ItemCode": rowA.ItemCode,
                "Invoice": rowA.Invoice,
                "QtyC": qty_remaining,
                "Ref": None,
                "Line": 0,
            }
        )

out = pd.DataFrame(out)
print(out)

Prints:

     PO ItemCode  Invoice  QtyC     Ref  Line
0  1001    ITEMA     2001     2  8986.0    15
1  1001    ITEMB     2001     1  8986.0    16
2  1002    ITEMB     2002     4     NaN     0
3  1003    ITEMA     2003     7  8987.0     8
4  1003    ITEMA     2003     2     NaN     0
5  1004    ITEMA     2004     1  8415.0    19
6  1005    ITEMB     2005     3     NaN     0
7  1006    ITEMA     2006     2  8469.0     1
8  1006    ITEMA     2006     1  8253.0    12
9  1006    ITEMA     2006     2     NaN     0
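
The printed result shows Ref as a float with NaN and Line as a plain integer, while File C in the question uses none and zero-padded values such as 00 and 08. A minimal sketch of one way to format those two columns before writing the CSV, assuming a result shaped like `out` above (the three-row sample frame here is built by hand only for illustration):

```python
import pandas as pd

# Hand-built stand-in for a few rows of the `out` DataFrame printed above.
out = pd.DataFrame(
    {
        "PO": [1001, 1002, 1003],
        "ItemCode": ["ITEMA", "ITEMB", "ITEMA"],
        "Invoice": [2001, 2002, 2003],
        "QtyC": [2, 4, 7],
        "Ref": [8986.0, None, 8987.0],  # None becomes NaN in a float column
        "Line": [15, 0, 8],
    }
)

formatted = out.copy()
# Missing references become the literal text "none"; others lose the ".0".
formatted["Ref"] = formatted["Ref"].map(
    lambda r: "none" if pd.isna(r) else str(int(r))
)
# Zero-pad line numbers to two digits, e.g. 0 -> "00", 8 -> "08".
formatted["Line"] = formatted["Line"].astype(int).astype(str).str.zfill(2)

# With no path, to_csv returns the CSV text; pass a filename to write File C.
csv_text = formatted.to_csv(index=False)
print(csv_text)
```

If File B is read with `pd.read_csv(..., dtype=str)`, Ref and Line keep their original text (including leading zeros), and only the unmatched rows need the none/00 placeholders.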

This looks a lot like a pandas merge. One thing I don't understand is why the expected output DataFrame has 3 entries for PO 1006.

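import pandas as pd

# Placeholder filenames; substitute the paths of File A and File B.
fnA, fnB = "file_a.txt", "file_b.txt"
# Both files are whitespace-delimited.
aR, bR = [pd.read_csv(x, sep=r"\s+") for x in (fnA, fnB)]
# A left merge on the shared columns (PO, ItemCode) keeps every File A row.
cR = aR.merge(bR, how="left")
print(cR.to_markdown(index=False))  # to_markdown requires the tabulate package

Prints:

PO        ItemCode   Invoice   QtyA   QtyB   Ref    Line
1001      ITEMA      2001      2      2      8986   15
1001      ITEMB      2001      1      2      8986   16
1002      ITEMB      2002      4      nan    nan    nan
1003      ITEMA      2003      4      7      8987   8
1003      ITEMA      2003      5      7      8987   8
1004      ITEMA      2004      1      3      8415   19
1005      ITEMB      2005      3      nan    nan    nan
1006      ITEMA      2006      5      2      8469   1
1006      ITEMA      2006      5      1      8253   12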
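
On the three PO 1006 entries: going by the rules in the question, File A asks for 5 units, the two File B rows supply 2 and 1, and the remaining 2 get their own none/00 row. A merge alone does not do that allocation, but it can be layered on top with a per-group cumulative sum. The following is only a sketch under the question's rules, not the approach of either answer; the DataFrames are typed in by hand from the samples, and names such as `used_before`, `need`, and `_seq` are made up for illustration:

```python
import pandas as pd

df_a = pd.DataFrame({
    "PO": [1001, 1001, 1002, 1003, 1003, 1004, 1005, 1006],
    "ItemCode": ["ITEMA", "ITEMB", "ITEMB", "ITEMA",
                 "ITEMA", "ITEMA", "ITEMB", "ITEMA"],
    "Invoice": [2001, 2001, 2002, 2003, 2003, 2004, 2005, 2006],
    "QtyA": [2, 1, 4, 4, 5, 1, 3, 5],
})
df_b = pd.DataFrame({
    "PO": [1000, 1001, 1001, 1003, 1004, 1006, 1006, 1008],
    "ItemCode": ["ITEMA", "ITEMA", "ITEMB", "ITEMA",
                 "ITEMA", "ITEMA", "ITEMA", "ITEMB"],
    "QtyB": [2, 2, 2, 7, 3, 2, 1, 3],
    "Ref": ["8232", "8986", "8986", "8987", "8415", "8469", "8253", "8745"],
    "Line": ["12", "15", "16", "08", "19", "01", "12", "03"],
})

keys = ["PO", "ItemCode"]
cols = keys + ["Invoice", "QtyC", "Ref", "Line"]

# Treat duplicate PO/ItemCode rows in File A as one record with the summed quantity.
a = df_a.groupby(keys, as_index=False).agg(
    Invoice=("Invoice", "first"), QtyA=("QtyA", "sum")
)

# Left merge keeps every File A combination; unmatched ones get QtyB = 0.
m = a.merge(df_b, on=keys, how="left")
m["QtyB"] = m["QtyB"].fillna(0).astype(int)

# Quantity already supplied by earlier File B rows of the same combination.
used_before = m.groupby(keys)["QtyB"].cumsum() - m["QtyB"]
need = (m["QtyA"] - used_before).clip(lower=0)
# Each File B row supplies min(still needed, available).
m["QtyC"] = need.where(need < m["QtyB"], m["QtyB"])
matched = m.loc[m["QtyC"] > 0, cols]

# Whatever File B could not cover becomes a none/00 row.
rest = a.merge(m.groupby(keys, as_index=False)["QtyC"].sum(), on=keys)
rest["QtyC"] = rest["QtyA"] - rest["QtyC"]
rest = rest.loc[rest["QtyC"] > 0, keys + ["Invoice", "QtyC"]].assign(
    Ref="none", Line="00"
)

# Order by combination, matched rows (in File B order) before the remainder row.
out = pd.concat([matched, rest], ignore_index=True)
out["_seq"] = range(len(out))
out = out.sort_values(keys + ["_seq"]).drop(columns="_seq").reset_index(drop=True)
print(out)
```

Ref and Line are kept as strings here so that values like 08 survive unchanged. The per-combination QtyC totals equal the File A totals, which is the invariant the question states for File C.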



