Question

I am new to Python. I have searched extensively for a solution to this problem without any success.

I have 2 files, File A and File B; a sample of each is shown below.

For each PO/ItemCode combination in File A, I want to find a matching combination in File B and output a record to a new CSV file, File C. If there are no matching records in File B, output a single record to File C with none/00 as the Ref/Line fields. PO 1002 and 1005 are examples.

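If the quantity available in the matching File B record is less than the quantity required by File A, write a File C record for the available quantity only, then read File B again to find the next matching record. If there are no more matching records, write a record to File C for the remaining (unmatched) File A quantity. PO 1006 is an example.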

The total quantity for each PO/ItemCode combination in File C will be the same as the combination total in File A.

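File A may contain duplicate PO/ItemCode combinations. These records are treated as a single File A record with the cumulative total quantity. PO 1003 is an example.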

File A

PO        ItemCode   Invoice   QtyA
1001      ITEMA      2001      2
1001      ITEMB      2001      1
1002      ITEMB      2002      4
1003      ITEMA      2003      4
1003      ITEMA      2003      5
1004      ITEMA      2004      1
1005      ITEMB      2005      3
1006      ITEMA      2006      5

File B

PO        ItemCode   QtyB   Ref       Line
1000      ITEMA      2      8232      12
1001      ITEMA      2      8986      15
1001      ITEMB      2      8986      16
1003      ITEMA      7      8987      08
1004      ITEMA      3      8415      19
1006      ITEMA      2      8469      01
1006      ITEMA      1      8253      12
1008      ITEMB      3      8745      03

File C (output)

PO        ItemCode   Invoice   QtyC     Ref       Line
1001      ITEMA      2001      2        8986      15
1001      ITEMB      2001      1        8986      16
1002      ITEMB      2002      4        none      00
1003      ITEMA      2003      7        8987      08
1003      ITEMA      2003      2        none      00
1004      ITEMA      2004      1        8415      19
1005      ITEMB      2005      3        none      00
1006      ITEMA      2006      2        8469      01
1006      ITEMA      2006      1        8253      12
1006      ITEMA      2006      2        none      00

I've tried achieving this by reading File A records in a for loop and using PO/ItemCode as an index into File B (read in as a DataFrame), but I have been unable to match File B records beyond the first matched record. I've also been unable to find a way to recognize a File A record that has the same PO/ItemCode as the previous record.

Answer 1

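I hope I've understood your question correctly: you can group dataframe A by PO/ItemCode and sum QtyA, then subtract each matching QtyB in dataframe B from that QtyA total: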

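import pandas as pd

# df_A and df_B are assumed to be already loaded from File A and File B.
# Collapse duplicate PO/ItemCode rows in File A into one row with the summed quantity.
df_A = df_A.groupby(["PO", "ItemCode"], as_index=False).agg(
    {"Invoice": "first", "QtyA": "sum"}
)
# Index File B by PO/ItemCode so matching rows can be looked up directly.
df_B = df_B.set_index(["PO", "ItemCode"])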

out = []
for rowA in df_A.itertuples():
    if (rowA.PO, rowA.ItemCode) not in df_B.index:
        out.append(
            {
                "PO": rowA.PO,
                "ItemCode": rowA.ItemCode,
                "Invoice": rowA.Invoice,
                "QtyC": rowA.QtyA,
                "Ref": None,
                "Line": 0,
            }
        )
        continue

    qty_remaining = rowA.QtyA
    # Passing a list of keys makes .loc return a DataFrame even when only one row matches.
    for rowB in df_B.loc[[(rowA.PO, rowA.ItemCode)]].itertuples():
        # Take as much as this File B row offers, but no more than is still needed.
        n = min(rowB.QtyB, qty_remaining)

        out.append(
            {
                "PO": rowA.PO,
                "ItemCode": rowA.ItemCode,
                "Invoice": rowA.Invoice,
                "QtyC": n,
                "Ref": rowB.Ref,
                "Line": rowB.Line,
            }
        )

        qty_remaining -= n
        if qty_remaining == 0:
            break

    if qty_remaining > 0:
        out.append(
            {
                "PO": rowA.PO,
                "ItemCode": rowA.ItemCode,
                "Invoice": rowA.Invoice,
                "QtyC": qty_remaining,
                "Ref": None,
                "Line": 0,
            }
        )

out = pd.DataFrame(out)
print(out)

Prints:

     PO ItemCode  Invoice  QtyC     Ref  Line
0  1001    ITEMA     2001     2  8986.0    15
1  1001    ITEMB     2001     1  8986.0    16
2  1002    ITEMB     2002     4     NaN     0
3  1003    ITEMA     2003     7  8987.0     8
4  1003    ITEMA     2003     2     NaN     0
5  1004    ITEMA     2004     1  8415.0    19
6  1005    ITEMB     2005     3     NaN     0
7  1006    ITEMA     2006     2  8469.0     1
8  1006    ITEMA     2006     1  8253.0    12
9  1006    ITEMA     2006     2     NaN     0
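
The printed result shows Ref as a float with NaN and Line without zero padding, whereas the expected File C uses none and 00. A minimal sketch of one way to render those two columns and write the CSV follows. The helper name format_for_file_c is hypothetical, the three rows are only a sample in the shape of the result above, and an in-memory buffer stands in for a real output path.

```python
import io

import pandas as pd

# A few rows in the shape of the result printed above (sample values only).
out = pd.DataFrame(
    {
        "PO": [1001, 1002, 1003],
        "ItemCode": ["ITEMA", "ITEMB", "ITEMA"],
        "Invoice": [2001, 2002, 2003],
        "QtyC": [2, 4, 7],
        "Ref": [8986.0, None, 8987.0],
        "Line": [15, 0, 8],
    }
)


def format_for_file_c(df):
    """Hypothetical helper: return a copy with Ref/Line rendered as in File C."""
    df = df.copy()
    # The nullable integer dtype drops the ".0" while keeping missing values.
    ref = df["Ref"].astype("Int64").astype("string")
    df["Ref"] = ref.fillna("none")
    # Zero-pad Line to two characters: 8 -> "08", 0 -> "00".
    df["Line"] = df["Line"].astype(int).astype(str).str.zfill(2)
    return df


file_c = format_for_file_c(out)

# The buffer stands in for a real path, e.g. file_c.to_csv("fileC.csv", index=False).
buffer = io.StringIO()
file_c.to_csv(buffer, index=False)
print(buffer.getvalue())
```

This assumes Ref holds whole numbers (or NaN) and Line holds integers, as in the output above. If the files are read with those columns as strings in the first place, the padding step is unnecessary.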

Answer 2

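As with most things in pandas, there are several ways to approach this. One thing I don't understand is why the expected output dataframe has 3 entries for PO 1006.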

import pandas as pd
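fnA, fnB = "fileA.txt", "fileB.txt"  # placeholder filenames; substitute your own
# Both files are whitespace-delimited, so split on runs of whitespace.
aR, bR = [pd.read_csv(x, sep=r"\s+") for x in (fnA, fnB)]
# A left merge on the shared columns (PO, ItemCode) keeps every File A row.
cR = aR.merge(bR, how="left")
print(cR.to_markdown())  # to_markdown needs the optional "tabulate" package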

PO	ItemCode	Invoice	QtyA	QtyB	Ref	Line
1001	ITEMA	2001	2	2	8986	15
1001	ITEMB	2001	1	2	8986	16
1002	ITEMB	2002	4	nan	nan	nan
1003	ITEMA	2003	4	7	8987	8
1003	ITEMA	2003	5	7	8987	8
1004	ITEMA	2004	1	3	8415	19
1005	ITEMB	2005	3	nan	nan	nan
1006	ITEMA	2006	5	2	8469	1
1006	ITEMA	2006	5	1	8253	12
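
On the three entries for PO 1006: going by the question's rules, File A asks for 5, File B offers 2 (Ref 8469) and then 1 (Ref 8253), and the remaining 2 has no match, so it gets a none/00 record. The merge on its own does not split quantities this way. The following is only a sketch of how the merged rows could be turned into that allocation without an explicit loop, using a per-group running total of QtyB. The frames are typed in from the question's samples, and Ref and Line are kept as strings here (an assumption) so that 08 survives.

```python
import pandas as pd

# Sample data from the question.
df_A = pd.DataFrame({
    "PO": [1001, 1001, 1002, 1003, 1003, 1004, 1005, 1006],
    "ItemCode": ["ITEMA", "ITEMB", "ITEMB", "ITEMA", "ITEMA", "ITEMA", "ITEMB", "ITEMA"],
    "Invoice": [2001, 2001, 2002, 2003, 2003, 2004, 2005, 2006],
    "QtyA": [2, 1, 4, 4, 5, 1, 3, 5],
})
df_B = pd.DataFrame({
    "PO": [1000, 1001, 1001, 1003, 1004, 1006, 1006, 1008],
    "ItemCode": ["ITEMA", "ITEMA", "ITEMB", "ITEMA", "ITEMA", "ITEMA", "ITEMA", "ITEMB"],
    "QtyB": [2, 2, 2, 7, 3, 2, 1, 3],
    "Ref": ["8232", "8986", "8986", "8987", "8415", "8469", "8253", "8745"],
    "Line": ["12", "15", "16", "08", "19", "01", "12", "03"],
})

keys = ["PO", "ItemCode"]

# Treat duplicate PO/ItemCode rows in File A as one record with the summed quantity.
totals = df_A.groupby(keys, as_index=False).agg(
    Invoice=("Invoice", "first"), QtyA=("QtyA", "sum")
)

# Matched part: one row per File B record that shares a key with File A.
matched = totals.merge(df_B, on=keys, how="inner")
# Quantity already offered by earlier File B rows of the same PO/ItemCode.
offered_before = matched.groupby(keys)["QtyB"].cumsum() - matched["QtyB"]
# Allocate what is still needed, capped at what this File B row offers.
matched["QtyC"] = (matched["QtyA"] - offered_before).clip(lower=0).clip(upper=matched["QtyB"])
matched = matched[matched["QtyC"] > 0]

# Unmatched part: whatever File A quantity is left after all allocations.
allocated = matched.groupby(keys)["QtyC"].sum().rename("allocated").reset_index()
rest = totals.merge(allocated, on=keys, how="left")
rest["QtyC"] = rest["QtyA"] - rest["allocated"].fillna(0).astype(int)
rest = rest[rest["QtyC"] > 0].assign(Ref="none", Line="00")

# Put each unmatched remainder after the matched rows of its PO/ItemCode.
file_c = pd.concat([matched.assign(_unmatched=0), rest.assign(_unmatched=1)])
file_c = file_c.sort_values(keys + ["_unmatched"])
file_c = file_c[keys + ["Invoice", "QtyC", "Ref", "Line"]].reset_index(drop=True)
print(file_c)
```

For PO 1006 the running total before each File B row is 0 and then 2, so the allocations are min(5 - 0, 2) = 2 and min(5 - 2, 1) = 1, leaving 5 - 3 = 2 for the none/00 record.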
