Question

I have a data frame with item SKUs and associated quantities. Some of these SKUs are Parent SKUs that represent several component SKUs that I care about.

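I need a solution that creates a data frame with the components and their associated quantities as separate line items that keep the date of the original parent usage. My current solution collapses all uses of a parent into a single line and loses the date information. Thanks for your help!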

df1 <- data.frame(SKU = c("abc", "def", "ghi", "abc", "mno"), Qty = c(2,1,1,1,2), Date = c("1-1", "1-1", "1-2", "1-2", "1-2"))

df2 <- data.frame(Parent_SKU = c("def", "def", "mno"), Component = c("abc","jkl","abc"), Component_Qty = c(1,3,1))

Data Frame 1

|   SKU    |   Qty    |   Date   |
| -------- | -------- | -------- |
|   abc    |     2    |    1-1   |
|   def    |     1    |    1-1   |
|   ghi    |     1    |    1-2   |
|   abc    |     1    |    1-2   |
|   mno    |     2    |    1-2   |

Data Frame 2

| Parent_SKU | Component | Component_Qty |
|  --------  | --------- | ------------- |
|    def     |    abc    |       1       |
|    def     |    jkl    |       3       |
|    mno     |    abc    |       1       |

Data Frame 3 (what I want)

|   SKU    |   Qty    |   Date   |
| -------- | -------- | -------- |
|   abc    |     2    |    1-1   |
|   def    |     1    |    1-1   |
|   ghi    |     1    |    1-2   |
|   abc    |     1    |    1-2   |
|   mno    |     2    |    1-2   |
|   abc    |     1    |    1-1   |
|   jkl    |     3    |    1-1   |
|   abc    |     2    |    1-2   |

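My initial attempt grouped each parent SKU into a single line, multiplied through to find the component usage, and appended the result to the main data frame. This solution is inelegant, loses the date information, and collapses all uses of a parent into one line (I also care about growth for a given item).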

library(tidyverse)

#create data frame with Parent_SKU usage
df4 <- df1 %>% filter(SKU %in% unique(df2$Parent_SKU)) %>% group_by(SKU) %>% summarize(Qty = sum(Qty))

#Rename column for joining
df4 <- df4 %>% rename("Parent_SKU"="SKU")

#Create new df with Parent_SKU Qty associated with components
df5 <- full_join(df4,df2)

#Turn Qty column into qty of component use
df5$Qty <- df5$Qty*df5$Component_Qty

#Rename component column for joining
df5 <- df5 %>% rename("SKU"="Component")

#Append component usage together with original data frame & get rid of non useful columns
df3 <- dplyr::bind_rows(df5, df1)
df3 <- df3[-c(1,4)]

Result:

|   SKU    |   Qty    |   Date   |
| -------- | -------- | -------- |
|   abc    |     2    |    1-1   |
|   def    |     1    |    1-1   |
|   ghi    |     1    |    1-2   |
|   abc    |     1    |    1-2   |
|   mno    |     2    |    1-2   |
|   abc    |     1    |    NA    |
|   jkl    |     3    |    NA    |
|   abc    |     2    |    NA    |

Answer 1

Using @Mark's approach, but including the parent's usage when calculating component usage.

library(dplyr)

# join the dfs, get the info for the components, multiply through to get component usage, and remove component_qty column
df3 <- inner_join(df1, df2, by = c("SKU" = "Parent_SKU")) %>%
    select(SKU = Component, Component_Qty, Date, Qty) %>%
    mutate(Qty = Qty * Component_Qty) %>%
    select(-Component_Qty)


# append the component data to the end
bind_rows(df1, df3)

Output:

  SKU Qty Date
1 abc   2  1-1
2 def   1  1-1
3 ghi   1  1-2
4 abc   1  1-2
5 mno   2  1-2
6 abc   1  1-1
7 jkl   3  1-1
8 abc   2  1-2
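
If you would rather avoid the dplyr dependency, the same join-then-multiply idea can be sketched in base R. This is only an illustration under the assumption that df1 and df2 look exactly like the ones in the question. It swaps inner_join() for merge() and bind_rows() for rbind(), and the intermediate names joined and components are made up for this example.

```r
# Example data copied from the question
df1 <- data.frame(SKU = c("abc", "def", "ghi", "abc", "mno"),
                  Qty = c(2, 1, 1, 1, 2),
                  Date = c("1-1", "1-1", "1-2", "1-2", "1-2"))
df2 <- data.frame(Parent_SKU = c("def", "def", "mno"),
                  Component = c("abc", "jkl", "abc"),
                  Component_Qty = c(1, 3, 1))

# Inner join: one row per (parent usage line, component) pair.
# Non-parent SKUs such as "ghi" have no match in df2 and drop out here.
joined <- merge(df1, df2, by.x = "SKU", by.y = "Parent_SKU")

# Component usage = parent quantity on that line * components per parent.
# Date is carried over from the parent line, so it is never lost.
components <- data.frame(SKU = joined$Component,
                         Qty = joined$Qty * joined$Component_Qty,
                         Date = joined$Date)

# Append the component lines below the original lines
df3 <- rbind(df1, components)
df3
```

Because the join happens per usage line rather than after summarizing by SKU, a parent used on several dates yields one set of component rows per date. Note that merge() may order the appended rows differently from inner_join().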
