I have a data frame with item SKUs and associated quantities. Some of these SKUs are Parent SKUs that represent several component SKUs that I care about.
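I need a solution that builds a data frame in which the components and their associated quantities appear as separate line items, each keeping the date on which the original parent was used. My current solution collapses all uses of a parent into one line and loses the date information. Thanks for your help!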
df1 <- data.frame(SKU = c("abc", "def", "ghi", "abc", "mno"), Qty = c(2,1,1,1,2), Date = c("1-1", "1-1", "1-2", "1-2", "1-2"))
df2 <- data.frame(Parent_SKU = c("def", "def", "mno"), Component = c("abc","jkl","abc"), Component_Qty = c(1,3,1))
Data Frame 1
| SKU | Qty | Date |
| -------- | -------- | -------- |
| abc | 2 | 1-1 |
| def | 1 | 1-1 |
| ghi | 1 | 1-2 |
| abc | 1 | 1-2 |
| mno | 2 | 1-2 |
Data Frame 2
| Parent_SKU | Component | Component_Qty |
| -------- | --------- | ------------- |
| def | abc | 1 |
| def | jkl | 3 |
| mno | abc | 1 |
Data Frame 3 (what I want)
| SKU | Qty | Date |
| -------- | -------- | -------- |
| abc | 2 | 1-1 |
| def | 1 | 1-1 |
| ghi | 1 | 1-2 |
| abc | 1 | 1-2 |
| mno | 2 | 1-2 |
| abc | 1 | 1-1 |
| jkl | 3 | 1-1 |
| abc | 2 | 1-2 |
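My initial attempt collapsed each parent into a single row, multiplied through to get component usage, and appended that to the main data frame. This solution is inelegant: it loses the date information and dumps all uses of a parent into one line (I also care about growth in a particular item's usage).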
library(tidyverse)
#create data frame with Parent_SKU usage
df4 <- df1 %>% filter(SKU %in% unique(df2$Parent_SKU)) %>% group_by(SKU) %>% summarize(Qty = sum(Qty))
#Rename column for joining
df4 <- df4 %>% rename("Parent_SKU"="SKU")
#Create new df with Parent_SKU Qty associated with components
df5 <- full_join(df4, df2, by = "Parent_SKU")
#Turn Qty column into qty of component use
df5$Qty <- df5$Qty*df5$Component_Qty
#Rename component column for joining
df5 <- df5 %>% rename("SKU"="Component")
#Append component usage together with original data frame & get rid of non useful columns
df3 <- dplyr::bind_rows(df5, df1)
df3 <- df3[-c(1,4)]
Result:
| SKU | Qty | Date |
| -------- | -------- | -------- |
| abc | 2 | 1-1 |
| def | 1 | 1-1 |
| ghi | 1 | 1-2 |
| abc | 1 | 1-2 |
| mno | 2 | 1-2 |
| abc | 1 | NA |
| jkl | 3 | NA |
| abc | 2 | NA |
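The dates go missing because `summarize()` collapses the parent rows before the join. A minimal sketch of one possible alternative, assuming it is acceptable to join `df1` directly to `df2` so that every parent row keeps its own `Date` (the intermediate name `components` is only illustrative):

```r
library(dplyr)

df1 <- data.frame(SKU = c("abc", "def", "ghi", "abc", "mno"),
                  Qty = c(2, 1, 1, 1, 2),
                  Date = c("1-1", "1-1", "1-2", "1-2", "1-2"))
df2 <- data.frame(Parent_SKU = c("def", "def", "mno"),
                  Component = c("abc", "jkl", "abc"),
                  Component_Qty = c(1, 3, 1))

# Expand each parent row of df1 into one row per component.
# No summarize() step, so each row keeps the Date of the parent usage.
components <- df1 %>%
  inner_join(df2, by = c("SKU" = "Parent_SKU")) %>%
  transmute(SKU = Component,
            Qty = Qty * Component_Qty,
            Date = Date)

# Append the component rows to the original rows (parents are kept as well)
df3 <- bind_rows(df1, components)
```

With the sample data this should append `abc 1 1-1`, `jkl 3 1-1`, and `abc 2 1-2` after the five original rows. If the same parent shows up on several rows of `df1` and has several components, recent dplyr versions may warn about a many-to-many join; the expansion is intended here, so that warning would be expected.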