I have two data frames:
value <- structure(list(rf1 = c(40000L, 9680000L, 9680000L), rt1 = c(80000L,
9720000L, 9720000L), rf2 = c(9400000L, 44480000L, 80000L), rt2 = c(9440000,
444520000, 120000)), row.names = c(235L, 1112L, 2L), class = "data.frame")
regions <- structure(list(V2 = c(8522360L, 8591167L, 8791167L, 43059890L,
43099890L), V3 = c(8551167L, 8631167L, 8831167L, 43099890L, 43139890L
), start = c("9400000", "9480000", "9680000", "44480000", "44520000"
), end = c("9440000", "9520000", "9720000", "44520000", "44560000"
)), class = "data.frame", row.names = c(1L, 3L, 5L, 782L, 783L
))
I want to match the first data frame against the second. Basically, if rf2 and rt2 (together) match start and end, then rf2 and rt2 get the values of V2 and V3. Likewise, if rf1 and rt1 (together) match start and end, then rf1 and rt1 get the V2 and V3 values. If either pair does not match, the row is deleted: if (rf1 and rt1) don't match, delete the row; if (rf2 and rt2) don't match, delete the row.
So:
First dataset:
rf1 rt1 rf2 rt2
40000 80000 9400000 9440000
9680000 9720000 44480000 444520000
9680000 9720000 80000 120000
Second dataset:
V2 V3 start end
8522360 8551167 9400000 9440000
8791167 8831167 9680000 9720000
43059890 43099890 44480000 44520000
Selected rows:
9680000 9720000 44480000 444520000
Only the 2nd row is kept. In the 1st row, (rf1 and rt1) don't match, even though (rf2 and rt2) match start and end. The 3rd row is deleted because (rf1 and rt1) match start and end exactly but (rf2 and rt2) don't. In the 2nd row, both (rf1 and rt1) and (rf2 and rt2) match start and end exactly. Once this row is matched, the values are reassigned:
Final output:
8791167 8831167 43059890 43099890
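The selection and reassignment rule described above could be sketched in base R roughly as follows. This is only an illustration under some assumptions: `find_region` is a hypothetical helper, not part of the original code; matching is exact equality on the (start, end) pair, with no offset applied to `start`; and `rt2` in the second row is taken to be 44520000 rather than the 444520000 shown in the posted data, since only the former equals an `end` value in `regions`.

```r
# Data from the post. ASSUMPTION: rt2 in the 2nd row is 44520000
# (the post shows 444520000, which cannot match any `end`).
value <- data.frame(
  rf1 = c(40000L, 9680000L, 9680000L),
  rt1 = c(80000L, 9720000L, 9720000L),
  rf2 = c(9400000L, 44480000L, 80000L),
  rt2 = c(9440000, 44520000, 120000),
  row.names = c(235L, 1112L, 2L)
)
regions <- data.frame(
  V2 = c(8522360L, 8591167L, 8791167L, 43059890L, 43099890L),
  V3 = c(8551167L, 8631167L, 8831167L, 43099890L, 43139890L),
  start = c("9400000", "9480000", "9680000", "44480000", "44520000"),
  end = c("9440000", "9520000", "9720000", "44520000", "44560000"),
  row.names = c(1L, 3L, 5L, 782L, 783L)
)

# Hypothetical helper: for each (from, to) pair, return the position of the
# first row in `regions` whose (start, end) equals it exactly, or NA if none.
find_region <- function(from, to, regions) {
  starts <- as.numeric(regions$start)
  ends <- as.numeric(regions$end)
  vapply(seq_along(from), function(i) {
    hit <- which(starts == from[i] & ends == to[i])
    if (length(hit) > 0) hit[1] else NA_integer_
  }, integer(1))
}

idx1 <- find_region(value$rf1, value$rt1, regions)
idx2 <- find_region(value$rf2, value$rt2, regions)

# Keep a row only when BOTH pairs matched, then overwrite with V2 / V3.
keep <- !is.na(idx1) & !is.na(idx2)
result <- value[keep, ]
result$rf1 <- regions$V2[idx1[keep]]
result$rt1 <- regions$V3[idx1[keep]]
result$rf2 <- regions$V2[idx2[keep]]
result$rt2 <- regions$V3[idx2[keep]]
result
```

Because the rows to keep are computed first and the subsetting happens once, no row is removed while indices are still being iterated over. With the data exactly as posted (rt2 = 444520000), `keep` would be `FALSE` for every row.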
library(dplyr)

# Match rf2 and rt2 against the regions data frame; if there is no match, remove the row.
for(i in 1:dim(value)[1]){
  # Note: start is shifted by 1 before it is compared with rf2.
  value1 <- regions %>% filter((as.numeric(regions$start)-1) %in% value$rf2[i], (as.numeric(regions$end) %in% value$rt2[i]))
  if(dim(value1)[1]>=1){
    ## now assign the value:
    value$rf2[i] <- value1$V2
    value$rt2[i] <- value1$V3
  }
  else{
    value <- value[-c(i),]
  }
}
## Match rf1 and rt1 against the regions data frame; if there is no match, remove the row:
for(i in 1:dim(value)[1]){
  value1 <- regions %>% filter((as.numeric(regions$start)-1) %in% value$rf1[i], (as.numeric(regions$end) %in% value$rt1[i]))
  if(nrow(value1)>0){
    ## now assign the value:
    value$rf1[i] <- value1$V2
    value$rt1[i] <- value1$V3
  }
  else{
    value <- value[-c(i),]
  }
}
In my case, some values are matched, but the unmatched rows still remain. Is there another way to solve this? I would appreciate guidance on how to solve this problem.