Question

I have two data frames:

value <- structure(list(rf1 = c(40000L, 9680000L, 9680000L), rt1 = c(80000L, 
9720000L, 9720000L), rf2 = c(9400000L, 44480000L, 80000L), rt2 = c(9440000, 
444520000, 120000)), row.names = c(235L, 1112L, 2L), class = "data.frame")

regions <- structure(list(V2 = c(8522360L, 8591167L, 8791167L, 43059890L, 
43099890L), V3 = c(8551167L, 8631167L, 8831167L, 43099890L, 43139890L
), start = c("9400000", "9480000", "9680000", "44480000", "44520000"
), end = c("9440000", "9520000", "9720000", "44520000", "44560000"
)), class = "data.frame", row.names = c(1L, 3L, 5L, 782L, 783L
))

I want to match the first data frame against the second. If rf2 and rt2 (together) match start and end, then rf2 and rt2 take the values of V2 and V3. Likewise, if rf1 and rt1 (together) match start and end, then rf1 and rt1 take the values of V2 and V3. A row is kept only if both pairs match: if (rf1, rt1) doesn't match, delete the row, and if (rf2, rt2) doesn't match, delete the row.

So:

first dataset: 
rf1 rt1 rf2 rt2 
40000 80000 9400000 9440000
9680000 9720000 44480000 444520000
9680000 9720000 80000 120000

Second dataset: 
V2 V3 start end
8522360 8551167 9400000 9440000
8791167 8831167 9680000 9720000
43059890 43099890 44480000 44520000

Selected rows:
9680000 9720000 44480000 444520000

Only the 2nd row is kept. In the 1st row, (rf1, rt1) doesn't match, even though (rf2, rt2) matches start and end. The 3rd row is deleted because (rf1, rt1) exactly matches start and end but (rf2, rt2) does not. In the 2nd row, both (rf1, rt1) and (rf2, rt2) exactly match start and end. Once this row is matched, reassign the values:

Final output: 
8791167 8831167 43059890 43099890

##match rf2 and rt2 with region file: if not match remove the row:

for(i in 1:dim(value)[1]){
  value1 <- region %>% filter((as.numeric(region$start)-1) %in% value$rf2[i],(as.numeric(region$end) %in% value$rt2[i] ))
  if(dim(value1)[1]>=1){
    ##now assign the value:
    value$rf2[i] <- value1$V2
    value$rt2[i] <- value1$V3
  }
    else{
      value <- value[-c(i),]
    }
  }
  


##match rf1 and rt1 with region file: if not match remove the row:

for(i in 1:dim(value)[1]){
  value1 <- region %>% filter((as.numeric(region$start)-1) %in% value$rf1[i],(as.numeric(region$end) %in% value$rt1[i] ))
  if(nrow(value1)>0){
    ##now assign the value:
    value$rf2[i] <- value1$V2
    value$rt2[i] <- value1$V3
  }
    else{
      value <- value[-c(i),]
    }
  }

In my case, some values get matched, but the unmatched rows still remain. Is there another way to solve this? I would appreciate guidance on how to solve this problem.

Answer 1

Based on your description, this seems to be what you want:

library(dplyr)

# the start and end columns need to be numeric
regions <- mutate(regions, across(c(start, end), as.numeric))
    
bind_rows(inner_join(regions, value, by = c("start" = "rf1", "end" = "rt2")),
          inner_join(regions, value, by = c("start" = "rf2", "end" = "rt2"))) %>%
        mutate(rf2 = V2, 
               rt2 = V3)

       V2      V3   start     end   rt1     rf2   rf1     rt2
1 8522360 8551167 9400000 9440000 80000 8522360 40000 8551167
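
The code above stacks the results of two separate joins, so a row can appear when only one pair matches. For the stricter rule in the question, where a row survives only if both (rf1, rt1) and (rf2, rt2) match a region, one possible sketch is to chain two inner joins. It rests on an assumption: rt2 in the second row is taken to be 44520000, as the expected output implies, rather than the 444520000 in the posted dput, which would match nothing. The name result is only illustrative.

```r
library(dplyr)

# Data rebuilt from the question. ASSUMPTION: rt2[2] is 44520000, not the
# posted 444520000, so that the second row can match as the question expects.
value <- data.frame(
  rf1 = c(40000, 9680000, 9680000),
  rt1 = c(80000, 9720000, 9720000),
  rf2 = c(9400000, 44480000, 80000),
  rt2 = c(9440000, 44520000, 120000)
)

regions <- data.frame(
  V2    = c(8522360, 8591167, 8791167, 43059890, 43099890),
  V3    = c(8551167, 8631167, 8831167, 43099890, 43139890),
  start = c("9400000", "9480000", "9680000", "44480000", "44520000"),
  end   = c("9440000", "9520000", "9720000", "44520000", "44560000")
)

# start and end are stored as character, so convert them before joining
regions <- mutate(regions, across(c(start, end), as.numeric))

result <- value %>%
  # keep only rows whose (rf1, rt1) pair equals some (start, end) pair
  inner_join(regions, by = c("rf1" = "start", "rt1" = "end")) %>%
  mutate(rf1 = V2, rt1 = V3) %>%
  select(rf1, rt1, rf2, rt2) %>%
  # of those, keep only rows whose (rf2, rt2) pair also matches
  inner_join(regions, by = c("rf2" = "start", "rt2" = "end")) %>%
  mutate(rf2 = V2, rt2 = V3) %>%
  select(rf1, rt1, rf2, rt2)

print(result)
```

Because inner_join drops unmatched rows by itself, no rows are deleted inside a loop, which avoids the index shifting that happens when value[-i, ] is applied while iterating over 1:nrow(value).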

For more information on joins in R, see the chapter on joins in the R for Data Science textbook.
