How can I create a for loop in R to iterate through a list of column names across two data frames and output match/no match for each sample?

I have two data sets with the same columns, and I am trying to compare the data in each column by sample ID. For example:

  df1 <- data.frame(sample_ID = c("animal1", "animal2", "animal3", "animal4", "animal5"),
                    loci1 = c("T,T", "A,T", "C,T", "T,T", "T,G"),
                    loci2 = c("G,T", "T,T", "A,T", "T,T", "T,A"))
  df2 <- data.frame(sample_ID = c("animal1", "animal2", "animal3", "animal4", "animal5"),
                    loci1 = c("T,T", "A,T", "C,T", "A,A", "C,G"),
                    loci2 = c("T,T", "T,A", "A,T", "T,G", "T,A"))

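df1$loci1 is the same as df2$loci1 for animal1, so I would like to write code that can check this, since I have 200 animals and more than 70 loci. Ideally it would create a new column for each locus stating "match" or "no_match".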

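I started by joining the two data frames and then using an equality comparison to create a new column that reports whether the two data frames agree: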

  df3 <- df1 %>%
    inner_join(df2, by = "sample_ID") %>%
    mutate(match_loci1 = c("no_match", "match")[1 + (loci1.x == loci1.y)])

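This works fine for one locus, but since I have many loci to go through, I would like help building a loop that steps through the different loci (which carry .x and .y suffixes after inner_join) and creates a new column for each one, i.e. "match_loci1", "match_loci2", and so on.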

I tried making a list of all the loci names and running a for loop over it:

loci_names <- colnames(df1)

  test2 <- df1 %>% 
  inner_join(df2, by = "sample_ID") %>%
  for (i in loci_list) {
    mutate(match$[[i]] = c("no_match", "match")[1 + [[i]]$.x == [[i]]$.y])
  }

but I get this error:

I am not sure how to set up the mutate call so that it runs for each locus in the loop.

Answers

I would pivot your data to long format and then join:

library(dplyr)
library(tidyr)  
df1 |> pivot_longer(-sample_ID, names_to = "loci_i", values_to = "df1_value") |>
  full_join(
    df2 |> pivot_longer(-sample_ID, names_to = "loci_i", values_to = "df2_value"),
    by = c("sample_ID", "loci_i")
  ) |>
  mutate(is_match = df1_value == df2_value)
# # A tibble: 10 × 5
#    sample_ID loci_i df1_value df2_value is_match
#    <chr>     <chr>  <chr>     <chr>     <lgl>   
#  1 animal1   loci1  T,T       T,T       TRUE    
#  2 animal1   loci2  G,T       T,T       FALSE   
#  3 animal2   loci1  A,T       A,T       TRUE    
#  4 animal2   loci2  T,T       T,A       FALSE   
#  5 animal3   loci1  C,T       C,T       TRUE    
#  6 animal3   loci2  A,T       A,T       TRUE    
#  7 animal4   loci1  T,T       A,A       FALSE   
#  8 animal4   loci2  T,T       T,G       FALSE   
#  9 animal5   loci1  T,G       C,G       FALSE   
# 10 animal5   loci2  T,A       T,A       TRUE    
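
If you would rather keep one row per sample and get a match_<locus> column per locus, as the question describes, a plain for loop over the joined data frame is one option. The following is only a minimal sketch and not the method of the answer above. It assumes inner_join()'s default ".x"/".y" suffixes, and that every column other than sample_ID is a locus. The df1/df2 definitions are copied from the question; `joined` and `locus` are names chosen for illustration.

```r
library(dplyr)

# Example data from the question (stringsAsFactors = FALSE keeps the
# genotype columns as character on older R versions as well).
df1 <- data.frame(sample_ID = c("animal1", "animal2", "animal3", "animal4", "animal5"),
                  loci1 = c("T,T", "A,T", "C,T", "T,T", "T,G"),
                  loci2 = c("G,T", "T,T", "A,T", "T,T", "T,A"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(sample_ID = c("animal1", "animal2", "animal3", "animal4", "animal5"),
                  loci1 = c("T,T", "A,T", "C,T", "A,A", "C,G"),
                  loci2 = c("T,T", "T,A", "A,T", "T,G", "T,A"),
                  stringsAsFactors = FALSE)

# Assumption: every column except the ID column is a locus.
loci_names <- setdiff(colnames(df1), "sample_ID")

# Join once, outside the loop; shared locus columns get .x / .y suffixes.
joined <- inner_join(df1, df2, by = "sample_ID")

# Add one match_<locus> column per locus using [[ ]] with built-up names.
for (locus in loci_names) {
  joined[[paste0("match_", locus)]] <- ifelse(
    joined[[paste0(locus, ".x")]] == joined[[paste0(locus, ".y")]],
    "match", "no_match"
  )
}

joined
```

The loop builds column names as strings and indexes with `[[ ]]`, which avoids piping into `for` and avoids `$` with a variable, neither of which is valid R. Note that `ifelse()` returns NA where either genotype is missing, so missing calls would need separate handling.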



