Compare datasets in R

I have a set of transactions collected in a CSV file:

{Pierre, lait, oeuf, beurre, pain}
{Paul, mange du pain,jambon, lait}
{Jacques, oeuf, va chez la crémière, pain, voiture}

I plan to do a simple association-rule analysis, but first I want to remove from each transaction the items that are not in ReferenceSet = {lait, oeuf, beurre, pain}.

So, for this example, the resulting dataset would be:

{Pierre, lait, oeuf, beurre, pain}
{Paul,lait}
{Jacques, oeuf, pain}
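Since the rows have different lengths, one option is to read each line as a whole and split it yourself. The sketch below assumes each line really looks like the brace-wrapped examples above (name first, then comma-separated items); the lines are inlined as a stand-in for readLines() on the actual file, and the names `raw`, `reference_set`, `transactions` and `filtered` are illustrative only.

```r
# Raw lines in the {Name, item, item, ...} layout shown above
# (assumption: in practice these would come from readLines() on the CSV file)
raw <- c("{Pierre, lait, oeuf, beurre, pain}",
         "{Paul, mange du pain,jambon, lait}",
         "{Jacques, oeuf, va chez la crémière, pain, voiture}")

reference_set <- c("lait", "oeuf", "beurre", "pain")

# Strip the braces, split on commas, and trim surrounding whitespace
fields <- lapply(strsplit(gsub("[{}]", "", raw), ","), trimws)

# First field is the customer name; the remaining fields are the items
transactions <- setNames(lapply(fields, `[`, -1),
                         vapply(fields, `[`, character(1), 1))

# Keep only the items that belong to the reference set
filtered <- lapply(transactions, function(items) items[items %in% reference_set])
filtered
```

The result is a named list with one character vector per customer, which matches the expected dataset above.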

I'm sure this is very simple, but I'd love to read any suggestions or advice that would help me.

Answers

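Another answer referenced %in%, but in this case intersect is even more convenient (you could also look at match, though I think it is documented on the same help page as %in%). We can turn the answer into a one-liner: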

Data:

> L <- list(pierre=c("lait","oeuf","beurre","pain") ,
+           paul=c("mange du pain", "jambon", "lait"),
+           jacques=c("oeuf","va chez la crémière", "pain", "voiture"))
> reference <- c("lait", "oeuf", "beurre", "pain")

Answer:

> lapply(L,intersect,reference)
$pierre
[1] "lait"   "oeuf"   "beurre" "pain"  

$paul
[1] "lait"

$jacques
[1] "oeuf" "pain"
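intersect and %in% give the same result on this data, but they differ when a transaction repeats an item: intersect() returns unique values (in the order of its first argument), while subsetting with %in% keeps every matching entry. The basket below is hypothetical, made up only to show that difference; match() is included because it is the function %in% is defined in terms of.

```r
reference <- c("lait", "oeuf", "beurre", "pain")

# Hypothetical basket in which "pain" appears twice
basket <- c("pain", "jambon", "lait", "pain")

# intersect() keeps the order of its first argument but drops duplicates
via_intersect <- intersect(basket, reference)
# Subsetting with %in% keeps every matching entry, duplicates included
via_in <- basket[basket %in% reference]
# match() gives the same result as %in%; index 0 (no match) is dropped by `[`
via_match <- reference[match(basket, reference, nomatch = 0)]

via_intersect  # "pain" "lait"
via_in         # "pain" "lait" "pain"
via_match      # "pain" "lait" "pain"
```

For association rules, a transaction is usually treated as a set of items, so the deduplication done by intersect is normally harmless.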

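Here is one way to do it. Because I'm keeping the structure as a matrix, I've left the removed data as NAs (these can be dropped if you re-export to CSV). I'm also sure this could be done without the loops, which would make it faster (but, IMHO, less readable), and that there is a more efficient way to do the logic; I'd be interested to see other people's takes on this.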

ref <- c("lait","oeuf","beurre","pain")
input <- read.csv("info.csv",sep=",",header=FALSE,strip.white=TRUE)

> input
   V1            V2                  V3     V4      V5
1  Pierre          lait                oeuf beurre    pain
2    Paul mange du pain              jambon   lait        
3 Jacques          oeuf va chez la crémière   pain voiture

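input <- as.matrix(input)
# Pre-allocate an all-NA matrix of the same shape to hold the filtered rows
output <- matrix(nrow = nrow(input), ncol = ncol(input))

for (i in seq_len(nrow(input))) {
  output[i, 1] <- input[i, 1]   # keep the customer name in column 1
  j <- 2                        # next free column in row i of output
  for (k in 2:ncol(input)) {
    # copy the item only if it belongs to the reference set
    if (toString(input[i, k]) %in% ref) {
      output[i, j] <- toString(input[i, k])
      j <- j + 1
    }
  }
}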

> output
     [,1]      [,2]   [,3]   [,4]     [,5]  
[1,] "Pierre"  "lait" "oeuf" "beurre" "pain"
[2,] "Paul"    "lait" NA     NA       NA    
[3,] "Jacques" "oeuf" "pain" NA       NA    
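A loop-free variant of the same idea could look like the sketch below. It rebuilds the matrix inline as a stand-in for as.matrix(read.csv("info.csv", ...)) (assuming the empty fifth field for Paul is read as ""), and `filter_row` and `path` are illustrative names. Note that apply() still iterates internally, so this mainly shortens the code rather than guaranteeing a speed-up. The last step drops the NA padding when writing the result back out.

```r
ref <- c("lait", "oeuf", "beurre", "pain")

# Stand-in for as.matrix(read.csv("info.csv", ...)); Paul's missing fifth field is ""
input <- matrix(c("Pierre",  "lait",          "oeuf",                "beurre", "pain",
                  "Paul",    "mange du pain", "jambon",              "lait",   "",
                  "Jacques", "oeuf",          "va chez la crémière", "pain",   "voiture"),
                nrow = 3, byrow = TRUE)

# Keep the name, then the matching items, then pad with NA to the original width
filter_row <- function(row) {
  items <- row[-1]
  kept <- items[items %in% ref]
  c(row[1], kept, rep(NA, length(items) - length(kept)))
}

# apply() returns one column per input row, so transpose back
output <- t(apply(input, 1, filter_row))

# Re-export without the NA padding: one comma-separated line per transaction
path <- tempfile(fileext = ".csv")
writeLines(apply(output, 1, function(r) paste(r[!is.na(r)], collapse = ",")), path)
readLines(path)
```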

The %in% operator will come in handy:

pierre <- c("lait","oeuf","beurre","pain")  
paul <- c("mange du pain", "jambon", "lait")  
jacques <- c("oeuf","va chez la crémière", "pain", "voiture")

reference <- c("lait", "oeuf", "beurre", "pain")

pierre_fixed <- pierre[pierre %in% reference]
paul_fixed <- paul[paul %in% reference]
jacques_fixed <- jacques[jacques %in% reference]  

pierre_fixed 
paul_fixed
jacques_fixed
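Instead of writing one subset line per customer, the same x[x %in% reference] filter can be applied to every transaction at once if the vectors are kept in a named list. This is a sketch; `transactions`, `keep_reference_items` and `fixed` are hypothetical names introduced for illustration.

```r
reference <- c("lait", "oeuf", "beurre", "pain")

# The three vectors from above, gathered into one named list
transactions <- list(
  pierre  = c("lait", "oeuf", "beurre", "pain"),
  paul    = c("mange du pain", "jambon", "lait"),
  jacques = c("oeuf", "va chez la crémière", "pain", "voiture")
)

# Hypothetical helper: the same x[x %in% reference] subset, written once
keep_reference_items <- function(items, ref) items[items %in% ref]

# Apply it to every transaction; extra arguments to lapply() are passed to the helper
fixed <- lapply(transactions, keep_reference_items, ref = reference)
fixed
```

Adding a new customer then only means adding one element to the list, not another pair of lines.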
