Appending data frames based on a function in R

How do I append data frames one after the other to form another data frame? Whether a given data frame is included or not is decided by a criterion.

Here is some example data:

d1 <- data.frame(MyGroups =sample(LETTERS,100,replace=TRUE),
                 MyInt = sample(c(1:20),100,replace=TRUE))

Now, how do I select the groups in MyGroups (A, B, C, ...) whose mean of the variable MyInt is greater than 10?

I tried the following, without success. Here I am appending the data frames to a file based on the given criterion.

require("plyr")

keepGrp <- function(df0) { 
  if(max(df0$MyInt < 10)) {df0 <- NULL}
  write.csv(df0, "mytable.txt", append=TRUE, sep=",")
}

ddply(d1,.(MyInt),function(x) keepGrp(x))

The desired data frame should end up in the file mytable.txt. I am sure there is a better way to do what I am trying to do, and I am happy to clarify my question if needed. I would appreciate it if someone could (1) give me feedback on improving my programming approach and (2) give me a solution to my problem.

Best answer

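If I understand your question correctly, you want to compute the mean by group and write only the rows from groups that meet some threshold to an existing file. If so, why not compute all the means at once, subset, and then write the result out? Here is a one-liner that could be split into several steps, but I think you'll get the point: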

write.table(
  subset(
    ddply(d1, "MyGroups", transform, meanval = mean(MyInt)
          ), 
    meanval > 10), 
  "yourcsv.csv", append = TRUE, sep = ",", col.names = FALSE
  )
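
For readers who prefer to see the steps separately, here is a rough sketch of the same idea split into three statements. It swaps ddply/transform for base R's ave(), which is my substitution and not part of the answer above. The seed, the name kept, and the temporary file path out_file are placeholders chosen for illustration.

```r
# Stand-in data with the same columns as d1 in the question (seed is arbitrary)
set.seed(1)
d1 <- data.frame(MyGroups = sample(LETTERS, 100, replace = TRUE),
                 MyInt    = sample(1:20, 100, replace = TRUE))

# Step 1: attach each row's group mean (base R ave() in place of ddply/transform)
d1$meanval <- ave(as.numeric(d1$MyInt), d1$MyGroups, FUN = mean)

# Step 2: keep only the rows whose group mean exceeds the threshold
kept <- subset(d1, meanval > 10)

# Step 3: append the rows to a file; out_file is a placeholder path
out_file <- tempfile(fileext = ".csv")
write.table(kept, out_file, append = TRUE, sep = ",",
            col.names = FALSE, row.names = FALSE)
```

Splitting it up this way makes it easy to inspect kept before anything is written to disk.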

Other answers

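It is simpler than what you are doing. The function called by ddply can return either the subset of the data that meets the criterion or an empty data frame.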

keepGrp <- function(df0) {
  if(mean(df0$MyInt) > 10) {
    df0
  } else {
    data.frame()
  }
}

res <- ddply(d1, .(MyGroups), keepGrp)

Note that the test in your keepGrp was wrong (it did not test the mean of the MyInt values), and the grouping in your ddply call was wrong (it should be MyGroups, not MyInt).
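
If plyr is not available, the same keep-or-drop pattern can probably be reproduced with base R's split(), lapply() and rbind(), which is also the literal "append data frames one after the other" that the question asks about. This is only a sketch under that assumption. The names pieces and res_base are made up for illustration, and the zero-row slice df0[0, ] is my choice so that every piece keeps the same columns.

```r
# Stand-in data with the same columns as d1 in the question (seed is arbitrary)
set.seed(1)
d1 <- data.frame(MyGroups = sample(LETTERS, 100, replace = TRUE),
                 MyInt    = sample(1:20, 100, replace = TRUE))

# Return the group unchanged if its mean passes, otherwise a zero-row slice
keepGrp <- function(df0) {
  if (mean(df0$MyInt) > 10) df0 else df0[0, ]
}

# Split into one data frame per group, filter each, then stack the survivors
pieces   <- lapply(split(d1, d1$MyGroups), keepGrp)
res_base <- do.call(rbind, pieces)
rownames(res_base) <- NULL  # drop the "A.1", "A.2", ... row names rbind creates
```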

Check that this is correct:

> ddply(d1, .(MyGroups), summarise, ave = mean(MyInt))
   MyGroups       ave
1         A 14.200000
2         B  9.600000
3         C  5.600000
4         D  5.600000
5         E  8.000000
6         F 10.500000
7         G  7.333333
8         H 12.000000
9         I  7.333333
10        J  9.500000
11        K 11.000000
12        L 12.375000
13        M 13.250000
14        N 12.000000
15        O 11.666667
16        P  8.625000
17        Q 13.000000
18        R  6.000000
19        S 16.000000
20        T 12.000000
21        U 12.000000
22        V 13.250000
23        W 17.666667
24        X  9.000000
25        Y 12.400000
26        Z 13.750000
> unique(res$MyGroups)
 [1] A F H K L M N O Q S T U V W Y Z
Levels: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

So the groups that appear in res are exactly those with a suitable mean for MyInt.
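
One more point about the original attempt: write.csv() ignores append and sep (it only warns), so appending needs write.table(). Below is a small sketch of that difference. The tiny res data frame is a made-up stand-in for the real result, and out_file is a placeholder temporary path.

```r
# Made-up stand-in for the filtered result
res <- data.frame(MyGroups = c("A", "A", "S"), MyInt = c(15L, 13L, 16L))
out_file <- tempfile(fileext = ".txt")

# The first call creates the file with a header; the second appends rows only
write.table(res, out_file, sep = ",", row.names = FALSE)
write.table(res, out_file, sep = ",", row.names = FALSE,
            col.names = FALSE, append = TRUE)
back <- read.csv(out_file)  # now holds the rows of res twice

# write.csv() warns that append is ignored and overwrites the file instead
got_warning <- FALSE
withCallingHandlers(
  write.csv(res, out_file, append = TRUE),
  warning = function(w) {
    got_warning <<- TRUE
    invokeRestart("muffleWarning")
  }
)
```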




