Question

How do I append data frames one after the other to form another data frame? Whether a data frame is included or not will be decided by a criterion.

Here is some example data:

d1 <- data.frame(MyGroups =sample(LETTERS,100,replace=TRUE),
                 MyInt = sample(c(1:20),100,replace=TRUE))

Now, how do I select the groups (A, B, C, ...) in MyGroups whose mean of the variable MyInt is greater than 10?

I tried the following, but it did not work. Here I am trying to append each data frame that meets the criterion to a file.

require("plyr")

keepGrp <- function(df0) { 
  if(max(df0$MyInt < 10)) {df0 <- NULL}
  write.csv(df0, "mytable.txt", append=TRUE, sep=",")
}

ddply(d1,.(MyInt),function(x) keepGrp(x))

The desired data frame should end up in the file mytable.txt. I am quite sure there is a better way to do what I am trying to do, and I am happy to clarify my question if needed. I would appreciate it if someone could (1) give me feedback on how to improve my programming approach and (2) give me a solution to my problem.

Answer 1

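If I understand your question correctly, you want to compute the mean of MyInt by group and write only the groups that reach some threshold to an existing file. If so, why not compute all the means at once, subset, and then write the result out? Here is a one-liner; it could be split into several steps, but I think you'll get the point: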

write.table(
  subset(
    ddply(d1, "MyGroups", transform, meanval = mean(MyInt)
          ), 
    meanval > 10), 
  "yourcsv.csv", append = TRUE, sep = ",", col.names = FALSE
  )
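For comparison, here is a minimal base-R sketch of the same compute, subset, then write idea, assuming plyr is not available. It swaps ddply/transform for ave(), uses an arbitrary set.seed(1) so the block runs on its own, and writes to a temporary file instead of yourcsv.csv. It uses write.table rather than write.csv because write.csv ignores attempts to change append (with a warning); the second write only illustrates how col.names = FALSE keeps the header from repeating when appending.

```r
# Example data as in the question; set.seed(1) is an added assumption for reproducibility
set.seed(1)
d1 <- data.frame(MyGroups = sample(LETTERS, 100, replace = TRUE),
                 MyInt    = sample(1:20, 100, replace = TRUE))

# ave() returns, for every row, the mean of MyInt within that row's group
d1$meanval <- ave(d1$MyInt, d1$MyGroups, FUN = mean)

# Keep only the rows whose group mean exceeds the threshold
kept <- d1[d1$meanval > 10, ]

# Hypothetical output path: a temporary file stands in for yourcsv.csv
out <- tempfile(fileext = ".csv")

# First write creates the file with a header row
write.table(kept, out, sep = ",", row.names = FALSE)

# Later writes append rows only; col.names = FALSE avoids a repeated header
write.table(kept, out, sep = ",", row.names = FALSE,
            col.names = FALSE, append = TRUE)
```

The same pattern works inside a loop: write the header once, then append each later batch with col.names = FALSE.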

Answer 2

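It's simpler than what you are doing. The function that ddply calls can return either the subset of the data that meets the criterion or an empty data frame.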

keepGrp <- function(df0) {
  if(mean(df0$MyInt) > 10) {
    df0
  } else {
    data.frame()
  }
}

res <- ddply(d1, .(MyGroups), keepGrp)
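The question literally asks about appending data frames one after the other. A base-R sketch of that, offered as a swapped-in alternative to ddply rather than what this answer uses, splits d1 into one data frame per group, keeps the pieces that pass the mean test with Filter(), and stacks the survivors with rbind(). The set.seed(1) call is an assumption added only so the block runs on its own.

```r
# Example data as in the question; set.seed(1) is an added assumption
set.seed(1)
d1 <- data.frame(MyGroups = sample(LETTERS, 100, replace = TRUE),
                 MyInt    = sample(1:20, 100, replace = TRUE))

# One data frame per group, stored in a list named by group
pieces <- split(d1, d1$MyGroups)

# Keep only the groups whose mean MyInt exceeds 10
keep <- Filter(function(df0) mean(df0$MyInt) > 10, pieces)

# Append the surviving data frames one after the other;
# unname() stops rbind from prefixing row names with the group label
res <- do.call(rbind, unname(keep))
```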

Note that the test in your keepGrp was wrong (it did not test the mean of the MyInt values), and the grouping in your ddply call was wrong (it should be MyGroups, not MyInt).
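To see why the original test misbehaves, consider a hypothetical group whose MyInt values are 3, 15, 18 and 20 (mean 14):

```r
# Hypothetical MyInt values for a single group; their mean is 14
x <- c(3, 15, 18, 20)

# Original test: x < 10 is a logical vector, and max() of it is 1
# as soon as ANY single value is below 10
max(x < 10)   # [1] 1

# Intended test: compare the group mean with the threshold
mean(x) > 10  # [1] TRUE
```

Because if() treats 1 as TRUE, the original code would have discarded this group even though its mean is above 10.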

Checking that it is correct:

> ddply(d1, .(MyGroups), summarise, ave = mean(MyInt))
   MyGroups       ave
1         A 14.200000
2         B  9.600000
3         C  5.600000
4         D  5.600000
5         E  8.000000
6         F 10.500000
7         G  7.333333
8         H 12.000000
9         I  7.333333
10        J  9.500000
11        K 11.000000
12        L 12.375000
13        M 13.250000
14        N 12.000000
15        O 11.666667
16        P  8.625000
17        Q 13.000000
18        R  6.000000
19        S 16.000000
20        T 12.000000
21        U 12.000000
22        V 13.250000
23        W 17.666667
24        X  9.000000
25        Y 12.400000
26        Z 13.750000
> unique(res$MyGroups)
 [1] A F H K L M N O Q S T U V W Y Z
Levels: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

So the groups that appear in res are exactly those whose mean MyInt is greater than 10.
