data.table efficient alternative to grouped assignment as DT[ ,x:=f(y),by=z]?
  • Time: 2012-05-24 00:51:47
  • Tags:
  • r
  • data.table

I am looking for the best alternative to assignment by reference by group in a data.table, which (as far as I know) is not yet implemented. For example:

DT = data.table(x=rep(c("a","b","c"),each=3), y=rep(c(1,3,6),times=3), v=1:9)
     x y v
[1,] a 1 1
[2,] a 3 2
[3,] a 6 3
[4,] b 1 4
[5,] b 3 5
[6,] b 6 6
[7,] c 1 7
[8,] c 3 8
[9,] c 6 9

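I would like to add a new column z containing f(y, v) computed within each group of x (let's take f(y, v) = mean(y)+v). Note that I do not want to merely print or store the result of this computation, as in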

DT[,mean(y)+v,by=x]
      x        V1
 [1,] a  4.333333
 [2,] a  5.333333
 [3,] a  6.333333
 [4,] b  7.333333
 [5,] b  8.333333
 [6,] b  9.333333
 [7,] c 10.333333
 [8,] c 11.333333
 [9,] c 12.333333

but rather add the result to DT:

     x y v        V1
[1,] a 1 1  4.333333
[2,] a 3 2  5.333333
[3,] a 6 3  6.333333
[4,] b 1 4  7.333333
[5,] b 3 5  8.333333
[6,] b 6 6  9.333333
[7,] c 1 7 10.333333
[8,] c 3 8 11.333333
[9,] c 6 9 12.333333

The table is 262 MB, so

DT <- DT[,transform(.SD,mean(y)+v),by=x]

is not an option, because I cannot fit DT in memory twice (which I think the copy operation implies). In fact, I have never seen this operation finish.

What other options do I have (until data.table supports DT[,z:= mean(y)+v,by=x])?

I just read about DT[newDT]. What is going wrong here?

newDT <- DT[,mean(y)+v,by=x]
      x        V1
 [1,] a  4.333333
 [2,] a  5.333333
 [3,] a  6.333333
 [4,] b  7.333333
 [5,] b  8.333333
 [6,] b  9.333333
 [7,] c 10.333333
 [8,] c 11.333333
 [9,] c 12.333333

(so far, so sensible) and then:

setkey(DT,x)
setkey(newDT,x)
DT[newDT]
x y v        V1
a 1 1  4.333333
a 3 2  4.333333
a 6 3  4.333333
a 1 1  5.333333
a 3 2  5.333333
a 6 3  5.333333
a 1 1  6.333333
a 3 2  6.333333
a 6 3  6.333333
b 1 4  7.333333
b 3 5  7.333333
b 6 6  7.333333
b 1 4  8.333333
b 3 5  8.333333
b 6 6  8.333333
b 1 4  9.333333
b 3 5  9.333333
b 6 6  9.333333
c 1 7 10.333333
c 3 8 10.333333
c 6 9 10.333333
c 1 7 11.333333
c 3 8 11.333333
c 6 9 11.333333
c 1 7 12.333333
c 3 8 12.333333
c 6 9 12.333333

But this is not what I want. What is going wrong?

Answers
DT[, xm := ave(y, x, FUN=mean) + v]
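The one-liner above works because base R's ave() returns a vector as long as its input, with each element replaced by its group's mean, so an ungrouped := can assign it directly. Here is a minimal sketch on the question's example data. The second assignment assumes a data.table release newer than the one discussed in the question, in which := can be combined with by; the column name z is taken from the question.

```r
library(data.table)

# Example data from the question (y is repeated explicitly for each group)
DT <- data.table(x = rep(c("a", "b", "c"), each = 3),
                 y = rep(c(1, 3, 6), times = 3),
                 v = 1:9)

# ave() yields one group mean of y per row, so no 'by' is needed here
DT[, xm := ave(y, x, FUN = mean) + v]

# Assumption: a later data.table release where := works together with by
DT[, z := mean(y) + v, by = x]

print(DT)
```

Both columns are added to DT in place, without building a second copy of the table.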

I would do the following:

DT[, list(fvy = mean(y)), by="x"][DT][, fvy := fvy + v]

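So basically I split it into two parts: first I compute the mean of y per group, then I join that value back onto DT, and finally I add v to the mean of y. I am not sure whether this really helps memory-wise, but chances are the author will see this and tell us ;-)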
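The same two-step idea can be sketched as an update join, which writes the new column into DT by reference instead of building a joined copy. This assumes a data.table version that supports the on= argument and the i. prefix inside j; the names means and my are hypothetical and used only for illustration.

```r
library(data.table)

DT <- data.table(x = rep(c("a", "b", "c"), each = 3),
                 y = rep(c(1, 3, 6), times = 3),
                 v = 1:9)

# Step 1: one row per group holding the group mean of y (3 rows, cheap)
means <- DT[, list(my = mean(y)), by = x]

# Step 2: update join; for every row of DT matching on x, add its own v
# to the group's mean and store the result in DT by reference
DT[means, fvy := i.my + v, on = "x"]

print(DT)
```

Only the small per-group table is materialised, which is the part that matters when DT itself barely fits in memory.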

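As for why your approach did not work: you end up with two data.tables to join, DT and newDT. Both contain each key value three times, so when you join them every combination of matching rows appears in the result (3 x 3 per key). That is why you get a data.table with nine a's, nine b's and nine c's.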
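The row count can be checked directly. In this sketch, allow.cartesian = TRUE is an addition that is not in the original post: recent data.table versions stop with an error when a join returns more rows than nrow(x) + nrow(i), precisely to catch duplicated keys like these.

```r
library(data.table)

DT <- data.table(x = rep(c("a", "b", "c"), each = 3),
                 y = rep(c(1, 3, 6), times = 3),
                 v = 1:9)
newDT <- DT[, list(V1 = mean(y) + v), by = x]

setkey(DT, x)
setkey(newDT, x)

# Each key value occurs 3 times on both sides, so each key yields 3 * 3 rows
joined <- DT[newDT, allow.cartesian = TRUE]
rows_per_key <- joined[, .N, by = x]

print(nrow(joined))
print(rows_per_key)
```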

So, to do it your way, which is very similar to mine, you need a second key column:

newDT <- DT[,list(fvy=mean(y)+v, v),by=x]
setkey(newDT, x, v)
setkey(DT, x, v)
DT[newDT]
      x v y       fvy
 [1,] a 1 1  4.333333
 [2,] a 2 3  5.333333
 [3,] a 3 6  6.333333
 [4,] b 4 1  7.333333
 [5,] b 5 3  8.333333
 [6,] b 6 6  9.333333
 [7,] c 7 1 10.333333
 [8,] c 8 3 11.333333
 [9,] c 9 6 12.333333



