Optimising a complex data.table aggregation in R

See also: How to optimise filtering and calculation for every row in a large R data frame.

I have a data.table, for example:

  name day wages hour colour
1  Ann   1   100    6  Green
2  Ann   1   150   18   Blue
3  Ann   2   200   10   Blue
4  Ann   3   150   10  Green
5  Bob   1   100   11    Red
6  Bob   1   200   17    Red
7  Bob   1   150   20  Green
8  Bob   2   100   11    Red

For each unique name/day combination, I want to count a number of facts within each of four time periods. The periods I care about are:

t1 (hour < 9) 
t2 (hour < 17) 
t3 (hour > 9) 
t4 (hour > 17)

Some examples of facts might be:

wages > 175
colour = "Green"

I can do this with the following data.table code:

setkey(dt, name, day)
# One count per fact/period pair, computed separately for each name/day group
result <- dt[, list(wages.t1 = sum(wages > 175 & hour < 9),
                    wages.t2 = sum(wages > 175 & hour < 17),
                    wages.t3 = sum(wages > 175 & hour > 9),
                    wages.t4 = sum(wages > 175 & hour > 17),
                    green.t1 = sum(colour == "Green" & hour < 9),
                    green.t2 = sum(colour == "Green" & hour < 17),
                    green.t3 = sum(colour == "Green" & hour > 9),
                    green.t4 = sum(colour == "Green" & hour > 17)),
             by = list(name, day)]

This gives me the result I want:

     name day wages.t1 wages.t2 wages.t3 wages.t4 green.t1 green.t2 green.t3 green.t4
[1,]  Ann   1        0        0        0        0        1        1        0        0
[2,]  Ann   2        0        1        1        0        0        0        0        0
[3,]  Ann   3        0        0        0        0        0        1        1        0
[4,]  Bob   1        0        0        1        0        0        0        1        1
[5,]  Bob   2        0        0        0        0        0        0        0        0

But this (a) is hard to read and write, and (b) seems inefficient.

Any tips on how I could do this better? Note that in my real case I have hundreds of thousands of rows, four time periods, and 30 to 35 facts.

-- Code to create dt

library(data.table)

dt = data.table(
  name = factor(c("Ann", "Ann", "Ann", "Ann", 
                  "Bob", "Bob", "Bob", "Bob")), 
  day = c(1, 1, 2, 3, 1, 1, 1, 2), 
  wages = c(100, 150, 200, 150, 100, 200, 150, 100), 
  hour = c(6, 18, 10, 10, 11, 17, 20, 11), 
  colour = c("Green", "Blue", "Blue", "Green", "Red",
             "Red", "Green", "Red")
)
Best answer

How about something like:

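f = list(quote(wages > 175), quote(colour == "Green"))   # facts
t = list(quote(hour < 9), quote(hour < 17), quote(hour > 9), quote(hour > 17))   # time periods
# dt = as.data.table(df)   # only needed when starting from a data.frame called df
# %*% of two logical vectors counts the rows where both are TRUE
dt[, as.list(mapply("%*%",
                    lapply(t, eval, .SD),
                    rep(lapply(f, eval, .SD), each = length(t))
                   )), by = list(name, day)]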
     name day V1 V2 V3 V4 V5 V6 V7 V8
[1,]  Ann   1  0  0  0  0  1  1  0  0
[2,]  Ann   2  0  1  1  0  0  0  0  0
[3,]  Ann   3  0  0  0  0  0  1  1  0
[4,]  Bob   1  0  0  1  0  0  0  1  1
[5,]  Bob   2  0  0  0  0  0  0  0  0

Obviously the column names aren't handled, but they could be added if this approach proves useful.
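One way the column names might be filled in is sketched below. It is only an illustration, not part of the original answer: the list names (`wages`, `green`, `t1` to `t4`) and the variables `facts`, `periods` and `col_names` are labels chosen here, and `sum(a & b)` is swapped in for `%*%` so that each cell is a plain integer count. The sample data is the table from the question.

```r
library(data.table)

# Sample data from the question
dt <- data.table(
  name   = factor(c("Ann", "Ann", "Ann", "Ann", "Bob", "Bob", "Bob", "Bob")),
  day    = c(1, 1, 2, 3, 1, 1, 1, 2),
  wages  = c(100, 150, 200, 150, 100, 200, 150, 100),
  hour   = c(6, 18, 10, 10, 11, 17, 20, 11),
  colour = c("Green", "Blue", "Blue", "Green", "Red", "Red", "Green", "Red")
)

# Named lists of quoted conditions (the names are illustrative labels)
facts   <- list(wages = quote(wages > 175), green = quote(colour == "Green"))
periods <- list(t1 = quote(hour < 9),  t2 = quote(hour < 17),
                t3 = quote(hour > 9),  t4 = quote(hour > 17))

# Names in the order the counts are produced: periods vary fastest per fact
col_names <- paste(rep(names(facts), each = length(periods)),
                   names(periods), sep = ".")

result <- dt[, {
  p <- lapply(periods, eval, .SD)   # each period evaluated once per group
  f <- lapply(facts, eval, .SD)     # each fact evaluated once per group
  # Pair every period with every fact and count rows satisfying both
  counts <- mapply(function(a, b) sum(a & b),
                   p, rep(f, each = length(p)), USE.NAMES = FALSE)
  setNames(as.list(counts), col_names)
}, by = list(name, day)]

print(result)
```

With the sample data this should give the same counts as the table above, with the columns named wages.t1 through green.t4 instead of V1 through V8. Adding another fact or period then only means adding one entry to the corresponding list.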

This should be more efficient, since each t and each f is evaluated only once per group, and the results are then combined.
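The combining step relies on the fact that the dot product of two logical vectors equals the number of positions where both are TRUE. The base R sketch below uses made-up toy vectors (`in_period`, `has_fact`) to show this, plus a matrix variant with `crossprod` that counts all period/fact pairs in a single call; the matrix form is an assumption about how the idea could be extended, not something the answer itself does.

```r
# Toy logical vectors standing in for one period and one fact within a group
in_period <- c(TRUE, TRUE, FALSE, TRUE)
has_fact  <- c(TRUE, FALSE, FALSE, TRUE)

# Logicals are treated as 0/1, so the dot product counts joint TRUEs
dot   <- drop(in_period %*% has_fact)
count <- sum(in_period & has_fact)
stopifnot(dot == count)

# Matrix form: one column per period and per fact
period_mat <- cbind(p1 = in_period, p2 = !in_period)
fact_mat   <- cbind(f1 = has_fact)

# t(period_mat) %*% fact_mat: rows are periods, columns are facts
pair_counts <- crossprod(period_mat, fact_mat)
print(pair_counts)
```

Each condition is computed once and reused for every pairing, which is where the saving over writing out every `sum(fact & period)` by hand comes from.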
