Mean of 50 most recent entries in R
  • Asked: 2012-05-23 15:48:22
  • Tags: r, mean

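I have a data frame that shows date, item and value, and I want to add a column showing the mean of the previous 50 entries for that item (or NA if there have not been 50 previous entries). For example, the table could be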

      data
date     item value  
01/01/01 a    2  
01/01/01 b    1.5  
04/01/01 c    1.7  
05/01/01 a    1.9  
......

and would become, in part,

date     item value last_50_mean   
........ 
11/09/01 a    1.2   1.1638
12/09/01 b    1.9   1.5843 
12/09/01 a    1.4   1.1621
13/09/01 c    0.9   NA
........

So in this case, the mean of a over its 50 entries before 11/09/01 is 1.1638, while c has not had 50 entries before 13/09/01 and therefore returns NA.

I am currently using the following function to do this:

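  data[, "last_50_mean"] <- sapply(seq_len(nrow(data)), function(i) {
        # earlier rows for the same item (assumes data$date is of class Date)
        prevDates <- data[data$date < data$date[i] & data$item == data$item[i], ]
        num       <- nrow(prevDates)
        if (num >= 50) {
          round(mean(prevDates$value[(num - 49):num]), 4)
        } else {
          NA  # fewer than 50 previous entries
        }
      }
  )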

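But my data frame is very large and this takes a long time (in fact, I am not 100% sure it works, as it is still running...). Does anyone know the best way to do this?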

Best answer

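The mean of N observations can be computed from the difference between the cumulative sums at the first and last values: diff(cumsum(x), lag=N-1). That difference is the sum of the N-1 values following the first one, so the first value is added back before dividing by N. Your problem also requires padding the first N-1 positions, so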

meanN <- function(x, N=50)
    ## mean of last N observations, padded in front with NA
{
    x0 <- x[seq_len(length(x) - N + 1)]
    x1 <- (x0 + diff(cumsum(x), lag=N-1)) / N
    c(rep(NA, N - 1), x1)
}
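A quick way to gain confidence in the cumulative-sum trick is to compare it with an explicit window-by-window mean on a small vector. The sketch below repeats meanN from above so that it runs on its own; naiveMeanN is a hypothetical helper written only for this check and is not part of the answer.

```r
# meanN as defined in the answer above
meanN <- function(x, N = 50) {
    x0 <- x[seq_len(length(x) - N + 1)]
    x1 <- (x0 + diff(cumsum(x), lag = N - 1)) / N
    c(rep(NA, N - 1), x1)
}

# Hypothetical reference implementation: slow, but obviously a rolling mean
naiveMeanN <- function(x, N = 50) {
    sapply(seq_along(x), function(i) {
        if (i < N) NA else mean(x[(i - N + 1):i])
    })
}

set.seed(1)
x <- runif(200, 1, 3)
same <- all.equal(meanN(x, 50), naiveMeanN(x, 50))
same
```

Note that the window ending at position i includes x[i] itself, so the first non-NA value appears at position N.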

You would like to do this for multiple groups. For a data.frame such as

df <- data.frame(item=sample(letters[1:3], 1000, TRUE),
                 value=runif(1000, 1, 3),
                 last_50_mean=NA)

one approach is

split(df$last_50_mean, df$item) <- lapply(split(df$value, df$item), meanN)

resulting in, for example,

> tail(df)
     item    value last_50_mean
995     c 1.191486     2.037707
996     c 2.899214     2.073022
997     c 2.019375     2.054914
998     c 2.737043     2.066389
999     a 1.703752     1.923234
1000    c 1.602442     2.043517
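Two details may matter for the original question: meanN as written includes the current row in its window, and it raises an error for a group with fewer than N-1 rows. The following is a minimal sketch of one way to handle both, assuming N >= 2. The names meanN_safe and prevMeanN and the column prev_50_mean are hypothetical, and base R's ave() is used here as an alternative to split<- rather than as the answer's method.

```r
# Guarded variant of meanN: groups shorter than N yield all NA
meanN_safe <- function(x, N = 50) {
    if (length(x) < N) return(rep(NA_real_, length(x)))
    x0 <- x[seq_len(length(x) - N + 1)]
    x1 <- (x0 + diff(cumsum(x), lag = N - 1)) / N
    c(rep(NA, N - 1), x1)
}

# Mean of the N entries strictly before each position:
# shift the rolling mean down by one, keeping the original length
prevMeanN <- function(x, N = 50) {
    m <- meanN_safe(x, N)
    c(NA, m)[seq_along(m)]
}

set.seed(42)
df <- data.frame(item  = sample(letters[1:3], 300, TRUE),
                 value = runif(300, 1, 3))

# ave() applies FUN within each item and returns values in the original row order
df$prev_50_mean <- ave(df$value, df$item, FUN = prevMeanN)
tail(df)
```

With this variant, the first non-NA value in each group appears at that group's (N+1)-th row.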

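This assumes that your data frame is sorted in chronological order. A potential problem is overflow in cumsum for long vectors; one could address this by centering value so that cumsum is expected to stay not too far from zero. A recent question (https://stackoverflow.com/questions/10645100/apply-a-conference-to-groups-in-a-data-frame-r) covers alternatives to split<- and dropping the last N observations.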
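The two caveats above can be sketched as follows. This assumes the dates are stored as dd/mm/yy strings, as in the question's table; the small data frame and the helper centeredMeanN are hypothetical, and subtracting the overall mean is only one possible way of centering.

```r
# meanN as defined in the answer
meanN <- function(x, N = 50) {
    x0 <- x[seq_len(length(x) - N + 1)]
    x1 <- (x0 + diff(cumsum(x), lag = N - 1)) / N
    c(rep(NA, N - 1), x1)
}

# 1. Put the rows in chronological order before computing rolling means
df <- data.frame(date  = c("05/01/01", "01/01/01", "04/01/01", "01/01/01"),
                 item  = c("a", "a", "c", "b"),
                 value = c(1.9, 2, 1.7, 1.5))
df$date <- as.Date(df$date, format = "%d/%m/%y")
df <- df[order(df$date), ]   # rows with tied dates keep their original order

# 2. Center the values so the cumulative sum stays near zero,
#    then add the center back to each window mean
centeredMeanN <- function(x, N = 50) {
    m <- mean(x)
    meanN(x - m, N) + m
}

set.seed(1)
x <- runif(1000, 1, 3)
agree <- all.equal(centeredMeanN(x), meanN(x))
agree
```

Centering does not change the result mathematically, because the mean of (x - m) over a window plus m equals the mean of x over that window; it only keeps the intermediate sums smaller.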
