Question

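I am trying to write a function that takes as arguments the name of a data frame and the name of a column in that data frame. The function performs various manipulations on the data, one of which is adding a running total for each year in a new column. I am using plyr.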

When I use the name of the column directly with ddply and cumsum I have no problems:

require(plyr)
df <- data.frame(date = seq(as.Date("2007/1/1"),
                     by = "month",
                     length.out = 60),
                 sales = runif(60, min = 700, max = 1200))

df$year <- as.numeric(format(as.Date(df$date), format="%Y"))
df <- ddply(df, .(year), transform,
            cum_sales = (cumsum(as.numeric(sales))))

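This works fine, but the ultimate aim is to be able to pass a column name to this function. When I try to use a variable in place of the column name, the following does not work as I expected: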

mycol <- "sales"
df[mycol]

df <- ddply(df, .(year), transform,
            cum_value2 = cumsum(as.numeric(df[mycol])))

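I thought I knew how to access a column by name. This worries me, because it suggests that I have failed to understand something basic about indexing and extraction. I would have thought that referring to columns by name in this way is a common need.

I have two questions.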

What am I doing wrong i.e. what have I misunderstood?
Is there a better way of going about this, bearing in mind that the names of the columns will not be known beforehand by the function?

TIA

Answer 1

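The arguments to ddply are expressions that are evaluated in the context of each piece the original data frame is split into. Your df[mycol] refers to the whole data frame, so you cannot pass it as-is (by the way, why do you need things like as.numeric(as.character(...))? They are completely useless here).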

The easiest way is to write your own function that does everything inside and pass the column name down to it, e.g.

df <- ddply(df, 
            .(year), 
            .fun = function(x, colname) transform(x, cum_sales = cumsum(x[,colname])), 
            colname = "sales")
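If the goal is a reusable function that receives the column name as a string, the same idea could be wrapped up roughly as follows. This is only a sketch: it assumes plyr is installed and that the grouping column is called year, as in the question. The names add_yearly_cumsum and newcol are made up for illustration.

```r
library(plyr)

# Example data, rebuilt from the question (sales values are random).
df <- data.frame(date = seq(as.Date("2007/1/1"), by = "month", length.out = 60),
                 sales = runif(60, min = 700, max = 1200))
df$year <- as.numeric(format(df$date, format = "%Y"))

# Hypothetical helper: adds a per-year running total of the column named
# by `colname`. Assumes `data` already has a `year` column.
add_yearly_cumsum <- function(data, colname, newcol = paste0("cum_", colname)) {
  ddply(data, "year", function(chunk) {
    # `chunk` is one year's rows; [[ ]] extracts the column as a plain vector.
    chunk[[newcol]] <- cumsum(chunk[[colname]])
    chunk
  })
}

result <- add_yearly_cumsum(df, "sales")
head(result)
```

Because the anonymous function works on chunk rather than on df, the column lookup happens inside each yearly piece, which is the point the answer above is making.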

Answer 2

The problem is that ddply expects its last arguments to be expressions that will be evaluated on chunks of the data.frame (one per year, in your example). If you use df[mycol], you get the whole data.frame, not the annual chunks.
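To make the difference concrete, here is a small base R sketch. It uses split() only to imitate the per-year chunks; it is not what ddply does internally. It also shows that df[mycol] with single brackets is still a one-column data.frame, while df[[mycol]] is the underlying vector.

```r
# Example data, rebuilt from the question (sales values are random).
df <- data.frame(date = seq(as.Date("2007/1/1"), by = "month", length.out = 60),
                 sales = runif(60, min = 700, max = 1200))
df$year <- as.numeric(format(df$date, format = "%Y"))
mycol <- "sales"

# Single brackets keep the data.frame wrapper; double brackets extract the vector.
whole_single <- df[mycol]     # data.frame with 60 rows and 1 column
whole_double <- df[[mycol]]   # numeric vector of length 60

# split() imitates the per-year pieces that the expressions are evaluated on.
chunks <- split(df, df$year)
chunk_lengths <- sapply(chunks, function(chunk) length(chunk[[mycol]]))

print(class(whole_single))
print(length(whole_double))
print(chunk_lengths)          # one entry per year, 12 months each
```

Any expression that mentions df directly sees all 60 rows, no matter which year is currently being processed.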

The following works, but is not very elegant: I build the expression as a string, and then turn it into an expression and evaluate it with eval(parse(...)).

ddply( df, .(year), transform, 
  cum_value2 = eval(parse( text = 
    sprintf( "cumsum(as.numeric(as.character(%s)))", mycol )
  ))
)
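For comparison, a different technique that avoids both eval(parse(...)) and plyr is base R's ave(), which applies a function within groups and returns a vector aligned with the original rows. This is an alternative sketch, not the approach described above. It assumes the year column from the question.

```r
# Example data, rebuilt from the question (sales values are random).
df <- data.frame(date = seq(as.Date("2007/1/1"), by = "month", length.out = 60),
                 sales = runif(60, min = 700, max = 1200))
df$year <- as.numeric(format(df$date, format = "%Y"))
mycol <- "sales"

# ave() groups df[[mycol]] by year and applies cumsum within each group.
# FUN must be passed by name, because unnamed extra arguments are treated
# as additional grouping variables.
df$cum_value2 <- ave(df[[mycol]], df$year, FUN = cumsum)

head(df, 14)
```

Since the column name is only ever used as a string in df[[mycol]], no expression has to be built from text.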
