How can I use variable names to refer to data frame columns with ddply?
  • Date: 2012-01-15 10:30:28
  • Tags:
  • r
  • plyr

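I am trying to write a function that takes as arguments the name of a data frame and the name of a column in that data frame. The function performs various manipulations on the data, one of which is adding a running total for each year in a column. I am using plyr.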

When I use the name of the column directly with ddply and cumsum I have no problems:

require(plyr)
df <- data.frame(date = seq(as.Date("2007/1/1"),
                     by = "month",
                     length.out = 60),
                 sales = runif(60, min = 700, max = 1200))

df$year <- as.numeric(format(as.Date(df$date), format="%Y"))
df <- ddply(df, .(year), transform,
            cum_sales = (cumsum(as.numeric(sales))))

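This is all well and good, but the ultimate aim is to be able to pass a column name to this function. When I try to use a variable in place of the column name, the following does not work as I expected: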

mycol <- "sales"
df[mycol]

df <- ddply(df, .(year), transform,
            cum_value2 = cumsum(as.numeric(df[mycol])))

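I thought I knew how to access a column by name. This worries me, because it suggests that I have failed to understand something basic about indexing and extraction. I would have thought that referring to columns by name in this way is a common need.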

I have two questions.

  1. What am I doing wrong i.e. what have I misunderstood?
  2. Is there a better way of going about this, bearing in mind that the names of the columns will not be known beforehand by the function?

TIA

Best answer

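The arguments to ddply are expressions that are evaluated in the context of each piece into which the original data frame is split. Your df[mycol] refers to the whole data frame, so you cannot pass it as-is. (By the way, why do you need things like as.numeric(as.character())? They are completely useless here.)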
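To make the indexing point concrete, here is a small base R sketch. The three-row data frame and the name piece are made up for illustration; they are not from the question. It shows that single brackets with a name return a one-column data frame, while double brackets return the column vector, and that a per-year computation has to index the piece rather than df.

```r
# Small illustrative data frame (values are assumptions, not the question's data)
df <- data.frame(year = c(2007, 2007, 2008), sales = c(10, 20, 30))
mycol <- "sales"

class(df[mycol])     # "data.frame": a one-column data frame
class(df[[mycol]])   # "numeric": the column itself, as a vector

# Within one yearly piece, index the piece rather than the whole df
piece <- df[df$year == 2007, ]
cumsum(piece[[mycol]])   # 10 30
```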

The easiest way is to write your own function that does everything inside, and to pass the column name down to it, e.g.

df <- ddply(df, 
            .(year), 
            .fun = function(x, colname) transform(x, cum_sales = cumsum(x[,colname])), 
            colname = "sales")
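The same idea could be wrapped in a reusable function along the following lines. This is only a sketch: the names add_cumsum_by_year, group_col, and new_col are hypothetical, the example data are made up, and it assumes plyr is installed. It relies on ddply accepting a character vector for the grouping variables.

```r
library(plyr)

# Hypothetical helper: adds a per-group running total of column `colname`.
add_cumsum_by_year <- function(data, colname, group_col = "year",
                               new_col = paste0("cum_", colname)) {
  ddply(data, group_col, function(chunk) {
    # chunk holds one group's rows; [[ ]] extracts the column as a vector
    chunk[[new_col]] <- cumsum(chunk[[colname]])
    chunk
  })
}

# Illustrative data: two years, three observations each
set.seed(1)
df <- data.frame(year = rep(2007:2008, each = 3),
                 sales = runif(6, min = 700, max = 1200))
out <- add_cumsum_by_year(df, "sales")
```

Because both the column name and the new column name are plain strings, the caller does not need to know them in advance.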
Other answers

The problem is that ddply expects its last arguments to be expressions that will be evaluated on chunks of the data.frame (one per year, in your example). If you use df[mycol], you get the whole data.frame, not the annual chunks.

The following works, but is not very elegant: I build the expression as a string, and then convert it with eval(parse(...)).

ddply( df, .(year), transform, 
  cum_value2 = eval(parse( text = 
    sprintf( "cumsum(as.numeric(as.character(%s)))", mycol )
  ))
)
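If building code from strings feels fragile, one possible alternative is base R's ave, which applies a function within groups and returns a vector in the original row order. This is a different technique from the ddply answers above, shown only as a sketch on made-up data:

```r
# Illustrative data (values are assumptions, not the question's data)
df <- data.frame(year = c(2007, 2007, 2008, 2008),
                 sales = c(10, 20, 30, 40))
mycol <- "sales"

# Running total of the column named in mycol, restarted for each year
df$cum_value2 <- ave(df[[mycol]], df$year, FUN = cumsum)
df$cum_value2   # 10 30 30 70
```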



