English 中文(简体)
如何提高这一机会的效率?
原标题:How to make this loop more efficient?
  • 时间:2011-11-24 12:27:57
  •  标签:
  • r

我有这样的数据框架:

user1,product1,0
user1,product2,2
user1,product3,1
user1,product4,2
user2,product3,0
user2,product2,2
user3,product4,0
user3,product5,3

The data frame has millions of rows. I need to go through each row, and if the value in the last column is 0, then keep that product number, otherwise attach the product number to the previous product number that has value = 0, then write to a new data frame.

例如,由此形成的矩阵应当

user1,product1
user1,product1product2
user1,product1product3
user1,product1product4
user2,product3
user2,product3product2
user3,product4
user3,product4product5

I wrote a for loop to go through each row, and it works, but is very very slow. How can I speed it up? I tried to vectorize it, but I m not sure how because I need to check the value of previous row.

最佳回答

请注意,您实际上没有matrix/em>。 矩阵只能包含一种原子类型(数字、分类、特性等)。 确实有数据。

What you want to do is easily done with na.locf from the zoo package and the ifelse function.

x <- structure(list(V1 = c("user1", "user1", "user1", "user1", "user2", 
"user2", "user3", "user3"), V2 = c("product1", "product2", "product3", 
"product4", "product3", "product2", "product4", "product5"), 
    V3 = c("0", "2", "1", "2", "0", "2", "0", "3")), .Names = c("V1", 
"V2", "V3"), class = "data.frame", row.names = c(NA, 8L))

library(zoo)
# First, create a column that contains the value from the 2nd column
# when the 3rd column is zero.
x$V4 <- ifelse(x$V3==0,x$V2,NA)
# Next, replace all the NA with the previous non-NA value
x$V4 <- na.locf(x$V4)
# Finally, create a column that contains the concatenated strings
x$V5 <- ifelse(x$V2==x$V4,x$V2,paste(x$V4,x$V2,sep=""))
# Desired output
x[,c(1,5)]

由于你重新使用数据基,你需要确保“产品”一栏是特性而不是因素(如果“产品”一栏是因素,上述代码将产生奇数结果)。

问题回答

暂无回答




相关问题
How to plot fitted model over observed time series

This is a really really simple question to which I seem to be entirely unable to get a solution. I would like to do a scatter plot of an observed time series in R, and over this I want to plot the ...

REvolution for R

since the latest Ubuntu release (karmic koala), I noticed that the internal R package advertises on start-up the REvolution package. It seems to be a library collection for high-performance matrix ...

R - capturing elements of R output into text files

I am trying to run an analysis by invoking R through the command line as follows: R --no-save < SampleProgram.R > SampleProgram.opt For example, consider the simple R program below: mydata =...

R statistical package: wrapping GOFrame objects

I m trying to generate GOFrame objects to generate a gene ontology mapping in R for unsupported organisms (see http://www.bioconductor.org/packages/release/bioc/vignettes/GOstats/inst/doc/...

Changing the order of dodged bars in ggplot2 barplot

I have a dataframe df.all and I m plotting it in a bar plot with ggplot2 using the code below. I d like to make it so that the order of the dodged bars is flipped. That is, so that the bars labeled "...

Strange error when using sparse matrices and glmnet

I m getting a weird error when training a glmnet regression. invalid class "dgCMatrix" object: length(Dimnames[[2]]) must match Dim[2] It only happens occasionally, and perhaps only under larger ...

Generating non-duplicate combination pairs in R

Sorry for the non-descriptive title but I don t know whether there s a word for what I m trying to achieve. Let s assume that I have a list of names of different classes like c( 1 , 2 , 3 , 4 ) ...

Per panel smoothing in ggplot2

I m plotting a group of curves, using facet in ggplot2. I d like to have a smoother applied to plots where there are enough points to smooth, but not on plots with very few points. In particular I d ...