Optimising R function that adds a new column to a data.frame
  • Time: 2009-12-29 14:47:19
  • Tags: r, dataframe

I have a function that is currently written in a procedural, loop-based style, and I want to speed it up and maybe solve the problem more in the spirit of R. I have a data.frame and want to add a column where every entry depends on two rows: the current one and the one before it. At the moment it looks like the following:

faultFinging <- function(heartData){
    if(heartData$Pulse[[1]] == 0){
        Group <- 0
    }
    else{
        Group <- 1
    }
    for(i in seq(2, length(heartData$Pulse), 1)){
        if(heartData$Pulse[[i-1]] != 0 
            && heartData$Pulse[[i]] != 0
            && abs(heartData$Pulse[[i-1]] - heartData$Pulse[[i]])<20){
            Group[[i]] <- 1
        }
        else{
            if(heartData$Pulse[[i-1]] == 0 && heartData$Pulse[[i]] != 0){
                Group[[i]] <- 1
            }
            else{
                Group[[i]] <- 0
            }
        }
    }
    Pulse<-heartData$Pulse
    Time<-heartData$Time
    return(data.frame(Time,Pulse,Group))
}
Best answer

I can't test this without sample data, but this is the general idea. You can avoid the for() loop entirely by using & and |, which are the vectorized versions of && and ||. Also, there's no need for an if-else statement when the result is just a single logical value (TRUE or FALSE).

faultFinging <- function(heartData){
    Group <- as.numeric(c(heartData$Pulse[1] != 0,
      (heartData$Pulse[-nrow(heartData)] != 0 
        & heartData$Pulse[-1] != 0
        & abs(heartData$Pulse[-nrow(heartData)] - heartData$Pulse[-1])<20) |
      (heartData$Pulse[-nrow(heartData)] == 0 & heartData$Pulse[-1] != 0)))
    return(cbind(heartData, Group))
}

Wrapping the logical vector in as.numeric() converts TRUE to 1 and FALSE to 0.
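Since the answer was written without sample data, here is a minimal sketch of how it could be tried out. The data frame below is made up purely for illustration, and the intermediate names previous and current are not in the original answer; they only give names to the two shifted copies of the Pulse column.

```r
# Made-up data for illustration only; not from the original question.
heartData <- data.frame(
  Time  = 1:10,
  Pulse = c(1, 0, 1, 1, 4, 25, 2, 0, 25, 0)
)

# Same logic as the vectorized answer, with the repeated subsetting named.
faultFinging <- function(heartData) {
  previous <- heartData$Pulse[-nrow(heartData)]  # all rows but the last
  current  <- heartData$Pulse[-1]                # all rows but the first
  Group <- as.numeric(c(
    heartData$Pulse[1] != 0,                     # rule for the first row
    (previous != 0 & current != 0 & abs(previous - current) < 20) |
      (previous == 0 & current != 0)
  ))
  cbind(heartData, Group)
}

out <- faultFinging(heartData)
print(out$Group)
```

On this made-up input the Group column comes out as 1 0 1 1 1 0 0 0 1 0, which is what the loop in the question produces for the same data.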

Other answers

This can be done in a more vectorized way by separating your program into two parts. The first is a function that takes two time samples and determines whether they meet your pulse specification:

isPulse <- function(previous, current)
{ 
  (previous != 0 & current !=0 & (abs(previous-current) < 20)) |
  (previous == 0 & current !=0)
}

Note the use of the vectorized | and & instead of the scalar || and &&.
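To make the difference concrete, here is a small sketch with two made-up logical vectors (a and b are illustrative names only):

```r
a <- c(TRUE, FALSE, FALSE)
b <- c(FALSE, FALSE, TRUE)

either <- a | b   # element-wise OR over the whole vectors
both   <- a & b   # element-wise AND over the whole vectors
print(either)
print(both)

# || and && are meant for single values, for example inside if():
firstEither <- a[1] || b[1]
print(firstEither)
```

Passing vectors longer than one to || or && is an error in recent versions of R, and older versions silently used only the first element, so neither behaviour is suitable for comparing whole columns.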

And then invoke it, supplying the two vector streams previous and current offset by a suitable delay, in your case, 1:

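delay <- 1
# note: this answer uses a lowercase column name, pulse
samples = length(heartData$pulse)

isPulse(heartData$pulse[-(samples - (1:delay) + 1)], heartData$pulse[-(1:delay)])

Let's try this on some made-up data:

sampleData = c(1,0,1,1,4,25,2,0,25,0)
heartData = data.frame(pulse=sampleData)
samples = length(heartData$pulse)  # recompute for the new data
result = isPulse(heartData$pulse[-(samples - (1:delay) + 1)], heartData$pulse[-(1:delay)])

Note that the code heartData$pulse[-(samples - (1:delay) + 1)] trims delay samples from the end, for the previous stream, and heartData$pulse[-(1:delay)] trims delay samples from the start, for the current stream. The + 1 matters: without it, the index for delay = 1 would be -(samples - 1), which drops the second-to-last element instead of the last one.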

Doing it manually, the results should be (using F for false and T for true)

F,T,T,T,F,F,F,T,F

and by running it, we find that they are:

> print(result)
FALSE  TRUE  TRUE  TRUE FALSE FALSE FALSE  TRUE FALSE

success!
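The trimming of the two streams can also be written with head() and tail(), both of which accept a negative count meaning "all but the last (or first) n elements". This is only an alternative sketch, not part of the original answer, and it uses the same made-up vector; the names x, previous and current are illustrative.

```r
x <- c(1, 0, 1, 1, 4, 25, 2, 0, 25, 0)  # same made-up pulse values as above
delay <- 1

previous <- head(x, -delay)  # drops the last `delay` elements
current  <- tail(x, -delay)  # drops the first `delay` elements

# Same rule as isPulse(), applied to the two shifted streams.
result <- (previous != 0 & current != 0 & abs(previous - current) < 20) |
  (previous == 0 & current != 0)
print(result)
```

This avoids the index arithmetic entirely, which makes off-by-one mistakes in the trimming less likely.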

Since you want to bind these back as a column into your original dataset, you should note that the new array is delay elements shorter than your original data, so you need to pad it at the start with delay FALSE elements. You may also want to convert it into 0,1 as per your data:

resultPadded <- c(rep(FALSE,delay), result)
heartData$result = ifelse(resultPadded, 1, 0)

which gives

> heartData
   pulse result
1      1      0
2      0      0
3      1      1
4      1      1
5      4      1
6     25      0
7      2      0
8      0      0
9     25      1
10     0      0
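Putting the pieces together, the whole procedure can be sketched as one helper. The function addPulseFlag and its arguments column and delay are hypothetical names introduced here for illustration; only isPulse and the padding step come from the answer above.

```r
# Pairwise rule from the answer above.
isPulse <- function(previous, current) {
  (previous != 0 & current != 0 & (abs(previous - current) < 20)) |
    (previous == 0 & current != 0)
}

# Hypothetical helper: adds a 0/1 `result` column to `df`.
addPulseFlag <- function(df, column = "pulse", delay = 1) {
  stopifnot(delay >= 1, nrow(df) >= delay)
  x <- df[[column]]
  previous <- head(x, -delay)   # stream without the last `delay` samples
  current  <- tail(x, -delay)   # stream without the first `delay` samples
  # Pad the start with FALSE so the result is as long as the data.
  flag <- c(rep(FALSE, delay), isPulse(previous, current))
  df$result <- as.numeric(flag) # TRUE -> 1, FALSE -> 0
  df
}

# Made-up data, as in the example above.
heartData <- data.frame(pulse = c(1, 0, 1, 1, 4, 25, 2, 0, 25, 0))
heartData <- addPulseFlag(heartData)
print(heartData)
```

Note that padding with FALSE always marks the first row as 0, whereas the loop in the question marks the first row as 1 when its pulse is non-zero. If that rule is wanted, the padding for delay = 1 could be replaced with x[1] != 0.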



