English 中文(简体)
R: 是否有可能使这种双环的矢量化/加速?
原标题:R: Is it possible to vectorise / speed-up this double loop?

这是一个高层次的一般性问题。 周围有一些类似的问题, 有着不同的、 更简洁的例子。 也许无法解答 。 conn 是矩阵 。

     for (i in 2:dim(conn)[1]) {
        for (j in 2:dim(conn)[1]) {
          if ((conn[i, 1] == conn[1, j]) & conn[i, 1] != 0) {
              conn[i, j] <- 1
              conn[j, i] <- 1
              }
              else {
                conn[i, j] <- 0
                conn[j, i] <- 0
                }
           }
      }

这直接来自集束群集软件包 cluscomp

My question is simply: is it possible to speed up the loop or to vectorise it? As an R beginner, I cannot see it and don t want to end up with frustration because it may not be possible. I ll accept any answer that can say yes or no and hint towards the potential amount of effort involved.

最佳回答

假设控制室不是负和对称的...

connmake = function(conn){
  ordering = order(conn[,1])
  breakpoints = which(diff(conn[ordering,1]) != 0)
  if (conn[ordering[1], 1] != 0){
    breakpoints = c(1, breakpoints + 1, nrow(conn) + 1)
  } else {
    breakpoints = c(breakpoints + 1, nrow(conn) +1)
  }
  output = matrix(0, nrow(conn), nrow(conn))

  for (i in 1:(length(breakpoints) - 1)){
    output[ ordering[breakpoints[i]:(breakpoints[i+1] -1)],
        ordering[breakpoints[i]:(breakpoints[i+1] -1)]] =  1
  }
  output[,1] = conn[,1]
  output[1,] = conn[,1]
  output
}

(原件代码以 orig() , f2() 是先前的建议。 )

size = 2000
conn  = matrix(0, size, size)
conn[1,] = sample( 1:20, size, replace = T)
conn[,1] = conn[1,]

system.time(orig(conn) -> out1)
#user  system elapsed 
#20.54    0.00   20.54 
system.time(f2(conn) -> out2)
#user  system elapsed
#0.39    0.02    0.41 
system.time(connmake(conn) -> out3)
#user  system elapsed 
#0.02    0.00    0.01 
identical(out1, out2)
#[1] TRUE
identical(out1, out3)
#[1] TRUE

请注意, f2 包含 0 的 conn 失败, 但不是我的问题 。 与 负值 的 conn 失败, 可以简单地通过 安全 抵消 来增加 相关 的 数值 。 非 对称 的 conn 需要更多思考, 但应该可行...

一般的教训是,这种比较与对比相比速度快。 相对比较是 O( N) 2, 而 R 中最慢的排序算法是 O( N) 4 / 3 ) 。 一旦对数据进行了分类, 比较就变得微不足道了 。

问题回答

以下是我如何写成的, 使用 < code>outer 作为双环的替代。 注意它仍在做比需要更多的计算, 但肯定更快 。 我假设 < code> conn 是平方矩阵 。

原始代码 :

f1 <- function(conn) {
   for (i in 2:dim(conn)[1]) {
      for (j in 2:dim(conn)[1]) {
         if ((conn[i, 1] == conn[1, j]) & conn[i, 1] != 0) {
            conn[i, j] <- 1
            conn[j, i] <- 1
         } else {
            conn[i, j] <- 0
            conn[j, i] <- 0
         }
      }
   }
   return(conn)
}

我的建议是:

f2 <- function(conn) {
   matches <- 1*outer(conn[-1,1], conn[1,-1], `==`)
   matches[conn[-1,1] == 0, ] <- 0
   ind <- upper.tri(matches)
   matches[ind] <- t(matches)[ind]
   conn[-1,-1] <- matches
   return(conn)
}

一些抽样数据:

set.seed(12345678)
conn <- matrix(sample(1:2, 5*5, replace=TRUE), 5, 5)
conn
#      [,1] [,2] [,3] [,4] [,5]
# [1,]    2    2    1    2    1
# [2,]    1    1    2    2    1
# [3,]    2    2    1    2    1
# [4,]    2    2    2    2    1
# [5,]    1    1    2    2    1

结果:

f1(conn)
#      [,1] [,2] [,3] [,4] [,5]
# [1,]    2    2    1    2    1
# [2,]    1    0    1    1    0
# [3,]    2    1    0    0    1
# [4,]    2    1    0    1    0
# [5,]    1    0    1    0    1

identical(f1(conn), f2(conn))
# [1] TRUE

一个较大的例子, 与时间比较 :

set.seed(12345678)
conn <- matrix(sample(1:2, 1000*1000, replace=TRUE), 1000, 1000)

system.time(a1 <- f1(conn))
# user  system elapsed 
# 59.840   0.000  57.094 

system.time(a2 <- f2(conn))
# user  system elapsed 
# 0.844   0.000   0.950 

identical(a1, a2)
# [1] TRUE

也可能不是你们能找到的最快的方法(我毫不怀疑这里的其他人使用编译器或Rcpp等方法能找到更快的方法),


编辑 : 由于已经指出(从该代码的引出背景来看) conn 是一个对称矩阵, 我的解决方案可以略微缩短 :

f2 <- function(conn) {
   matches <- outer(conn[-1,1], conn[1,-1],
                    function(i,j)ifelse(i==0, FALSE, i==j)) 
   conn[-1,-1] <- as.numeric(matches)
   return(conn)
}

想到了好几种事情。

首先,您可以通过在对角线下或对角线上方或上方的条目循环,将时间减半左右。如果矩阵为正方,则该矩阵要么有效。如果 dim(conn) [1] [1] > dim(conn) [2] ,那么您就会想使用类似的方式绕过下左三角形。

for (j in 2:dim(conn)[2]) {
  for (i in j:dim(conn)[1]) {
    ...
  }
}

其次,人们可能会尝试使用 apply 和它平面,因为它们通常会显著减少时间。 但是,在此情况下,每个 [i,j] 单元格都指向列头 [1,j] 和行头 [i,1] ,这意味着我们不能仅仅将一个单元格、行或列发送到 *pply。为了代码清晰,我可能保留 的 < code> 用于 环。任何基于 *pply 的把戏,只要能用到如此聪明的技巧,我就会忘记它从现在开始是如何工作的。

最后,这似乎是一个典型的例子, 使用来自R的C会非常快, 速度要快得多。 这看起来可能比您想象的要快得多, 但即使(对于这个特定的例子)如果你不知道C。 第一个从R调调用C的简单例子, 对我来说是有道理的, 就是这里 , 但是它似乎比你想的Rcpp要容易得多,所以我也不会停下来。 或者,如果你从any /em > 工作Rcpp代码的简单例子开始, 你可以修改它来做你在这里想要做的事。 如果你想修改别人的代码, 从开始, https://stackoverflow.com/ ques/9154383/converting-loop-frof-r-c-using-rviing-r-rview-rpp>.





相关问题
How to plot fitted model over observed time series

This is a really really simple question to which I seem to be entirely unable to get a solution. I would like to do a scatter plot of an observed time series in R, and over this I want to plot the ...

REvolution for R

since the latest Ubuntu release (karmic koala), I noticed that the internal R package advertises on start-up the REvolution package. It seems to be a library collection for high-performance matrix ...

R - capturing elements of R output into text files

I am trying to run an analysis by invoking R through the command line as follows: R --no-save < SampleProgram.R > SampleProgram.opt For example, consider the simple R program below: mydata =...

R statistical package: wrapping GOFrame objects

I m trying to generate GOFrame objects to generate a gene ontology mapping in R for unsupported organisms (see http://www.bioconductor.org/packages/release/bioc/vignettes/GOstats/inst/doc/...

Changing the order of dodged bars in ggplot2 barplot

I have a dataframe df.all and I m plotting it in a bar plot with ggplot2 using the code below. I d like to make it so that the order of the dodged bars is flipped. That is, so that the bars labeled "...

Strange error when using sparse matrices and glmnet

I m getting a weird error when training a glmnet regression. invalid class "dgCMatrix" object: length(Dimnames[[2]]) must match Dim[2] It only happens occasionally, and perhaps only under larger ...

Generating non-duplicate combination pairs in R

Sorry for the non-descriptive title but I don t know whether there s a word for what I m trying to achieve. Let s assume that I have a list of names of different classes like c( 1 , 2 , 3 , 4 ) ...

Per panel smoothing in ggplot2

I m plotting a group of curves, using facet in ggplot2. I d like to have a smoother applied to plots where there are enough points to smooth, but not on plots with very few points. In particular I d ...