Question

这是一个高层次的一般性问题。周围有一些类似的问题, 有着不同的、更简洁的例子。也许无法解答。 conn 是矩阵。

     for (i in 2:dim(conn)[1]) {
        for (j in 2:dim(conn)[1]) {
          if ((conn[i, 1] == conn[1, j]) & conn[i, 1] != 0) {
              conn[i, j] <- 1
              conn[j, i] <- 1
              }
              else {
                conn[i, j] <- 0
                conn[j, i] <- 0
                }
           }
      }

这直接来自集束群集软件包 cluscomp 。

My question is simply: is it possible to speed up the loop or to vectorise it? As an R beginner, I cannot see it and don t want to end up with frustration because it may not be possible. I ll accept any answer that can say yes or no and hint towards the potential amount of effort involved.

Answer 1

假设控制室不是负和对称的...

connmake = function(conn){
  ordering = order(conn[,1])
  breakpoints = which(diff(conn[ordering,1]) != 0)
  if (conn[ordering[1], 1] != 0){
    breakpoints = c(1, breakpoints + 1, nrow(conn) + 1)
  } else {
    breakpoints = c(breakpoints + 1, nrow(conn) +1)
  }
  output = matrix(0, nrow(conn), nrow(conn))

  for (i in 1:(length(breakpoints) - 1)){
    output[ ordering[breakpoints[i]:(breakpoints[i+1] -1)],
        ordering[breakpoints[i]:(breakpoints[i+1] -1)]] =  1
  }
  output[,1] = conn[,1]
  output[1,] = conn[,1]
  output
}

(原件代码以 orig() , f2() 是先前的建议。 )



size = 2000
conn  = matrix(0, size, size)
conn[1,] = sample( 1:20, size, replace = T)
conn[,1] = conn[1,]

system.time(orig(conn) -> out1)
#user  system elapsed 
#20.54    0.00   20.54 
system.time(f2(conn) -> out2)
#user  system elapsed
#0.39    0.02    0.41 
system.time(connmake(conn) -> out3)
#user  system elapsed 
#0.02    0.00    0.01 
identical(out1, out2)
#[1] TRUE
identical(out1, out3)
#[1] TRUE


请注意, f2 包含 0 的 conn 失败, 但不是我的问题 。 与 负值 的 conn 失败, 可以简单地通过 安全 抵消 来增加 相关 的 数值 。 非 对称 的 conn 需要更多思考, 但应该可行...

一般的教训是,这种比较与对比相比速度快。 相对比较是 O( N) 2, 而 R 中最慢的排序算法是 O( N) 4 / 3 ) 。 一旦对数据进行了分类, 比较就变得微不足道了 。

Answer 2

以下是我如何写成的, 使用 < code>outer 作为双环的替代。注意它仍在做比需要更多的计算, 但肯定更快。我假设 < code> conn 是平方矩阵。

原始代码 :

f1 <- function(conn) {
   for (i in 2:dim(conn)[1]) {
      for (j in 2:dim(conn)[1]) {
         if ((conn[i, 1] == conn[1, j]) & conn[i, 1] != 0) {
            conn[i, j] <- 1
            conn[j, i] <- 1
         } else {
            conn[i, j] <- 0
            conn[j, i] <- 0
         }
      }
   }
   return(conn)
}

我的建议是:

f2 <- function(conn) {
   matches <- 1*outer(conn[-1,1], conn[1,-1], `==`)
   matches[conn[-1,1] == 0, ] <- 0
   ind <- upper.tri(matches)
   matches[ind] <- t(matches)[ind]
   conn[-1,-1] <- matches
   return(conn)
}

一些抽样数据:

set.seed(12345678)
conn <- matrix(sample(1:2, 5*5, replace=TRUE), 5, 5)
conn
#      [,1] [,2] [,3] [,4] [,5]
# [1,]    2    2    1    2    1
# [2,]    1    1    2    2    1
# [3,]    2    2    1    2    1
# [4,]    2    2    2    2    1
# [5,]    1    1    2    2    1

结果:

f1(conn)
#      [,1] [,2] [,3] [,4] [,5]
# [1,]    2    2    1    2    1
# [2,]    1    0    1    1    0
# [3,]    2    1    0    0    1
# [4,]    2    1    0    1    0
# [5,]    1    0    1    0    1

identical(f1(conn), f2(conn))
# [1] TRUE

一个较大的例子, 与时间比较 :

set.seed(12345678)
conn <- matrix(sample(1:2, 1000*1000, replace=TRUE), 1000, 1000)

system.time(a1 <- f1(conn))
# user  system elapsed 
# 59.840   0.000  57.094 

system.time(a2 <- f2(conn))
# user  system elapsed 
# 0.844   0.000   0.950 

identical(a1, a2)
# [1] TRUE

也可能不是你们能找到的最快的方法(我毫不怀疑这里的其他人使用编译器或Rcpp等方法能找到更快的方法),

编辑 : 由于已经指出(从该代码的引出背景来看) conn 是一个对称矩阵, 我的解决方案可以略微缩短 :

f2 <- function(conn) {
   matches <- outer(conn[-1,1], conn[1,-1],
                    function(i,j)ifelse(i==0, FALSE, i==j)) 
   conn[-1,-1] <- as.numeric(matches)
   return(conn)
}

Answer 3

想到了好几种事情。

首先,您可以通过在对角线下或对角线上方或上方的条目循环,将时间减半左右。如果矩阵为正方,则该矩阵要么有效。如果 dim(conn) [1] [1] > dim(conn) [2] ,那么您就会想使用类似的方式绕过下左三角形。

for (j in 2:dim(conn)[2]) {
  for (i in j:dim(conn)[1]) {
    ...
  }
}

其次,人们可能会尝试使用 apply 和它平面,因为它们通常会显著减少时间。但是,在此情况下,每个 [i,j] 单元格都指向列头 [1,j] 和行头 [i,1] ,这意味着我们不能仅仅将一个单元格、行或列发送到 *pply。为了代码清晰,我可能保留的 < code> 用于环。任何基于 *pply 的把戏,只要能用到如此聪明的技巧,我就会忘记它从现在开始是如何工作的。



最后,这似乎是一个典型的例子, 使用来自R的C会非常快, 速度要快得多。 这看起来可能比您想象的要快得多, 但即使(对于这个特定的例子)如果你不知道C。 第一个从R调调用C的简单例子, 对我来说是有道理的, 就是这里 , 但是它似乎比你想的Rcpp要容易得多,所以我也不会停下来。 或者,如果你从any  /em > 工作Rcpp代码的简单例子开始, 你可以修改它来做你在这里想要做的事。 如果你想修改别人的代码, 从开始, https://stackoverflow.com/ ques/9154383/converting-loop-frof-r-c-using-rviing-r-rview-rpp>.

友情链接