English 中文(简体)
挑战:优化取消上市[简单]
原标题:challenge: optimize unlisting [easy]

因为SO最近有点慢,所以我发布了一个简单的问题。如果大人物能在这场比赛中坐在替补席上,给新秀一个回应的机会,我将不胜感激。

有时,我们的对象有大量的大列表元素(向量)。如何将这个对象“取消列表”到单个向量中。显示您的方法比unlist()更快的证据。

最佳回答

如果你不需要名字,而且你的名单只有一级,那么如果你能打败

.Internal(unlist(your_list, FALSE, FALSE))

在接下来的一年里,我会投票支持你在SO上所做的一切!!!

[更新:如果需要非唯一名称,并且列表不是递归的,这里有一个版本,它比未列出的版本改进了100倍

 myunlist <- function(l){
    names <- names(l)
    vec <- unlist(l, F, F)
    reps <- unlist(lapply(l, length), F, F)
    names(vec) <- rep(names, reps)
    vec
    }

 myunlist(list(a=1:3, b=2))
 a a a b 
 1 2 3 2 

 > tl <- list(a = 1:20000, b = 1:5000, c = 2:30)
 > system.time(for(i in 1:200) unlist(tl))
 user  system elapsed 
 22.97    0.00   23.00 

 > system.time(for(i in 1:200) myunlist(tl))
 user  system elapsed 
 0.2     0.0     0.2 

 > system.time(for(i in 1:200) unlist(tl, F, F))
 user  system elapsed 
 0.02    0.00    0.02 

]

[更新2:响应Richie Cotton的挑战Nr3。

bigList3 <- replicate(500, rnorm(1e3), simplify = F)

unlist_vit <- function(l){
    names(l) <- NULL
    do.call(c, l)
    }

library(rbenchmark)

benchmark(unlist = unlist(bigList3, FALSE, FALSE),
          rjc    = unlist_rjc(bigList3),
          vit    = unlist_vit(bigList3),
          order  = "elapsed",
          replications = 100,
          columns = c("test", "relative", "elapsed")
          )

    test  relative elapsed
1 unlist   1.0000    2.06
3    vit   1.4369    2.96
2    rjc   3.5146    7.24

]

附言:我认为“大人物”是比你更有名气的人。所以我在这里很小:)。

问题回答

一个非unlist()的解决方案必须非常快才能击败unsist(),不是吗?在这里,不到两秒钟就可以取消列出一个包含2000个数字向量的列表,每个向量的长度为100000。

> bigList2 <- as.list(data.frame(matrix(rep(rnorm(1000000), times = 200), 
+                                       ncol = 2000)))
> print(object.size(bigList2), units = "Gb")
1.5 Gb
> system.time(foo <- unlist(bigList2, use.names = FALSE))
   user  system elapsed 
  1.897   0.000   2.019

在我的工作区中有bigList2foo,R使用了大约9Gb的可用内存。关键是use.names=FALSE。如果没有它,unlist()将非常缓慢。确切地说,我还在等着发现。。。

我们可以通过设置recursive=FALSE来加快速度,然后我们得到了与VitoshKa的答案相同的答案(两个代表性的时间):

> system.time(foo <- unlist(bigList2, recursive = FALSE, use.names = FALSE))
   user  system elapsed 
  1.379   0.001   1.416
> system.time(foo <- .Internal(unlist(bigList2, FALSE, FALSE)))
   user  system elapsed 
  1.335   0.000   1.344

…最后use.names=TRUE版本完成…:

> system.time(foo <- unlist(bigList2, use = TRUE))
    user   system  elapsed 
2307.839   10.978 2335.815

它消耗了我所有系统16Gb的RAM,所以我当时放弃了。。。

c()具有逻辑参数recursive,当设置为TRUE时,该参数将递归取消列出向量(默认值显然为FALSE)。

l <- replicate(500, rnorm(1e3), simplify = F)

microbenchmark::microbenchmark(
  unlist = unlist(l, FALSE, FALSE),
  c = c(l, recursive = TRUE, use.names = FALSE)
)

# Unit: milliseconds
# expr      min       lq     mean   median       uq      max neval
# unlist 3.083424 3.121067 4.662491 3.172401 3.985668 27.35040   100
#      c 3.084890 3.133779 4.090520 3.201246 3.920646 33.22832   100

作为一条中等大小的鱼,我首先尝试了一个解决方案,为小鱼提供了一个可以击败的基准。它比未列出的慢大约3倍。

我使用的是ucfagls的测试列表的较小版本。(因为它更适合记忆。)

bigList3 <- as.list(data.frame(matrix(rep(rnorm(1e5), times = 200), ncol = 2000)))

其基本思想是创建一个长向量来存储答案,然后在列表项上循环,从列表中复制值。

unlist_rjc <- function(l)
{
  lengths <- vapply(l, length, FUN.VALUE = numeric(1), USE.NAMES = FALSE)
  total_len <- sum(lengths)
  end_index <- cumsum(lengths)
  start_index <- 1 + c(0, end_index)
  v <- numeric(total_len)
  for(i in seq_along(l))
  {
    v[start_index[i]:end_index[i]] <- l[[i]]
  }
  v
}

t1 <- system.time(for(i in 1:10) unlist(bigList2, FALSE, FALSE))
t2 <- system.time(for(i in 1:10) unlist_rjc(bigList2))
t2["user.self"] / t1["user.self"]  # 3.08

Challenges for little fishes:
1. Can you extend it to deal with other types than numeric?
2. Can you get it to work with recursion (nested lists)?
3. Can you make it faster?

我会投票给比我得分少的人,如果他们的答案满足了一个或多个这些小挑战。





相关问题
Finding a class within list

I have a class (Node) which has a property of SubNodes which is a List of the Node class I have a list of Nodes (of which each Node may or may not have a list of SubNodes within itself) I need to be ...

How to flatten a List of different types in Scala?

I have 4 elements:List[List[Object]] (Objects are different in each element) that I want to zip so that I can have a List[List[obj1],List[obj2],List[obj3],List[obj4]] I tried to zip them and I ...

How to remove unique, then duplicate dictionaries in a list?

Given the following list that contains some duplicate and some unique dictionaries, what is the best method to remove unique dictionaries first, then reduce the duplicate dictionaries to single ...

Is List<> better than DataSet for UI Layer in ASP.Net?

I want to get data from my data access layer into my business layer, then prepare it for use in my UI. So i wonder: is it better to read my data by DataReader and use it to fill a List<BLClasses&...

What is the benefit to using List<T> over IEnumerable<T>?

or the other way around? I use generic lists all the time. But I hear occasionally about IEnumerables, too, and I honestly have no clue (today) what they are for and why I should use them. So, at ...

灵活性:在滚动之前显示错误的清单

我有一份清单,在你滚动之前没有显示任何物品,然后这些物品就显示。 是否有任何人知道如何解决这一问题? 我尝试了叫人名单。

Converting Dictionary to List? [duplicate]

I m trying to convert a Python dictionary into a Python list, in order to perform some calculations. #My dictionary dict = {} dict[ Capital ]="London" dict[ Food ]="Fish&Chips" dict[ 2012 ]="...

热门标签