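Because SO has been a bit slow lately, I'm posting an easy question. I would appreciate it if the big fish stayed on the bench for this one and gave the rookies a chance to respond.
Sometimes we have objects with a very large number of large list elements (vectors). How would you "unlist" such an object into a single vector? Show proof that your method is faster than unlist().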
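If you don't need names and your list is only one level deep, then if you can beat
.Internal(unlist(your_list, FALSE, FALSE))
I will upvote everything you do on SO for the next year!!!
[Update: if non-unique names are needed and the list is not recursive, here is a version that is about 100 times faster than plain unlist():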
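myunlist <- function(l){
    names <- names(l)
    # flatten without names (recursive = FALSE, use.names = FALSE)
    vec <- unlist(l, FALSE, FALSE)
    # repeat each element's name once per value it contributes
    reps <- unlist(lapply(l, length), FALSE, FALSE)
    names(vec) <- rep(names, reps)
    vec
}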
> myunlist(list(a=1:3, b=2))
a a a b
1 2 3 2
> tl <- list(a = 1:20000, b = 1:5000, c = 2:30)
> system.time(for(i in 1:200) unlist(tl))
user system elapsed
22.97 0.00 23.00
> system.time(for(i in 1:200) myunlist(tl))
user system elapsed
0.2 0.0 0.2
> system.time(for(i in 1:200) unlist(tl, F, F))
user system elapsed
0.02 0.00 0.02
]
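To make the "non-unique names" point concrete, here is a minimal sketch of the same idea written with lengths() (available in base R 3.2.0 and later). The name myunlist2 is hypothetical and not from the original post, and the sketch assumes a flat, named list. It shows that plain unlist() builds unique names such as a1, a2, a3, whereas the shortcut simply repeats each element's name.

```r
# Sketch only: same idea as myunlist(), using lengths() instead of lapply().
# myunlist2 is a hypothetical name; assumes a flat (non-nested), named list.
myunlist2 <- function(l) {
  vec <- unlist(l, recursive = FALSE, use.names = FALSE)
  names(vec) <- rep(names(l), lengths(l, use.names = FALSE))
  vec
}

x <- list(a = 1:3, b = 2)
names(unlist(x))     # unique names built by unlist(): "a1" "a2" "a3" "b"
names(myunlist2(x))  # repeated names: "a" "a" "a" "b"
```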
[Update 2: Response to challenge no. 3 from Richie Cotton.
bigList3 <- replicate(500, rnorm(1e3), simplify = FALSE)
unlist_vit <- function(l){
    names(l) <- NULL
    do.call(c, l)
}
library(rbenchmark)
benchmark(unlist = unlist(bigList3, FALSE, FALSE),
          rjc = unlist_rjc(bigList3),
          vit = unlist_vit(bigList3),
          order = "elapsed",
          replications = 100,
          columns = c("test", "relative", "elapsed")
          )
test relative elapsed
1 unlist 1.0000 2.06
3 vit 1.4369 2.96
2 rjc 3.5146 7.24
]
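P.S.: I take "big fish" to mean anyone with more reputation than you. So I am pretty small here :).
A non-unlist() solution would have to be very fast to beat unlist(), wouldn't it? Here it takes less than two seconds to unlist a list of 2000 numeric vectors, each of length 100,000.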
> bigList2 <- as.list(data.frame(matrix(rep(rnorm(1000000), times = 200),
+ ncol = 2000)))
> print(object.size(bigList2), units = "Gb")
1.5 Gb
> system.time(foo <- unlist(bigList2, use.names = FALSE))
user system elapsed
1.897 0.000 2.019
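With bigList2 and foo in my workspace, R is using about 9Gb of my available memory. The key is use.names = FALSE. Without it, unlist() is painfully slow. Exactly how slow, I'm still waiting to find out...
We can speed things up a little more by setting recursive = FALSE, which gives effectively the same answer as VitoshKa's (two representative timings):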
> system.time(foo <- unlist(bigList2, recursive = FALSE, use.names = FALSE))
user system elapsed
1.379 0.001 1.416
> system.time(foo <- .Internal(unlist(bigList2, FALSE, FALSE)))
user system elapsed
1.335 0.000 1.344
...finally the use.names = TRUE version finished...:
> system.time(foo <- unlist(bigList2, use = TRUE))
user system elapsed
2307.839 10.978 2335.815
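It was consuming all 16Gb of my system's RAM, so I gave up at that point...
c() has a logical argument recursive which, when set to TRUE, recursively unlists the vectors (the default is obviously FALSE).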
l <- replicate(500, rnorm(1e3), simplify = F)
microbenchmark::microbenchmark(
unlist = unlist(l, FALSE, FALSE),
c = c(l, recursive = TRUE, use.names = FALSE)
)
# Unit: milliseconds
# expr min lq mean median uq max neval
# unlist 3.083424 3.121067 4.662491 3.172401 3.985668 27.35040 100
# c 3.084890 3.133779 4.090520 3.201246 3.920646 33.22832 100
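The benchmark above uses a flat list, so the effect of recursive = TRUE is not visible there. As a small illustrative sketch (the object names nested, flat_c and flat_unlist are made up for this example), a nested list shows the difference: without the argument c() returns the list unchanged, with it the result matches unlist().

```r
# Illustrative only: nested, flat_c and flat_unlist are hypothetical names.
nested <- list(1:2, list(3L, list(4:5)))

# Without recursive = TRUE, c() returns the list structure unchanged.
is.list(c(nested))

# With recursive = TRUE, every level is flattened into one atomic vector.
flat_c <- c(nested, recursive = TRUE)
flat_unlist <- unlist(nested, use.names = FALSE)
identical(flat_c, flat_unlist)
```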
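As a medium-sized fish, I've jumped in with a first attempt at a solution, which gives the small fish a benchmark to beat. It is about 3 times slower than unlist().
I'm using a smaller version of ucfagls's test list. (Because it fits in memory better.)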
bigList3 <- as.list(data.frame(matrix(rep(rnorm(1e5), times = 200), ncol = 2000)))
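The basic idea is to create one long vector to hold the answer, then loop over the list items, copying the values from each item into it.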
unlist_rjc <- function(l)
{
    lengths <- vapply(l, length, FUN.VALUE = numeric(1), USE.NAMES = FALSE)
    total_len <- sum(lengths)
    end_index <- cumsum(lengths)
    start_index <- 1 + c(0, end_index)
    # preallocate the result, then copy each list item into its slot
    v <- numeric(total_len)
    for(i in seq_along(l))
    {
        v[start_index[i]:end_index[i]] <- l[[i]]
    }
    v
}
t1 <- system.time(for(i in 1:10) unlist(bigList3, FALSE, FALSE))
t2 <- system.time(for(i in 1:10) unlist_rjc(bigList3))
t2["user.self"] / t1["user.self"] # 3.08
Challenges for little fishes:
1. Can you extend it to deal with other types than numeric?
2. Can you get it to work with recursion (nested lists)?
3. Can you make it faster?
I'll upvote anyone with fewer points than me whose answer meets one or more of these mini-challenges.
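For the first challenge, one possible direction is sketched below. It is not from the original thread: unlist_typed is a hypothetical name, and it assumes a flat list of atomic vectors. The result type is found by combining zero-length slices of every element, so that c() applies its usual coercion rules, and empty elements are skipped so the start:end index never runs backwards. No claim is made about its speed relative to unlist().

```r
# Hypothetical generalisation of unlist_rjc() to any atomic type.
# Assumes a flat list of atomic vectors; not benchmarked.
unlist_typed <- function(l) {
  if (length(l) == 0L) return(NULL)
  lens <- lengths(l, use.names = FALSE)
  # Combining zero-length slices gives an empty vector of the coerced type.
  proto <- do.call(c, lapply(unname(l), function(x) x[0L]))
  if (sum(lens) == 0L) return(proto)
  # Preallocate the result in the common type.
  v <- vector(typeof(proto), sum(lens))
  end_index <- cumsum(lens)
  start_index <- end_index - lens + 1L
  # Copy each non-empty element into its slot.
  for (i in which(lens > 0L)) {
    v[start_index[i]:end_index[i]] <- l[[i]]
  }
  v
}

unlist_typed(list(1:3, c("a", "b")))  # character result, as with unlist()
unlist_typed(list(1:2, 3.5))          # double result
```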