Question

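I generate a list of one-dimensional numpy arrays in a loop and later convert this list to a 2D numpy array. I would have preallocated a 2D numpy array if I knew the number of items ahead of time, but I don't, so I put everything in a list.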

The mock-up is below:

>>> from numpy import array, ones
>>> list_of_arrays = list(map(lambda x: x*ones(2), range(5)))
>>> list_of_arrays
[array([ 0.,  0.]), array([ 1.,  1.]), array([ 2.,  2.]), array([ 3.,  3.]), array([ 4.,  4.])]
>>> arr = array(list_of_arrays)
>>> arr
array([[ 0.,  0.],
       [ 1.,  1.],
       [ 2.,  2.],
       [ 3.,  3.],
       [ 4.,  4.]])

My question is the following:

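Is there a better way (performance-wise) to collect sequential numerical data (in my case numpy arrays) than putting them in a list and then making a numpy.array out of it (I am creating a new object and copying the data)? Is there an "expandable" matrix data structure available in a well-tested module?

A typical size of my 2D matrix would be between 100x10 and 5000x10 floats.

EDIT: In this example I'm using map, but in my actual application I have a for loop.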

Answer 1

Suppose you know that the final array arr will never be larger than 5000x10. Then you could pre-allocate an array of maximum size, populate it with data as you go through the loop, and then use arr.resize to cut it down to the discovered size after exiting the loop.

The tests below suggest doing so will be slightly faster than constructing intermediate python lists no matter what the ultimate size of the array is.

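Also, arr.resize deallocates the unused memory, so the final (though maybe not the intermediate) memory footprint is smaller than what python_lists_to_array uses.

This shows numpy_all_the_way is faster: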

% python -mtimeit -s"import test" "test.numpy_all_the_way(100)"
100 loops, best of 3: 1.78 msec per loop
% python -mtimeit -s"import test" "test.numpy_all_the_way(1000)"
100 loops, best of 3: 18.1 msec per loop
% python -mtimeit -s"import test" "test.numpy_all_the_way(5000)"
10 loops, best of 3: 90.4 msec per loop

% python -mtimeit -s"import test" "test.python_lists_to_array(100)"
1000 loops, best of 3: 1.97 msec per loop
% python -mtimeit -s"import test" "test.python_lists_to_array(1000)"
10 loops, best of 3: 20.3 msec per loop
% python -mtimeit -s"import test" "test.python_lists_to_array(5000)"
10 loops, best of 3: 101 msec per loop

This shows numpy_all_the_way uses less memory:

% test.py
Initial memory usage: 19788
After python_lists_to_array: 20976
After numpy_all_the_way: 20348

test.py:

import numpy as np
import os


def memory_usage():
    # Read this process's virtual memory size (VmSize, in kB) from /proc (Linux only).
    pid = os.getpid()
    with open('/proc/%s/status' % pid) as f:
        return next(line for line in f
                    if line.startswith('VmSize')).split()[-2]

N, M = 5000, 10


def python_lists_to_array(k):
    list_of_arrays = list(map(lambda x: x * np.ones(M), range(k)))
    arr = np.array(list_of_arrays)
    return arr


def numpy_all_the_way(k):
    arr = np.empty((N, M))
    for x in range(k):
        arr[x] = x * np.ones(M)
    arr.resize((k, M))
    return arr

if __name__ == '__main__':
    print('Initial memory usage: %s' % memory_usage())
    arr = python_lists_to_array(5000)
    print('After python_lists_to_array: %s' % memory_usage())
    arr = numpy_all_the_way(5000)
    print('After numpy_all_the_way: %s' % memory_usage())

Answer 2

A convenient way is to use numpy.concatenate. I believe it is also faster than @unutbu's answer:

In [32]: import numpy as np 

In [33]: list_of_arrays = list(map(lambda x: x * np.ones(2), range(5)))

In [34]: list_of_arrays
Out[34]: 
[array([ 0.,  0.]),
 array([ 1.,  1.]),
 array([ 2.,  2.]),
 array([ 3.,  3.]),
 array([ 4.,  4.])]

In [37]: shape = list(list_of_arrays[0].shape)

In [38]: shape
Out[38]: [2]

In [39]: shape[:0] = [len(list_of_arrays)]

In [40]: shape
Out[40]: [5, 2]

In [41]: arr = np.concatenate(list_of_arrays).reshape(shape)

In [42]: arr
Out[42]: 
array([[ 0.,  0.],
       [ 1.,  1.],
       [ 2.,  2.],
       [ 3.,  3.],
       [ 4.,  4.]])
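
The speed claim is easy to check on your own machine. Below is a minimal timing sketch, assuming a list of 5000 rows of length 10 (the upper end of the sizes in the question); the function names via_array and via_concatenate are made up for this illustration, and no particular result is implied.

```python
import timeit

import numpy as np

# Assumed test data: 5000 rows of length 10, as in the question's larger case.
list_of_arrays = [x * np.ones(10) for x in range(5000)]


def via_array():
    # The approach from the question: let np.array build the 2D array.
    return np.array(list_of_arrays)


def via_concatenate():
    # The approach from this answer: flatten-concatenate, then reshape.
    shape = (len(list_of_arrays),) + list_of_arrays[0].shape
    return np.concatenate(list_of_arrays).reshape(shape)


for func in (via_array, via_concatenate):
    seconds = timeit.timeit(func, number=20)
    print("%-16s %.4f s for 20 calls" % (func.__name__, seconds))
```

Timings depend on the numpy version and on the row length, so it is worth rerunning this with shapes close to your real data.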

Answer 3

Even simpler than @Gill Bates' answer, here is a one-liner:

np.stack(list_of_arrays, axis=0)
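
A short sketch of what the axis argument does, using the same five 2-element arrays as in the question. The variable names rows and cols are just for illustration.

```python
import numpy as np

list_of_arrays = [x * np.ones(2) for x in range(5)]

# axis=0: each input array becomes one row of the result.
rows = np.stack(list_of_arrays, axis=0)
# axis=1: each input array becomes one column of the result.
cols = np.stack(list_of_arrays, axis=1)

print(rows.shape)  # (5, 2)
print(cols.shape)  # (2, 5)
```

Note that np.stack requires all input arrays to have the same shape.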

Answer 4

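What you are doing is the standard way. A property of numpy arrays is that they need contiguous memory. The only possibility of "holes" that I can think of is with the strides member of PyArrayObject, but that does not affect the discussion here. Since numpy arrays have contiguous memory and are "preallocated", adding a new row or column means allocating new memory, copying the data, and then freeing the old memory. If you do that a lot, it is not very efficient.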
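
To make the remark about strides concrete, here is a small sketch, assuming float64 data (8 bytes per element). A strided view skips over elements of the underlying buffer, which is the kind of "hole" meant above.

```python
import numpy as np

arr = np.arange(10, dtype=np.float64)
view = arr[::2]  # every second element, no data copied

print(arr.strides, arr.flags['C_CONTIGUOUS'])    # (8,) True
print(view.strides, view.flags['C_CONTIGUOUS'])  # (16,) False
print(np.shares_memory(arr, view))               # True
```

The view still refers to the original contiguous block, so it does not change the fact that growing an array requires a new allocation and a copy.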

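One case where you might not want to build a list and convert it to a numpy array at the end is when the list contains a lot of numbers: a numpy array of numbers takes much less space than a native Python list of numbers (since the native Python list stores Python objects). For your typical array sizes, I don't think that is an issue.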
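
A rough way to see the size difference, assuming CPython (sys.getsizeof is implementation-specific) and float64 data. The count n = 50000 matches a 5000x10 matrix; the variable names are illustrative.

```python
import sys

import numpy as np

n = 50000  # as many numbers as a 5000x10 matrix holds
values = [float(i) for i in range(n)]
arr = np.array(values)

# The list stores pointers; each float is a separate Python object.
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)
# The array stores the raw 8-byte values in one block.
array_bytes = arr.nbytes

print("list of floats: %d bytes" % list_bytes)
print("numpy array:    %d bytes" % array_bytes)
```

This only applies to a list of individual numbers. A list of numpy rows, as in the question, costs far less overhead per number, because each row is already a compact array.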

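When you create your final array from a list of arrays, you are copying all the data to a new location for the new (2D in your example) array. This is still much more efficient than keeping a numpy array and doing next = numpy.vstack((next, new_row)) every time you get new data, because vstack() copies all the existing data for every new "row".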
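
The two strategies compared above can be sketched as follows, assuming rows of length M = 10. The function names are hypothetical. Both return the same array, but the vstack version re-copies everything collected so far on each iteration.

```python
import numpy as np

M = 10  # assumed row length


def collect_in_list(k):
    # Append rows to a list; copy the data once at the end.
    rows = []
    for x in range(k):
        rows.append(x * np.ones(M))
    return np.array(rows)


def grow_with_vstack(k):
    # Rebuild the whole array on every iteration.
    arr = np.empty((0, M))
    for x in range(k):
        arr = np.vstack((arr, x * np.ones(M)))
    return arr


a = collect_in_list(50)
b = grow_with_vstack(50)
print(a.shape, b.shape, np.array_equal(a, b))
```

With k rows, the list version copies each row once, while the vstack version copies on the order of k*k/2 rows in total.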

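There was a thread on the numpy-discussion mailing list some time ago that discussed the possibility of adding a new numpy array type that allows efficient extending and appending. There seemed to be significant interest in this at the time, although I don't know whether anything came of it. You might want to look at that thread.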

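I would say that what you are doing is very Pythonic, and efficient, so unless you really need something else (more space efficiency, maybe?), you should be okay. That is how I create my numpy arrays when I don't know the number of elements in the array at the beginning.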

Answer 5

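I'll add my own version of ~unutbu's answer. It is similar to numpy_all_the_way, but it resizes dynamically when an IndexError occurs. I thought it would be a little faster for small data sets, but it is a little slower; the bounds checking slows things down too much.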

initial_guess = 1000

def my_numpy_all_the_way(k):
    arr=np.empty((initial_guess,M))
    for x,row in enumerate(make_test_data(k)):
        try:
            arr[x]=row
        except IndexError:
            arr.resize((arr.shape[0]*2, arr.shape[1]))
            arr[x]=row
    arr.resize((k,M))
    return arr
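
For anyone who wants to run this, here is a self-contained variant. It is a sketch under assumptions: make_test_data is a stand-in generator (the data source is not shown in this answer), M is taken to be 10, the try/except is replaced by an explicit capacity check, and refcheck=False is passed to resize so that the call does not fail in interactive sessions or debuggers.

```python
import numpy as np

M = 10  # assumed row length


def make_test_data(k):
    # Hypothetical stand-in for the real data source: yields k rows.
    for x in range(k):
        yield x * np.ones(M)


def grow_by_doubling(rows, initial_guess=4):
    arr = np.empty((initial_guess, M))
    count = 0
    for row in rows:
        if count == arr.shape[0]:
            # Out of room: double the number of rows in place.
            # refcheck=False is safe here because nothing else refers to arr.
            arr.resize((2 * arr.shape[0], M), refcheck=False)
        arr[count] = row
        count += 1
    # Trim the unused rows at the end.
    arr.resize((count, M), refcheck=False)
    return arr


result = grow_by_doubling(make_test_data(11))
print(result.shape)  # (11, 10)
```

Doubling the capacity keeps the number of reallocations logarithmic in the final row count, which is the same idea Python lists use internally.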

Answer 6

Even simpler than @fnjn's answer:

np.vstack(list_of_arrays)
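
A quick check, assuming a list of equal-length 1D arrays as in the question: np.vstack and np.stack with axis=0 give the same result. For 2D inputs they differ, which the second half of the sketch shows.

```python
import numpy as np

list_of_arrays = [x * np.ones(2) for x in range(5)]

stacked_v = np.vstack(list_of_arrays)
stacked_s = np.stack(list_of_arrays, axis=0)
print(np.array_equal(stacked_v, stacked_s))  # True

# With 2D blocks, vstack joins along the existing first axis,
# while stack adds a new axis.
blocks = [np.zeros((3, 2)), np.ones((3, 2))]
print(np.vstack(blocks).shape)         # (6, 2)
print(np.stack(blocks, axis=0).shape)  # (2, 3, 2)
```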
