Why are np.hypot and np.subtract.outer very fast compared to vanilla broadcast, and are there faster ways to calculate a distance matrix?

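I have two sets of 2D points and need to compute the distance matrix between them. It has to be as fast as possible, so I am using NumPy broadcasting. Of the two ways of computing the distance matrix below, I do not understand why one is so much better than the other.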

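From https://github.com/numpy/numpy/issues/14761 I have contradictory results. Cells [3, 4, 5] and [8, 9] both compute the distance matrix, but cells 3+4, which use subtract.outer, are far faster than cell 8, which uses vanilla broadcasting, and cell 5, which uses hypot, is far faster than cell 9, which is the simple way. I did not try pure Python loops, assuming they would never finish.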

  1. Is there a faster way to calculate distance matrix?
  2. Why are hypot and subtract.outer faster?

Code (changing the seed to prevent cache reuse):

### Cell 1
import numpy as np

np.random.seed(858442)

### Cell 2
%%time
obs = np.random.random((50000, 2))
interp = np.random.random((30000, 2))

CPU times: user 2.02 ms, sys: 1.4 ms, total: 3.42 ms
Wall time: 1.84 ms

### Cell 3
%%time
d0 = np.subtract.outer(obs[:,0], interp[:,0])

CPU times: user 2.46 s, sys: 1.97 s, total: 4.42 s
Wall time: 4.42 s

### Cell 4
%%time
d1 = np.subtract.outer(obs[:,1], interp[:,1])

CPU times: user 3.1 s, sys: 2.7 s, total: 5.8 s
Wall time: 8.34 s

### Cell 5
%%time
h = np.hypot(d0, d1)

CPU times: user 12.7 s, sys: 24.6 s, total: 37.3 s
Wall time: 1min 6s

### Cell 6
np.random.seed(773228)

### Cell 7
%%time
obs = np.random.random((50000, 2))
interp = np.random.random((30000, 2))

CPU times: user 1.84 ms, sys: 1.56 ms, total: 3.4 ms
Wall time: 2.03 ms

### Cell 8
%%time
d = obs[:, np.newaxis, :] - interp
d0, d1 = d[:, :, 0], d[:, :, 1]

CPU times: user 22.7 s, sys: 8.24 s, total: 30.9 s
Wall time: 33.2 s

### Cell 9
%%time
h = np.sqrt(d0**2 + d1**2)

CPU times: user 29.1 s, sys: 2min 12s, total: 2min 41s
Wall time: 6min 10s
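
As a sanity check, the two approaches can be compared on small inputs to confirm that they produce the same matrix. This is a minimal sketch: the array sizes and the use of `np.random.default_rng` are illustrative assumptions, not the setup that was timed above.

```python
import numpy as np

rng = np.random.default_rng(0)
obs = rng.random((5, 2))     # small stand-in for the (50000, 2) array
interp = rng.random((3, 2))  # small stand-in for the (30000, 2) array

# Approach of cells 3-5: subtract.outer per coordinate, then hypot
h_outer = np.hypot(
    np.subtract.outer(obs[:, 0], interp[:, 0]),
    np.subtract.outer(obs[:, 1], interp[:, 1]),
)

# Approach of cells 8-9: vanilla broadcasting, then sqrt of summed squares
d = obs[:, np.newaxis, :] - interp
h_broadcast = np.sqrt(d[:, :, 0] ** 2 + d[:, :, 1] ** 2)

same = bool(np.allclose(h_outer, h_broadcast))
```

Both variants return a (5, 3) matrix here, so any difference between them is in cost, not in the result.
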

Best answer

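First of all, d0 and d1 each take 50000 x 30000 x 8 bytes = 12 GB, which is pretty big. Make sure you have more than 100 GB of memory, because that is what the whole script requires! If you do not have enough memory, the operating system will use a storage device (e.g. swap) to hold the excess data, which is much slower. Actually, there is no reason for Cell 4 to be slower than Cell 3, so my guess is that you already did not have enough memory to (fully) store d1 in RAM, while d0 seems to fit (mostly) in memory. There is no difference on my machine when both fit in RAM (you can also reverse the order of the operations to check this). This also explains why the later operations tend to get slower.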
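
The 12 GB figure can be reproduced with a quick back-of-the-envelope calculation. This is only a sketch: the helper name `array_gb` is hypothetical, float64 (8 bytes per element) is assumed, and GB is taken as 10^9 bytes.

```python
import math

def array_gb(shape, itemsize=8):
    """Size in GB (10**9 bytes) of a dense array; itemsize=8 assumes float64."""
    return math.prod(shape) * itemsize / 1e9

one_component = array_gb((50000, 30000))     # d0, d1 or h
broadcast_diff = array_gb((50000, 30000, 2)) # d in Cell 8
```

Here `one_component` is 12.0 and `broadcast_diff` is 24.0, so a handful of full-size arrays is already enough to exceed the RAM of most machines.
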

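That being said, Cells 8+9 are also slower because they create temporary arrays and need more passes over memory to compute the result than Cells 3+4+5. Indeed, np.sqrt(d0**2 + d1**2) first computes d0**2, which creates a new 12 GB temporary array in memory, then computes d1**2, which creates another 12 GB temporary array, then sums the two temporaries to produce yet another 12 GB temporary array, and finally computes the square root, which produces one more 12 GB array. This can require up to 48 GB of memory and takes 4 memory-bound read/write passes. This is inefficient and makes poor use of the CPU and RAM (e.g. the CPU cache).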
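
One way to see the cost of the temporaries is to write the same computation with the `out=` argument of NumPy ufuncs, so that every step reuses an existing buffer. This is a minimal sketch, not the approach recommended in this answer: the helper name `hypot_inplace` is hypothetical, it destroys its inputs, and it still makes four passes over memory (np.hypot does the same job in one).

```python
import numpy as np

def hypot_inplace(d0, d1):
    """Overwrite d0 with sqrt(d0**2 + d1**2) without allocating temporaries.

    d1 is overwritten too (it ends up holding d1**2).
    """
    np.square(d0, out=d0)   # d0 <- d0**2
    np.square(d1, out=d1)   # d1 <- d1**2
    np.add(d0, d1, out=d0)  # d0 <- d0**2 + d1**2
    np.sqrt(d0, out=d0)     # d0 <- sqrt(...)
    return d0

rng = np.random.default_rng(0)
a = rng.random((5, 2))
b = rng.random((4, 2))
d0 = np.subtract.outer(a[:, 0], b[:, 0])
d1 = np.subtract.outer(a[:, 1], b[:, 1])
expected = np.hypot(d0, d1)  # computed before the inputs are overwritten
result = hypot_inplace(d0, d1)
```

The returned array is the same object as d0, so peak memory stays at the two difference arrays instead of growing by several full-size temporaries.
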

A much faster implementation does the whole computation in a single pass, and in parallel, using Numba's JIT. For example:

import numba as nb
import numpy as np
@nb.njit(parallel=True)
def distanceMatrix(a, b):
    res = np.empty((a.shape[0], b.shape[0]), dtype=a.dtype)
    for i in nb.prange(a.shape[0]):
        for j in range(b.shape[0]):
            res[i, j] = np.sqrt((a[i, 0] - b[j, 0])**2 + (a[i, 1] - b[j, 1])**2)
    return res

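This implementation uses 3 times less memory (only 12 GB) and is much faster than the one using np.subtract.outer. Indeed, because of swapping, Cells 3+4+5 take a few minutes, while this one takes 1.3 seconds!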

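The takeaway is that memory accesses are expensive, and so are temporary arrays. When working on huge buffers, avoid making multiple passes over memory, and take advantage of the CPU caches (e.g. by processing the arrays in chunks).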
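
The chunking idea can be sketched in plain NumPy as follows. This is an illustration under stated assumptions rather than the Numba solution above: the function name `distance_matrix_chunked` and the `chunk_rows` parameter are hypothetical, the inputs are assumed to be float arrays of shape (n, 2) and (m, 2), and the default chunk size is arbitrary and would need tuning.

```python
import numpy as np

def distance_matrix_chunked(a, b, chunk_rows=1024):
    """Pairwise Euclidean distances between 2D points, one row block at a time."""
    out = np.empty((a.shape[0], b.shape[0]), dtype=np.result_type(a, b))
    bx = b[np.newaxis, :, 0]  # shape (1, m)
    by = b[np.newaxis, :, 1]  # shape (1, m)
    for start in range(0, a.shape[0], chunk_rows):
        stop = min(start + chunk_rows, a.shape[0])
        block = out[start:stop]  # view into the output, no copy
        # x differences are written straight into the output block
        np.subtract(a[start:stop, 0, np.newaxis], bx, out=block)
        # y differences need one temporary, but only of chunk size
        dy = a[start:stop, 1, np.newaxis] - by
        np.hypot(block, dy, out=block)
    return out

rng = np.random.default_rng(0)
a = rng.random((10, 2))
b = rng.random((7, 2))
dist = distance_matrix_chunked(a, b, chunk_rows=3)
```

Only the final matrix and one chunk-sized temporary are alive at any time, so peak memory is close to the size of the result itself.
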

Other answers

Update, thanks to Jérôme Richard here

  • Stack Overflow never disappoints
  • There is a faster way using Numba
  • Numba has a just-in-time compiler that converts the Python snippet to fast machine code. The first call is a little slower than subsequent calls because it includes compilation. Even on the first call, though, njit with parallel=True beats hypot + subtract.outer by a 9x margin for a (49000, 12000) matrix

Performance of various methods

  • Make sure to use a different seed each time you run the script
import sys
import time

import numba as nb
import numpy as np

np.random.seed(int(sys.argv[1]))

d0 = np.random.random((49000, 2))
d1 = np.random.random((12000, 2))

# Plain JIT without parallel=True (decorator assumed from the label printed below)
@nb.njit
def f1(d0, d1):
    print("Numba without parallel")
    res = np.empty((d0.shape[0], d1.shape[0]), dtype=d0.dtype)
    for i in nb.prange(d0.shape[0]):  # acts like range() without parallel=True
        for j in range(d1.shape[0]):
            res[i, j] = np.sqrt((d0[i, 0] - d1[j, 0])**2 + (d0[i, 1] - d1[j, 1])**2)
    return res

# Add eager compilation, compiles before hand
@nb.njit((nb.float64[:, :], nb.float64[:, :]), parallel=True)
def f2(d0, d1):
    print("Numba with parallel")
    res = np.empty((d0.shape[0], d1.shape[0]), dtype=d0.dtype)
    for i in nb.prange(d0.shape[0]):
        for j in range(d1.shape[0]):
            res[i, j] = np.sqrt((d0[i, 0] - d1[j, 0])**2 + (d0[i, 1] - d1[j, 1])**2)
    return res

def f3(d0, d1):
    print("hypot + subtract.outer")
    np.hypot(
        np.subtract.outer(d0[:,0], d1[:,0]),
        np.subtract.outer(d0[:,1], d1[:,1])
    )

if __name__ == "__main__":
    s1 = time.time()
    # The function to run (f1, f2 or f3) is named by the second argument
    eval(f"{sys.argv[2]}(d0, d1)")
    print(time.time() - s1)
(base) ~/xx@xx:~/xx$ python3 test.py 523432 f3
hypot + subtract.outer
9.79756784439087
(base) xx@xx:~/xx$ python3 test.py 213622 f2
Numba with parallel
0.3393140316009521

I will update this post with further developments, or if I find an even faster method.




