Unexpected behaviour of log1p numpy
  • Time: 2024-04-12 18:48:30
  • Tags: python, numpy

I am using the function numpy.log1p to calculate the value of log(1 + x) for very small complex numbers, and I am getting unexpected results.

I expect the output to be practically equal to the function's input. That does not seem to be the case in the simple examples below.

np.log1p(1e-14 * (1 + 1j))
Out[75]: (9.992007221626358e-15+9.9999999999999e-15j)
np.log1p(1e-15 * (1 + 1j))
Out[76]: (1.110223024625156e-15+9.999999999999989e-16j)
np.log1p(1e-16 * (1 + 1j))
Out[77]: 1e-16j

The log1p function from scipy.special seems to work correctly, but unfortunately I need to use the function in Numba.

I am currently using numpy version 1.26.4 on Python 3.10.10

np.__version__
Out[78]: '1.26.4'
Answer

I would suggest using either a three- or four-term Taylor series here. You're using Numba, so it's pretty straightforward to get something that's both accurate and performant.

Here is how you could implement this:

import numba as nb
import numpy as np


@nb.vectorize(fastmath=True)
def log1p_acc(x):
    if np.abs(x) >= 1e-3:
        # np.log1p is accurate for large values of x
        return np.log1p(x)
    else:
        terms = 4
        x_pow = x
        sign = 1
        sum_ = 0
        for i in range(1, terms + 1):
            # Note: use * (1 / i) to allow optimizer to avoid fdiv
            sum_ += sign * x_pow * (1 / i)
            sign *= -1
            x_pow *= x
        return sum_

To change the number of terms, you can modify the terms variable.
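As a rough guide to choosing the number of terms, the truncation error of the series is dominated by the first omitted term. The sketch below is an estimate, not a rigorous bound: it assumes |log1p(x)| is approximately |x| for small x, and `truncation_estimate` is a hypothetical helper name introduced here for illustration.

```python
import sys


def truncation_estimate(abs_x, terms):
    """Estimate the relative truncation error of a `terms`-term series.

    The first omitted term has magnitude |x|**(terms + 1) / (terms + 1).
    Dividing by |log1p(x)| ~ |x| (an assumption valid for small |x|)
    gives |x|**terms / (terms + 1).
    """
    return abs_x ** terms / (terms + 1)


eps = sys.float_info.epsilon  # float64 machine epsilon
for abs_x in (1e-6, 1e-4, 1e-3):
    for terms in (3, 4):
        est = truncation_estimate(abs_x, terms)
        verdict = "below" if est < eps else "above"
        print(f"|x|={abs_x:g}, terms={terms}: ~{est:.1e} ({verdict} machine epsilon)")
```

Under this estimate, four terms stay below float64 rounding error up to about |x| = 1e-4, while the estimate at |x| = 1e-3 is well above it.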

Accuracy

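Let's measure how accurate this function is.

First, to quantify the accuracy of our replacement, we need three things: a reference implementation, a test dataset, and an error metric.

I chose the mpmath.log1p implementation as the reference and set its precision to 1,000 decimal places.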

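Second, we need to ask what range of inputs we care about accuracy over. I assume you only care about small numbers, so I drew samples from a log-uniform distribution from 1e-30 to 1. This essentially assumes that you care about the range 1e-30 to 1e-29 as much as you care about the range 0.1 to 1. A log-uniform distribution is one whose samples look evenly spread when plotted on a log scale rather than on a linear scale.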
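To make the log-uniform assumption concrete, here is a minimal standard-library sketch. It is not the sampler used for the results in this answer (that one is scipy.stats.loguniform), and `sample_log_uniform` is a hypothetical helper name. It draws the base-10 exponent uniformly and then counts how many samples land in each decade.

```python
import math
import random
from collections import Counter


def sample_log_uniform(n, low_exp=-30.0, high_exp=0.0, seed=0):
    """Draw n samples whose base-10 logarithm is uniform on [low_exp, high_exp]."""
    rng = random.Random(seed)
    return [10.0 ** rng.uniform(low_exp, high_exp) for _ in range(n)]


samples = sample_log_uniform(30_000)

# Count samples per decade [10**k, 10**(k + 1)); clamp to guard against
# rounding at the two ends of the range.
decade_counts = Counter(
    max(-30, min(-1, math.floor(math.log10(s)))) for s in samples
)
for k in (-30, -15, -1):
    print(f"[1e{k}, 1e{k + 1}): {decade_counts[k]} samples")
```

Each of the 30 decades receives roughly the same number of samples, which is the sense in which 1e-30 to 1e-29 is weighted the same as 0.1 to 1.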

The complex test inputs fall into one of four cases, in equal proportion:

  1. Real and imaginary parts are independent log-uniform samples.
  2. Real and imaginary parts are the same log-uniform sample.
  3. Real part is log-uniform, and imaginary part is zero.
  4. Real part is zero, and imaginary part is log-uniform.

This is the test set I evaluate each function against.

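Third, I need to pick an error metric. By some metrics, the results you are getting from np.log1p are fine: they differ from the true result by about 10^-17, which is smaller than machine epsilon (about 2.2e-16 for float64). In relative terms, though, the result is quite bad: it is off by about 0.07%.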

For that reason, I think you are more interested in relative error. I measure it with the formula error = abs(true - pred) / abs(true), where abs() is the complex absolute value.
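As a worked instance of that formula, the sketch below applies it to the first example from the question, using only the standard library. The reference value is an assumption: for |x| around 1e-14, the two-term series x - x**2/2 is taken to agree with log1p(x) far beyond float64 precision.

```python
# Input and NumPy output copied from the question.
x = 1e-14 * (1 + 1j)
observed = complex(9.992007221626358e-15, 9.9999999999999e-15)

# Assumed reference: two-term Taylor series, ample for |x| ~ 1e-14.
reference = x - x * x / 2

abs_err = abs(reference - observed)   # complex absolute value
rel_err = abs_err / abs(reference)    # error = abs(true - pred) / abs(true)

print(f"absolute error: {abs_err:.2e}")
print(f"relative error: {rel_err:.2e}")
```

The absolute error is tiny, but because the true value is itself tiny, the relative error is many orders of magnitude larger than float64 rounding error.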

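I then compared my function, SciPy's log1p, and NumPy's log1p. The plot below shows the relative error against the absolute value of the input for the three different methods of computing log1p.

[Figure: relative error in log1p(x) versus norm of x for NumPy log1p, SciPy log1p, and the custom log1p]

A few things are worth noting about this plot: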

  • For some inputs, NumPy has a relative error of nearly 100%. For example, the code np.log1p((1e-16+1e-30j)) returns a value with zero as the real term.
  • SciPy is much more accurate than NumPy - most inputs are accurate to about 1e-16.
  • From x=1e-30 to 1e-4, the custom method has error of less than 1e-16.
  • At x=1e-4, the error of the custom method rises above SciPy. If it did not switch back to NumPy at x=1e-3, the error would keep rising, especially as you move outside the interval of convergence.
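One plausible way to see how a real part can vanish entirely is to form 1 + x explicitly before taking the logarithm. The sketch below does this in plain Python with cmath. It only illustrates the rounding effect and is not a claim about NumPy's actual code path. The two-term series is used as an assumed reference.

```python
import cmath

x = 1e-16 + 1e-30j

# Naive formula: the real part of x is lost when it is added to 1.0,
# because 1e-16 is less than half the float64 spacing just above 1.
one_plus_x = 1 + x
naive = cmath.log(one_plus_x)

# Assumed reference: two-term Taylor series, ample for |x| ~ 1e-16.
series = x - x * x / 2

print("1 + x     =", one_plus_x)
print("naive log =", naive)
print("series    =", series)
```

The naive result has a real part that is essentially zero, while the series keeps a real part of about 1e-16, so the naive relative error is close to 100%.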

Here is the code used to produce this plot:

import numba as nb
import numpy as np
import scipy.special as sc
import scipy.stats
import matplotlib.pyplot as plt
import mpmath
mpmath.mp.dps = 1000

plt.rcParams["figure.figsize"] = (12,8)



def generate_log_distribution(N):
    """Generate small numbers spanning many orders of magnitude"""
#     return 10 ** np.random.uniform(-3, -30, size=N)
    return scipy.stats.loguniform.rvs(1e-30, 1e-0, size=N)

test_set_size = 100000 // 4
test_set1 = generate_log_distribution(test_set_size) + 1j * generate_log_distribution(test_set_size)
identical_real_complex = generate_log_distribution(test_set_size)
test_set2 = identical_real_complex + 1j * identical_real_complex
test_set3 = generate_log_distribution(test_set_size) + 0j
test_set4 = 1j * generate_log_distribution(test_set_size) + 0
test_set = np.concatenate([test_set1, test_set2, test_set3, test_set4])
test_set_ref = [mpmath.log1p(c) for c in test_set]


@nb.vectorize(fastmath=True)
def log1p_acc(x):
    if np.abs(x) >= 1e-3:
        # np.log1p is accurate for large values of x
        return np.log1p(x)
    else:
        terms = 4
        x_pow = x
        sign = 1
        sum_ = 0
        for i in range(1, terms + 1):
            # Note: use (1 / i) to allow optimizer to avoid fdiv
            sum_ += sign * x_pow * (1 / i)
            sign *= -1
            x_pow *= x
        return sum_


functions = [
    ("NumPy log1p", np.log1p),
    ("SciPy log1p", sc.log1p),
    ("Custom log1p", log1p_acc),
]
for func_name, func in functions:
    x = []
    y = []
    for i in range(len(test_set)):
        number = test_set[i]
        logged = func(number)
        # Do error calculation in arbitrary precision
        logged = mpmath.mpc(logged)
        out = logged - test_set_ref[i]
        out_abs = mpmath.fabs(out)
        number_abs = mpmath.fabs(number)
        relative_err = out_abs / mpmath.fabs(test_set_ref[i])
        x.append(number_abs)
        y.append(relative_err)
    print(f"Report for {func_name}")
    print(f"Average relative error: {float(np.mean(y))}")
#     print("Worst", np.max(y), "for", test_set[np.argmax(y)])
    print()
    plt.scatter(x, y, label=func_name, s=1, alpha=0.1)
    plt.xscale("log")
    plt.yscale("log")
    plt.xlabel("Norm of x")
    plt.ylabel("Relative error in log1p(x)")
    leg = plt.legend()
    for lh in leg.legend_handles:
        lh.set_alpha(1)
        lh.set_sizes([20])

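Note: setting dps to 1000 only means that intermediate calculations are done with 1,000 digits of precision. It does not necessarily mean that the final result is accurate to 1,000 digits. It may be only half as accurate. I have not checked the numerical stability of mpmath.log1p().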

Performance

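I compared the performance of this function against the SciPy and NumPy versions of log1p, using the whole test set. It is about 2x faster than NumPy and roughly 3x faster than SciPy. It takes about 14 nanoseconds per evaluation.

Code: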

print("NumPy log1p")
%timeit np.log1p(test_set)
print("SciPy log1p")
%timeit sc.log1p(test_set)
print("Custom log1p")
%timeit log1p_acc(test_set)

Output:

NumPy log1p
2.98 ms ± 63.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
SciPy log1p
3.69 ms ± 5.58 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Custom log1p
1.42 ms ± 2.74 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

I used the following ideas to speed this up:

  • I began with the Taylor series, and turned it into a loop. Instead of using exponentiation to calculate each term, I multiplied the previous term by x.
  • Floating point division is usually slightly slower than floating point multiplication. For that reason, I rewrote sign * x_pow / i as sign * x_pow * (1 / i).
  • As terms is a constant, Numba is smart enough to fully unroll this loop, which makes this as fast as writing the Taylor series out explicitly.
  • I used fastmath, which allows Numba to re-arrange the order it does math in.
  • I found that having a sign variable was slightly faster than selecting either -= or += with the loop index.
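To illustrate the loop and its fully unrolled equivalent, here is a plain-Python sketch without Numba. The function names `log1p_series_loop` and `log1p_series_unrolled` are hypothetical and introduced only for this comparison. It shows that the two forms compute the same four terms. It says nothing about timing.

```python
def log1p_series_loop(x, terms=4):
    """Loop form, mirroring the small-|x| branch of log1p_acc."""
    x_pow = x
    sign = 1
    total = 0
    for i in range(1, terms + 1):
        # Multiply by the reciprocal instead of dividing by i.
        total += sign * x_pow * (1 / i)
        sign *= -1
        x_pow *= x  # next power of x from the previous one
    return total


def log1p_series_unrolled(x):
    """The same four terms written out explicitly."""
    x2 = x * x
    return x - x2 * (1 / 2) + x2 * x * (1 / 3) - x2 * x2 * (1 / 4)


x = 1e-5 * (1 + 2j)
loop_value = log1p_series_loop(x)
unrolled_value = log1p_series_unrolled(x)
print(loop_value)
print(unrolled_value)
```

The two results agree up to floating point rounding. With a constant number of terms, a compiler that unrolls the loop ends up with code of the second shape.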



