I would suggest using either a three- or four-term Taylor series here. You're using Numba, so it's pretty straightforward to get something that's both accurate and performant.
Here is how you could do that:
```python
import numba as nb
import numpy as np

@nb.vectorize(fastmath=True)
def log1p_acc(x):
    if np.abs(x) >= 1e-3:
        # np.log1p is accurate for large values of x
        return np.log1p(x)
    else:
        terms = 4
        x_pow = x
        sign = 1
        sum_ = 0
        for i in range(1, terms + 1):
            # Note: use * (1 / i) to allow optimizer to avoid fdiv
            sum_ += sign * x_pow * (1 / i)
            sign *= -1
            x_pow *= x
        return sum_
```
To change the number of terms, you can modify the `terms` variable.
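The series itself can be sanity-checked in plain Python, without Numba, against `math.log1p` for small real inputs. This is just a sketch of the same loop; `log1p_taylor` is a hypothetical helper name, not part of the answer's code:

```python
import math

def log1p_taylor(x, terms=4):
    """Alternating Taylor series for log(1+x): x - x^2/2 + x^3/3 - x^4/4."""
    x_pow = x
    sign = 1
    total = 0.0
    for i in range(1, terms + 1):
        total += sign * x_pow * (1 / i)
        sign *= -1
        x_pow *= x
    return total

x = 1e-4
rel_err = abs(log1p_taylor(x) - math.log1p(x)) / math.log1p(x)
print(rel_err)  # on the order of machine epsilon
```

For |x| = 1e-4 the first omitted term is x^5/5 ≈ 2e-21, so the truncation error is far below double-precision round-off.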
Accuracy
Let's measure how accurate this function is.
First, to quantify the accuracy of a replacement, we need three things: a reference implementation, a test dataset, and an error measure.
I chose the `mpmath.log1p` implementation as the reference, and set its precision to 1,000 digits.¹
Second, we need to ask over what range of inputs we care about accuracy. I assume you mostly care about small inputs, so I sampled from a log-uniform distribution between 1e-30 and 1e0. This essentially assumes that you care about the range 1e-30 to 1e-29 as much as you care about the range 0.1 to 1. One way to picture this is that a log-uniform distribution looks flat when plotted on a log scale, but not on a linear one.
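A log-uniform sample over many decades can be drawn with just the standard library by exponentiating a uniform sample; this is a minimal sketch of the same idea that `scipy.stats.loguniform` implements (`log_uniform` is a hypothetical helper name):

```python
import random

def log_uniform(low_exp=-30, high_exp=0):
    """Sample 10**u with u uniform, so every decade is equally likely."""
    return 10 ** random.uniform(low_exp, high_exp)

samples = [log_uniform() for _ in range(5)]
```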
I sampled an equal number of each of 4 cases:
- Real and complex are independent log uniform samples.
- Real and complex are the same log uniform sample.
- Real is log uniform, and complex is zero.
- Real is zero, and complex is log uniform.
This is the test set I evaluated against.
Third, I chose an error measure. By some measures, `np.log1p` does fine here: it differs from the true result by only about 1e-17, which is roughly machine epsilon. In relative terms, though, it is bad: about 0.07% relative error.
So I assume you are more interested in relative error. I measured it with the formula `error = abs(true - pred) / abs(true)`, where `abs()` is the complex absolute value.
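The formula above is small enough to write out directly; `relative_error` is a hypothetical helper name. Note how a result that loses its real part entirely scores roughly 100% relative error under this measure:

```python
def relative_error(true, pred):
    """Relative error using the complex absolute value (modulus)."""
    return abs(true - pred) / abs(true)

true = 1e-16 + 1e-30j   # a typical small complex value
pred = 0.0 + 1e-30j     # same value with the real part dropped
print(relative_error(true, pred))  # ~1.0, i.e. about 100% relative error
```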
I then compared my function against SciPy's `log1p` and NumPy's `log1p`. The plot below shows the relative error versus the absolute value of the input, for each of the three methods.
Some things worth noting about this plot:
- For some inputs, NumPy has a relative error of nearly 100%. For example, `np.log1p((1e-16+1e-30j))` returns a value with zero as the real part.
- SciPy is much more accurate than NumPy - most inputs are accurate to about 1e-16.
- From x=1e-30 to 1e-4, the custom method has error of less than 1e-16.
- At x=1e-4, the error of the custom method rises above SciPy. If it did not switch back to NumPy at x=1e-3, the error would keep rising, especially as you move outside the interval of convergence.
Here is the code used to produce the plot:
```python
import numba as nb
import numpy as np
import scipy.special as sc
import scipy.stats
import matplotlib.pyplot as plt
import mpmath

mpmath.mp.dps = 1000
plt.rcParams["figure.figsize"] = (12, 8)

def generate_log_distribution(N):
    """Generate small numbers spanning many orders of magnitude"""
    # return 10 ** np.random.uniform(-3, -30, size=N)
    return scipy.stats.loguniform.rvs(1e-30, 1e-0, size=N)

test_set_size = 100000 // 4
test_set1 = generate_log_distribution(test_set_size) + 1j * generate_log_distribution(test_set_size)
identical_real_complex = generate_log_distribution(test_set_size)
test_set2 = identical_real_complex + 1j * identical_real_complex
test_set3 = generate_log_distribution(test_set_size) + 0j
test_set4 = 1j * generate_log_distribution(test_set_size) + 0
test_set = np.concatenate([test_set1, test_set2, test_set3, test_set4])
test_set_ref = [mpmath.log1p(c) for c in test_set]

@nb.vectorize(fastmath=True)
def log1p_acc(x):
    if np.abs(x) >= 1e-3:
        # np.log1p is accurate for large values of x
        return np.log1p(x)
    else:
        terms = 4
        x_pow = x
        sign = 1
        sum_ = 0
        for i in range(1, terms + 1):
            # Note: use (1 / i) to allow optimizer to avoid fdiv
            sum_ += sign * x_pow * (1 / i)
            sign *= -1
            x_pow *= x
        return sum_

functions = [
    ("NumPy log1p", np.log1p),
    ("SciPy log1p", sc.log1p),
    ("Custom log1p", log1p_acc),
]
for func_name, func in functions:
    x = []
    y = []
    for i in range(len(test_set)):
        number = test_set[i]
        logged = func(number)
        # Do error calculation in arbitrary precision
        logged = mpmath.mpc(logged)
        out = logged - test_set_ref[i]
        out_abs = mpmath.fabs(out)
        number_abs = mpmath.fabs(number)
        relative_err = out_abs / mpmath.fabs(test_set_ref[i])
        x.append(number_abs)
        y.append(relative_err)
    print(f"Report for {func_name}")
    print(f"Average relative error: {float(np.mean(y))}")
    # print("Worst", np.max(y), "for", test_set[np.argmax(y)])
    print()
    plt.scatter(x, y, label=func_name, s=1, alpha=0.1)

plt.xscale('log')
plt.yscale('log')
plt.xlabel('Norm of x')
plt.ylabel('Relative error in log1p(x)')
leg = plt.legend()
for lh in leg.legend_handles:
    lh.set_alpha(1)
    lh.set_sizes([20])
```
¹ Note that setting `dps` to 1000 only means that the intermediate calculations are carried out with 1,000 digits of precision. It does not necessarily mean that the final result is accurate to 1,000 digits. It could be only half as accurate. I have not checked that `mpmath.log1p()` is numerically stable.
Performance
I compared the performance of this function against the SciPy and NumPy versions of `log1p`, applied over the entire test set. It is about 2x faster than NumPy and 3x faster than SciPy, taking roughly 14 nanoseconds per evaluation.
Code:
```python
print("NumPy log1p")
%timeit np.log1p(test_set)
print("SciPy log1p")
%timeit sc.log1p(test_set)
print("Custom log1p")
%timeit log1p_acc(test_set)
```
Output:
```
NumPy log1p
2.98 ms ± 63.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
SciPy log1p
3.69 ms ± 5.58 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Custom log1p
1.42 ms ± 2.74 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
```
I applied the following ideas to speed it up:
- I began with the Taylor series and turned it into a loop. Instead of using exponentiation to calculate each term, I multiplied the previous term by x.
- Floating-point division is usually slightly slower than floating-point multiplication. For that reason, I rewrote `sign * x_pow / i` as `sign * x_pow * (1 / i)`.
- As `terms` is a constant, Numba is smart enough to fully unroll this loop, which makes it as fast as writing the Taylor series out explicitly.
- I used `fastmath`, which allows Numba to re-arrange the order in which it does math.
- I found that having a `sign` variable was slightly faster than selecting either `-=` or `+=` with the loop index.
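The first idea, replacing exponentiation with a running product, can be sketched in isolation. `powers_by_recurrence` is a hypothetical helper name; it builds the same values as `x**1 ... x**n` using one multiplication per term:

```python
def powers_by_recurrence(x, n):
    """Build [x, x**2, ..., x**n] with one multiply per term (no pow calls)."""
    out = []
    x_pow = x
    for _ in range(n):
        out.append(x_pow)
        x_pow *= x
    return out

print(powers_by_recurrence(2.0, 4))  # [2.0, 4.0, 8.0, 16.0]
```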