Double floating point operations using four IEEE rounding modes implemented in terms of round to nearest ties to even
I am looking to implement the RandomX Proof of Work hash function in JavaScript + WebAssembly. The README explains why this isn't possible:
Web mining is infeasible due to the large memory requirement and the lack of directed rounding support for floating point operations in both Javascript and WebAssembly.
The bytecode repeatedly switches the rounding mode with the CFROUND instruction, and JavaScript can neither change the rounding mode nor use any mode other than round ties to even, so a direct implementation is impossible.
The CFROUND instruction manipulates the fprc abstract register defined in the RandomX spec. fprc starts at zero (round ties to even) and is updated seemingly randomly on every execution of the instruction.
fprc  rounding mode
0     roundTiesToEven
1     roundTowardNegative
2     roundTowardPositive
3     roundTowardZero
RandomX uses five operations (add, sub, mul, div, and sqrt), all of which the IEEE 754 spec guarantees to be correctly rounded in double precision. My question is this: soft float is not an option, and I have access to floating point arithmetic only in round ties to even for those five operations. I need a way to emulate double precision arithmetic in the other three rounding modes, implemented in terms of that one rounding mode. This would be much faster than soft float.
Bruce at the Random ASCII blog had something to say:
If I understand correctly then the other three rounding modes will produce results that are either identical to round-to-nearest-even or one ULP away. If the round-to-nearest-even operation is exact then all rounding modes will give the same result. If it is not exact then you’d have to figure out which results would be different. I think that either round-towards-negative or round-towards-positive would be one ULP away, but not both.
FMA (fused multiply-add) operations aren't available in JavaScript. They are available in WebAssembly, but gated behind non-deterministic operations (which aren't widely supported). Those operations are f32x4.relaxed_*madd and f64x2.relaxed_*madd. If FMA is required, that is fine; I can implement fallbacks.
The only paper I found that comes even slightly close to rounding mode emulation is "Emulating round-to-nearest ties-to-zero 'augmented' floating-point operations using round-to-nearest ties-to-even arithmetic". It makes some interesting points, but it is not useful here.
Additional information 0: RandomX does operations using two classes of abstract floating point registers, group F (additive operations) and group E (multiplicative operations). According to the specification the absolute value of group F will never exceed 3.0e+14 and the approximate range of group E is 1.7E-77 to infinity (always positive). All results of floating point operations can NEVER be NaN. Answers may take this into account.
Additional information 1 (addressing @sech1's comment): The FSCAL_R instruction complicates things. Executing the instruction efficiently requires manipulating the actual floating point binary format:
The mathematical operation described above is equivalent to a bitwise XOR of the binary representation with the value of 0x80F0000000000000.
Which means alternative forms of representation (double-double, fixed point) are out of the picture:
Because the limited range of group F registers would allow the use of a more efficient fixed-point representation (with 80-bit numbers), the FSCAL instruction manipulates the binary representation of the floating point format to make this optimization more difficult.
There is a possibility that double-double arithmetic might work, if the FSCAL_R instruction could be splayed among both doubles.
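The XOR quoted above for FSCAL_R can be done in JavaScript by viewing the same 8 bytes as both a double and a 64-bit integer. This is only a minimal sketch: the mask is the one quoted from the spec, while the helper name fscalR and the buffer names are made up for illustration.

```javascript
// One shared 8-byte buffer, viewed as a double and as an unsigned 64-bit integer.
const fscalF64 = new Float64Array(1);
const fscalU64 = new BigUint64Array(fscalF64.buffer);

// Mask quoted from the RandomX spec: flips the sign bit (bit 63) and the
// four lowest bits of the exponent field (bits 52..55).
const FSCAL_MASK = 0x80F0000000000000n;

// Hypothetical helper: apply the FSCAL_R bit manipulation to one double.
function fscalR(x) {
  fscalF64[0] = x;
  fscalU64[0] ^= FSCAL_MASK;
  return fscalF64[0];
}

// 1.0 has exponent field 0x3FF; flipping its low four bits gives 0x3F0,
// so the result is -2^-15. Applying the XOR twice restores the input.
console.log(fscalR(1.0) === -Math.pow(2, -15)); // true
console.log(fscalR(fscalR(3.5)) === 3.5);       // true
```

Because the operation works on the raw bit pattern, any emulation scheme has to keep each register as a genuine IEEE double at the point where FSCAL_R runs.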
Additional information 2 (addressing @aka.nice and @sech1):
All floating point operations are rounded according to the current value of the fprc register (see Table 4.3.1). Due to restrictions on the values of the floating point registers, no operation results in NaN or a denormal number.
I have begun running calculations with all outputs constrained to the ranges that apply to operations on group E and F registers (no denormals, no NaNs, etc.). Some results are off by a single bit, and some come out as infinity.
I have taken @aka.nice's suggestion of implementing the operations using two-product. The testing harness and failure/success output are here. An excerpt is placed below:
FAIL: [mul(FE_UPWARD)] truth(...) == 0x1.00c915254a87ep+968 != op(...) == 0x1.00c915254a87dp+968
mul(FE_UPWARD): 19/20
FAIL: [mul(FE_DOWNWARD)] truth(...) == 0x1.fffffffffffffp+1023 != op(...) == inf
FAIL: [mul(FE_DOWNWARD)] truth(...) == 0x1.00c915254a87dp+968 != op(...) == 0x1.00c915254a87cp+968
FAIL: [mul(FE_DOWNWARD)] truth(...) == 0x1.fffffffffffffp+1023 != op(...) == inf
FAIL: [mul(FE_DOWNWARD)] truth(...) == 0x1.fffffffffffffp+1023 != op(...) == inf
FAIL: [mul(FE_DOWNWARD)] truth(...) == 0x1.fffffffffffffp+1023 != op(...) == inf
FAIL: [mul(FE_DOWNWARD)] truth(...) == 0x1.fffffffffffffp+1023 != op(...) == inf
mul(FE_DOWNWARD): 14/20
FAIL: [div(FE_TOWARDZERO)] truth(...) == 0x1.71bb8430246b4p-58 != op(...) == 0x1.71bb8430246b3p-58
FAIL: [div(FE_TOWARDZERO)] truth(...) == 0x1.c254556a24a56p+73 != op(...) == 0x1.c254556a24a55p+73
FAIL: [div(FE_TOWARDZERO)] truth(...) == 0x1.a67a7c9cb8d59p+483 != op(...) == 0x1.a67a7c9cb8d58p+483
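When debugging failures like these, it may help to have a slow but exact reference for the sign of the rounding error that does not depend on any floating point trick. The sketch below uses BigInt for that purpose, for multiplication only. The names decompose and exactMulResidueSign are invented for this example, and it assumes finite inputs.

```javascript
// Decompose a finite double into an integer significand m and an exponent e
// such that x === m * 2^e exactly (m carries the sign).
function decompose(x) {
  const bits = new BigUint64Array(new Float64Array([x]).buffer)[0];
  const sign = (bits >> 63n) === 1n ? -1n : 1n;
  const expField = Number((bits >> 52n) & 0x7ffn);
  const frac = bits & ((1n << 52n) - 1n);
  if (expField === 0) return { m: sign * frac, e: -1074 };       // zero or subnormal
  return { m: sign * (frac | (1n << 52n)), e: expField - 1075 }; // normal
}

// Sign (-1, 0 or 1) of the exact value a*b - c for finite doubles a, b, c.
function exactMulResidueSign(a, b, c) {
  const A = decompose(a), B = decompose(b), C = decompose(c);
  let p = A.m * B.m;          // exact product is p * 2^pe
  const pe = A.e + B.e;
  let q = C.m;                // candidate result is q * 2^qe
  const qe = C.e;
  // Align both values to the smaller exponent, then compare as integers.
  const e = Math.min(pe, qe);
  p <<= BigInt(pe - e);
  q <<= BigInt(qe - e);
  return p > q ? 1 : p < q ? -1 : 0;
}

// (1 + 2^-52)^2 = 1 + 2^-51 + 2^-104, which rounds to 1 + 2^-51,
// so the exact product lies slightly above the rounded one.
const x = 1 + Math.pow(2, -52);
console.log(exactMulResidueSign(x, x, x * x)); // 1
console.log(exactMulResidueSign(3, 5, 15));    // 0
```

Comparing this sign against the sign produced by a fast residue routine on the failing inputs should show whether the residue or the final adjustment step is at fault.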
Answers
I think double-word arithmetic may be a viable option.
Let a and b be positive, non-zero floating-point numbers. Consider the error-free transform a+b -> x,y where x = RN(a+b) and y = (a+b)-x. (x and y can be computed, e.g., by the well-known 2Sum algorithm.) Assume that no under-/overflows occur. Then if y = 0, the sum is exact and all rounding modes give the same result for a+b. If y > 0, then rounding up, away from zero, or toward +infinity gives nextafter(x,+INFINITY). Conversely, if y < 0, rounding down, toward zero, or toward -infinity gives nextafter(x,-INFINITY). In each case, the modes not listed return x unchanged.
I think this specific example can be generalized to other operations (e.g., multiplication) and the full range of floating-point values. As has been said, there are many cases to consider. The edge cases will undoubtedly be particularly tedious.
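The case analysis for addition described above can be sketched as follows. This is an illustration under the same assumptions (no overflow or underflow), using Knuth's branch-free 2Sum; the function names twoSum and classifySum are chosen for this example.

```javascript
// Knuth's 2Sum: returns [x, y] with x = RN(a + b) and x + y equal to a + b
// exactly, assuming no overflow occurs.
function twoSum(a, b) {
  const x = a + b;
  const bVirtual = x - a;
  const aVirtual = x - bVirtual;
  const y = (a - aVirtual) + (b - bVirtual);
  return [x, y];
}

// Tell where the rounded sum x sits relative to the exact sum:
//   "exact": every rounding mode returns x
//   "below": x is below a+b, so rounding toward +infinity returns the next double above x
//   "above": x is above a+b, so rounding toward -infinity returns the next double below x
function classifySum(a, b) {
  const [, y] = twoSum(a, b);
  if (y === 0) return "exact";
  return y > 0 ? "below" : "above";
}

console.log(classifySum(1, 2));                  // "exact"
console.log(classifySum(1, Math.pow(2, -60)));   // "below"
console.log(classifySum(1, -Math.pow(2, -60)));  // "above"
```

Only the sign of y is needed to pick the neighbour, which is why the residue does not have to be accurate beyond its sign.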
A good reference: Jean-Michel Muller, Laurence Rideau. Formalization of double-word arithmetic, and comments on "Tight and rigorous error bounds for basic building blocks of double-word arithmetic". ACM Transactions on Mathematical Software, Association for Computing Machinery, 2022, 48 (1), pp.1-24. 10.1145/3484514. hal-02972245v2.
You can't get away without soft float entirely, but the only soft float operation you need is FMA: https://github.com/SChernykh/RandomX_OpenCL/blob/master/RandomX_OpenCL/CL/randomx_vm.cl#L1406 (check the "fma_soft", "div_rnd", and "sqrt_rnd" functions).
Edit: but I think the better approach would be to use 2 floating point variables to double the precision: https://en.wikipedia.org/wiki/Quadruple-precision_floating-point_format#Double-double_arithmetic
https://www.youtube.com/watch?v=6OuqnaHHUG8
This way you'll have enough bits to determine which way to round in each rounding mode. But prepare to write a lot of "if" statements for all the edge cases: the "fma_soft" code above is monstrous for a reason.
Without FMA, you could use the two-product algorithm.
Here is a sketch of a solution using two-sum and two-product to find the residual rounding error.
This solution requires some implementation of the standard nextafter() function, which can be found in various places, such as https://github.com/scijs/nextafter/blob/master/README.md
Excuse my JavaScript skills, which might be very approximate, but you'll get the idea.
const TO_NEAREST = 0;
const TO_NEGATIVE_INFINITY = 1;
const TO_POSITIVE_INFINITY = 2;
const TO_ZERO = 3;
const AWAY_FROM_ZERO = 4;
function split(a)
// helper function for two-product: split a number into two parts, a=x+y,
// each part having no more than 26 consecutive bits set
{
    let splitter=1+Math.pow(2,27); // Veltkamp splitter 2^27+1 of the two-product algorithm for double precision
    let c = splitter*a; // BEWARE: might overflow for big a, but this case has been excluded a priori
    let x = c - (c-a);
    let y = a - x;
    return [x,y];
}
function mul_residue(a,b,c)
// answer the residue a*b-c of operation c=a*b using the two-product algorithm
// could be expressed as fma(a,b,-c)
{
    let [a1,a2] = split(a);
    let [b1,b2] = split(b);
    let r1 = c - (a1*b1);
    let r2 = r1 - (a1*b2);
    let r3 = r2 - (a2*b1);
    let res = a2*b2 - r3;
    return res;
}
function sum_residue(a,b,c)
// answer the residue (a+b)-c of operation c=a+b using the two-sum algorithm
{
    let a1=c-b;
    let b1=c-a1; // the second virtual operand is derived from the first, as in Knuth's two-sum
    let delta_a=a-a1;
    let delta_b=b-b1;
    let res=delta_a+delta_b;
    return res;
}
function round_toward(c,res,mode)
// round the result c to upper or lower, depending on the residue and rounding mode
// res is the exact result minus c, so res>0 means c lies below the exact result
{
    if(res==0.0) return c; // case of exact operation
    switch( mode ) {
        case TO_POSITIVE_INFINITY:
            return (res>0) ? nextafter(c,Number.POSITIVE_INFINITY) : c;
        case TO_NEGATIVE_INFINITY:
            return (res<0) ? nextafter(c,Number.NEGATIVE_INFINITY) : c;
        case TO_ZERO:
            // step toward zero only when c overshoots the exact result in magnitude
            return ((res<0) != (c<0)) ? nextafter(c,0) : c;
        case AWAY_FROM_ZERO:
            return ((res<0) == (c<0)) ? nextafter(c,c+c+res) : c;
    }
    return c;
}
function round_sum(a,b,mode)
{
    let c=a+b;
    if (mode == TO_NEAREST) return c;
    let res = sum_residue(a,b,c);
    return round_toward(c,res,mode);
}
function round_sub(a,b,mode)
{
    return round_sum(a,-b,mode);
}
function round_mul(a,b,mode)
{
    let c=a*b;
    if (mode == TO_NEAREST) return c;
    let res=mul_residue(a,b,c);
    return round_toward(c,res,mode);
}
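function round_div(a,b,mode)
{
    let c=a/b;
    if (mode == TO_NEAREST) return c;
    // mul_residue(c,b,a) is c*b-a, while the exact quotient minus c is (a-c*b)/b,
    // so the sign must be flipped when b is positive
    let res=mul_residue(c,b,a);
    res = (b<0) ? res : -res;
    return round_toward(c,res,mode);
}
function round_sqrt(a,mode)
{
    let c=Math.sqrt(a);
    if (mode == TO_NEAREST) return c;
    // sqrt(a)-c has the sign of a-c*c, the opposite of mul_residue(c,c,a)
    let res=-mul_residue(c,c,a);
    return round_toward(c,res,mode);
}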
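For readers who would rather not pull in a package, nextafter can also be written directly on the bit pattern. The following is a minimal sketch, not the implementation from the linked repository; it relies on the fact that, for a non-zero double, incrementing the 64-bit pattern moves to the next double of larger magnitude.

```javascript
const naF64 = new Float64Array(1);
const naI64 = new BigInt64Array(naF64.buffer);

// Sketch of C's nextafter(x, y) for doubles: the neighbour of x in the
// direction of y.
function nextafter(x, y) {
  if (Number.isNaN(x) || Number.isNaN(y)) return NaN;
  if (x === y) return y;
  if (x === 0) return y > 0 ? Number.MIN_VALUE : -Number.MIN_VALUE;
  naF64[0] = x;
  // Adding 1 to the bit pattern increases the magnitude of a non-zero
  // double, subtracting 1 decreases it, whatever the sign of x.
  const awayFromZero = (y > x) === (x > 0);
  naI64[0] += awayFromZero ? 1n : -1n;
  return naF64[0];
}

console.log(nextafter(1, Infinity) === 1 + Number.EPSILON);      // true
console.log(nextafter(1, -Infinity) === 1 - Number.EPSILON / 2); // true
console.log(nextafter(Number.MAX_VALUE, Infinity));              // Infinity
```

Stepping up from Number.MAX_VALUE yields Infinity, which matches what a directed rounding mode delivers when a result that round-to-nearest kept finite actually overflows.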
EDIT: The functions above may fail in case of underflow/overflow, or for the special values infinity and NaN.
For example, when the result of the operation is denormal, we can't represent the residue in floating point, since it would be smaller than the smallest denormalized float.
But since we only care about the sign of the residue, we can just scale the numbers before calling mul_residue, with stdlib frexp/ldexp. Here is a sketch:
function mul_residue_scaled(a,b,c)
// answer the residue of operation c=a*b with some scaling
// the scaling is here to avoid overflow/underflow conditions
// only the sign of the answer is meaningful, since its magnitude is scaled too
{
    let [a_scaled , expa] = frexp(a);
    let [b_scaled , expb] = frexp(b);
    let c_scaled = ldexp(c , -(expa+expb) ); // a_scaled*b_scaled equals a*b*2^-(expa+expb)
    return mul_residue( a_scaled , b_scaled , c_scaled );
}
Of course, frexp and ldexp also have to be emulated...
Here is a candidate implementation.
Note that multiplying by a power of two is an exact operation except when the result underflows; this ldexp emulation takes care of such conditions.
function exponent_bits(x)
// extract the 11 bits of float representation encoding the exponent part
// assumes a little-endian platform, where bytes[7] holds the sign and the high exponent bits
{
    let float = new Float64Array(1);
    let bytes = new Uint8Array(float.buffer);
    float[0] = x;
    return ((bytes[7] & 0x7f) << 4 | bytes[6] >> 4);
}
function exponent(x)
// answer the exponent of x
// where non zero finite x can be represented as
// sign * significand * 2^exponent
// with significand in the interval [0.5,1)
// undefined behaviour for special values infinity and NaN
{
    if(x === 0) return 0;
    let e = exponent_bits(x);
    if(e === 0) return exponent( x*Math.pow(2.0,53) ) - 53;
    return e - 1022;
}
function frexp(x)
// emulate stdlib frexp function
// answer the number scaled by a power of two so that its magnitude falls in the interval [0.5,1)
// and the associated power
{
    if( ! isFinite(x)) return [x,0];
    if( x == 0 ) return [x,0];
    let e = exponent(x);
    return [ldexp(x,-e) , e];
}
function ldexp(x,n)
// emulate stdlib ldexp function
// scale a number by a power of two : x * 2^n
{
    if( ! isFinite(x)) return x;
    if( x == 0 ) return x;
    if( n > 1023 ) return ldexp(ldexp(x,1023),n-1023);
    if( n >= -1074) return x * Math.pow(2.0,n);
    // care to avoid inexact result in case of underflow
    let delta_to_underflow = Math.max( exponent(x) - 1022 , -1074 );
    // handle case of total cancellation
    if (delta_to_underflow >= 0) return ldexp(ldexp(x,-1022),n+1022);
    return ldexp(ldexp(x,delta_to_underflow),n-delta_to_underflow);
}
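The overflow case mentioned in the EDIT is the one behind the "truth == 0x1.fffffffffffffp+1023, op == inf" failures in the question. IEEE 754 specifies that a directed mode which rounds toward zero in the direction of the overflow delivers the largest finite number instead of infinity, so this case can be handled before any residue is computed. A minimal sketch, assuming the operands were finite (and the divisor non-zero); MODE and fixOverflow are names invented here, with mode numbers following the fprc table in the question:

```javascript
// Mode numbers follow the fprc table from the question.
const MODE = Object.freeze({ NEAREST: 0, DOWN: 1, UP: 2, ZERO: 3 });

// Hypothetical helper: c is the round-to-nearest result of an operation on
// finite operands, so an infinite c can only come from overflow.
function fixOverflow(c, mode) {
  // Positive overflow: toward -infinity and toward zero stop at MAX_VALUE.
  if (c === Infinity && (mode === MODE.DOWN || mode === MODE.ZERO)) {
    return Number.MAX_VALUE;
  }
  // Negative overflow: toward +infinity and toward zero stop at -MAX_VALUE.
  if (c === -Infinity && (mode === MODE.UP || mode === MODE.ZERO)) {
    return -Number.MAX_VALUE;
  }
  return c;
}

console.log(fixOverflow(1e308 * 10, MODE.DOWN) === Number.MAX_VALUE); // true
console.log(fixOverflow(1e308 * 10, MODE.UP));                        // Infinity
```

A finite c needs no special treatment here: if the exact result exceeds Number.MAX_VALUE by less than half an ulp, stepping up with nextafter already produces Infinity.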