Vectorized floating point rounding using NEON
I've got a NEON register filled with float32. I'd like to round them to the nearest integer without having to transfer back to the main CPU. The NEON instructions to convert float32 to uint32 simply truncate, so e.g. 39.7 becomes 39, not 40. I don't care much about how 0.5 gets handled: round away from zero or round to even both work for me.

The best path I can see to implement rounding is to:

- convert to int32 (thus truncating)
- convert back to float32
- add 1 to the int32, convert back to float32, and set aside in case we're rounding up
- subtract
- compare to 0.5 (no need for abs value since I know in my case they'll all be positive)
- select truncated or truncated + 1 based on the comparison outcome

That seems ugly, slow, and complicated. Is there a cleaner, faster, simpler, saner way?
Best answer
Add 0.5 and convert to integer. If you want the result in floating-point format, convert back.

Since you know the numbers are all positive, another option is to add 0x1p23 and subtract 0x1p23. The result of adding 0x1p23 is at least 0x1p23, so the float result has no bits with value less than one, so it must have been rounded to an integer. Then subtracting 0x1p23 subtracts the value that was added, leaving only the effect of rounding.

Update: This second method fails if the input is in [0x1p47, 0x1p48) and its low bit is one. Then 0x1p23 is half the ULP of the input, so the addition causes rounding upward (to even), and the subtraction has no effect. I think there is a modification to fix that, but I do not have it at hand.
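As a rough sketch of both suggestions (the function names `round_add_half_u32` and `round_magic` are made up for illustration and do not come from the answer): the first function uses NEON intrinsics when built for a NEON target and falls back to an equivalent scalar loop elsewhere. The second shows the 0x1p23 trick on a single float, assuming the default round-to-nearest-even mode and an input in [0, 0x1p23). The `volatile` is only there to keep the compiler from folding the add and subtract together.

```c
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: round four non-negative floats to uint32
 * by adding 0.5 and letting the conversion truncate. */
void round_add_half_u32(const float in[4], uint32_t out[4])
{
#if defined(__ARM_NEON)
    float32x4_t v = vld1q_f32(in);
    v = vaddq_f32(v, vdupq_n_f32(0.5f));
    vst1q_u32(out, vcvtq_u32_f32(v));   /* conversion truncates toward zero */
#else
    /* Scalar fallback with the same arithmetic, for non-NEON builds. */
    for (int i = 0; i < 4; ++i)
        out[i] = (uint32_t)(in[i] + 0.5f);
#endif
}

/* Hypothetical helper: round one float in [0, 0x1p23) to an integral
 * value by adding and then subtracting 2^23. Ties go to even under the
 * default rounding mode. */
float round_magic(float x)
{
    volatile float t = x + 0x1p23f;     /* sum has no fractional bits left */
    return t - 0x1p23f;                 /* remove the offset again */
}
```

With the inputs {39.7, 39.2, 0.0, 1.5} the first function yields {40, 39, 0, 2}. `round_magic(2.5f)` gives 2.0f and `round_magic(3.5f)` gives 4.0f, since ties are rounded to even.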
Other answers
Rounding a float to an integer by truncation means adding or subtracting 0.5, depending on whether the value is positive or negative. In NEON this can be done in three steps:

1. extract the sign bit of the value;
2. bitwise-OR it into 0.5, so that 0.5 carries the same sign;
3. add the signed 0.5 to the original value.

    // 1. extract the sign bit of the original value
    int32x4_t reinterpretInt = vreinterpretq_s32_f32(inputFloat);
    int32x4_t signExtract = vdupq_n_s32(-2147483648);   // 0x80000000, sign bit only
    int32x4_t signSignal = vandq_s32(reinterpretInt, signExtract);

    // 2. OR the sign bit into 0.5
    float32x4_t roundValue = vdupq_n_f32(0.5);
    float32x4_t plusValue = vreinterpretq_f32_s32(vorrq_s32(vreinterpretq_s32_f32(roundValue), signSignal));

    // 3. add the signed 0.5 to the original value
    return vaddq_f32(inputFloat, plusValue);

The returned sum still has to go through a truncating conversion such as vcvtq_s32_f32 to obtain the rounded integers.
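For checking a vector version against known values, the same sign-bit trick can be modelled in scalar C. This is only a sketch: the names `add_signed_half` and `round_half_away` are hypothetical, and it assumes a 32-bit IEEE 754 `float` and inputs whose rounded value fits in `int32_t`.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of the three NEON steps. */
float add_signed_half(float x)
{
    const float half = 0.5f;
    uint32_t bits, half_bits;
    float signed_half;

    memcpy(&bits, &x, sizeof bits);               /* reinterpret float as bits */
    memcpy(&half_bits, &half, sizeof half_bits);
    half_bits |= bits & UINT32_C(0x80000000);     /* copy the sign bit onto 0.5 */
    memcpy(&signed_half, &half_bits, sizeof signed_half);

    return x + signed_half;                       /* x + 0.5 or x - 0.5 */
}

/* Truncating the adjusted value rounds halves away from zero. */
int32_t round_half_away(float x)
{
    return (int32_t)add_signed_half(x);
}
```

For example, `round_half_away(39.7f)` is 40, `round_half_away(-39.7f)` is -40, and `round_half_away(-2.5f)` is -3.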



