English 中文(简体)
Fast, Vectorizable method of taking floating point number modulus of special primes?
原标题:

Is there a fast method for taking the modulus of a floating point number?

With integers, there are tricks for Mersenne primes, so that its possible to calculate y = x MOD 2^31-1 without needing division. integer trick

Can any similar tricks be applied for floating point numbers?

Preferably, in a way that can be converted into vector/SIMD operations, or moved into GPGPU code. This rules out using integer calculations on the floating point data.

The primes I m interested in would be 2^7-1 and 2^31-1, although if there are more efficient ones for floating point numbers, those would be welcome.

One intended use of this algorithm would be to calculate a running "checksum" of input floating point numbers as they are being read into an algorithm. To avoid taking up too much of the calculation capability, I d like to keep this lightweight.

Apparently a similar technique is used for larger numbers, particularly 2^127 - 1. Unfortunately, the math in the paper is beyond me, and I haven t been able to figure out how to convert it to smaller primes.
Example of floating point MOD 2^127 - 1 - HASH127

问题回答

I looked at djb s paper, and you have it easier, since 31 bits fits comfortably into the 53-bit precision double significand. Assuming that your checksum consists of some ring operations over Z/(2**31 - 1), it will be easier (and faster) to solve the relaxed problem of computing a small representative of x mod Z/(2**31 - 1); at the end, you can use integer arithmetic to find a canonical one, which is slow but shouldn t happen too often.

The basic reduction step is to replace an integer x = y + 2**31 * z with y + z. The trick that djb uses is to compute w = (x + L) - L, where L is a large integer carefully chosen to provoke roundoff in such a way that z = 2**-31 * w. Then compute y = x - w and output y + z, which will have magnitude at most 2**32. (I apologize if this operation isn t quite enough; if so, please post your checksum algorithm.)

The choice of L involves knowing how precise the significand is. For the modulus 2**31 - 1, we want the unit of least precision (ulp) to be 2**31. For doubles in the range [1.0, 2.0), the ulp is 2**-52, so L should be 2**52 * 2**31. If you were doing this with the modulus 2**7 - 1, then you d take L = 2**52 * 2**7. As djb notes, this trick depends crucially on intermediate results not being computed in higher precision.





相关问题
How to add/merge several Big O s into one

If I have an algorithm which is comprised of (let s say) three sub-algorithms, all with different O() characteristics, e.g.: algorithm A: O(n) algorithm B: O(log(n)) algorithm C: O(n log(n)) How do ...

Grokking Timsort

There s a (relatively) new sort on the block called Timsort. It s been used as Python s list.sort, and is now going to be the new Array.sort in Java 7. There s some documentation and a tiny Wikipedia ...

Manually implementing high performance algorithms in .NET

As a learning experience I recently tried implementing Quicksort with 3 way partitioning in C#. Apart from needing to add an extra range check on the left/right variables before the recursive call, ...

Print possible strings created from a Number

Given a 10 digit Telephone Number, we have to print all possible strings created from that. The mapping of the numbers is the one as exactly on a phone s keypad. i.e. for 1,0-> No Letter for 2->...

Enumerating All Minimal Directed Cycles Of A Directed Graph

I have a directed graph and my problem is to enumerate all the minimal (cycles that cannot be constructed as the union of other cycles) directed cycles of this graph. This is different from what the ...

Quick padding of a string in Delphi

I was trying to speed up a certain routine in an application, and my profiler, AQTime, identified one method in particular as a bottleneck. The method has been with us for years, and is part of a "...

热门标签