Overflows in fixed point math

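As part of a learning/hobby project, I am working on my own implementation of the Cooley-Tukey FFT algorithm on an AVR, using fixed point numbers. I have not dealt with fixed point math before and am wondering how best to go about part of the implementation. I guess the gist of this question is asking for confirmation that I am thinking about the problem correctly.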

The heart of the C-T algorithm involves a set of multiplications and additions on complex-valued data in the following fashion (in pseudocode):

temp1 = cosine(increment)*dataRealPart[increment1] 
-sine(increment)*dataImaginaryPart[increment1]

temp2 = cosine(increment)*dataImaginaryPart[increment1] 
+ sine(increment)*dataRealPart[increment1]

dataRealPart[increment1] = dataRealPart[increment2] - temp1

etc.

The cosine and sine data will be 8 bit signed binary fractions of the form S.XXX XXXX, the input data will also be 8 bit signed binary fractions of the form SXXX.XXXX, and the multiplications will generate a 16 bit signed fraction product. As I see it, for particularly "bad" values of sine and cosine and the real and imaginary part of the data, temp1 or temp2 will come pretty close to the limits of a 16 bit signed integer.
If both the real part and imaginary part of the data are, say, b0111.1111, a little work in Wolfram Alpha shows that, with "bad" values of sine and cosine, the output can be up to 1.4 times larger than what simply multiplying the maximum value of a sine times the maximum value of the input would be.

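For example, if the sine value is 0111 1111 and the input value is 0111.1111, the raw 16-bit product is 16129 (0x3F01). That fits in the positive range of a signed 16-bit integer, but downstream, when these products are added together and subtracted from the input data (assuming the input data is converted to 16 bits at that point), there is the potential for overflow.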
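The concern above can be checked with a small C sketch. This is only an illustration under stated assumptions: the helper names rotate_real_wide and fits_int16 are hypothetical, the twiddle value 91 stands in for roughly 0.707 in the S.XXX XXXX format, and the 8-bit data is widened to the product's scale by multiplying by 128.

```c
#include <stdint.h>

/* Hypothetical helper: the rotation term cos*re - sin*im from the
   pseudocode, with 8-bit twiddles (S.XXX XXXX) and 8-bit data
   (SXXX.XXXX). The sum is formed in 32 bits so it cannot wrap here. */
static int32_t rotate_real_wide(int8_t c, int8_t s, int8_t re, int8_t im)
{
    return (int32_t)c * re - (int32_t)s * im;
}

/* Hypothetical helper: does a 32-bit value fit in a signed 16-bit word? */
static int fits_int16(int32_t v)
{
    return v >= INT16_MIN && v <= INT16_MAX;
}
```

With cos = -91, sin = 91 and both data parts at 127, the rotation term is -23114, about 1.43 times the single product 16129, and it still fits in 16 bits. Subtracting it from a data value of 127 widened to the same scale (127 * 128 = 16256) gives 39370, which does not fit.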

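It looks like a trade-off: either increase the resolution of the internally processed data, which increases processing time, or make sure the input data stays below the amplitude that would cause overflow, which lowers the signal-to-noise ratio. Is that about the size of things?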

Best answer

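One option is to reduce your sine and cosine values to Q6 (6 bits to the right of the binary point). That makes them +/-64. Notice that with the higher precision, -1 is representable but +1 is not (it would be +128). Also, after your multiply there will be two sign bits, which means it is possible to add two products without any problems. You do lose some resolution, but you really can avoid the overflow this way.

Another point: if your complex values are limited to a magnitude of 1 (real*real + imag*imag <= 1), then the resulting sum will not exceed 1, rather than the 1.4 figure you found, because the magnitude is only 1. You are basically taking the dot product of a unit vector (cos, sin) and the complex vector.

Beyond that, before adding the products you can shift the 16-bit products right by a few bits, perhaps with rounding, since the data and the trig functions were both rounded to 7 bits anyway.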
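A minimal C sketch of the Q6 idea follows. The names Q6_ONE, rotate_real_q6 and q6_to_data_scale are hypothetical, and the code assumes the twiddles have already been limited to the range -64..+64.

```c
#include <stdint.h>

/* Q6 twiddle: 6 fractional bits, so +1.0 is 64 and -1.0 is -64. */
#define Q6_ONE 64

/* cos*re - sin*im with Q6 twiddles (assumed to lie in -64..64) and
   8-bit data. Each product is at most 64 * 128 = 8192 in magnitude,
   so the sum of two products is at most 16384 and fits in int16_t. */
static int16_t rotate_real_q6(int8_t c, int8_t s, int8_t re, int8_t im)
{
    int16_t p1 = (int16_t)(c * re);
    int16_t p2 = (int16_t)(s * im);
    return (int16_t)(p1 - p2);
}

/* Drop the 6 twiddle bits again, with rounding: add half an LSB (32)
   and shift. Assumes >> on a negative value is an arithmetic shift,
   which is implementation-defined in C. */
static int16_t q6_to_data_scale(int16_t v)
{
    return (int16_t)((v + 32) >> 6);
}
```

Even the out-of-spec corner c = -64, s = 64 with both data parts at -128 gives 16384, so the 16-bit sum has a full bit of headroom left.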

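One last point. If you are adding up many numbers and you know the final result will always be within the range you can represent, then you do not need to worry about overflow in the intermediate results. It all works out in the end (the sums simply wrap around, because the arithmetic is modular).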
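The wrap-around point can be illustrated with a short C sketch. The function name sum_wrapping_i16 is hypothetical, and the accumulator is deliberately unsigned, because signed overflow is undefined behaviour in C while unsigned arithmetic is defined to wrap modulo 2^16.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum 16-bit samples in a 16-bit accumulator. Intermediate sums may
   wrap, but if the true total fits in int16_t the final value is
   still correct, because everything is computed modulo 2^16. */
static int16_t sum_wrapping_i16(const int16_t *x, size_t n)
{
    uint16_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc = (uint16_t)(acc + (uint16_t)x[i]);
    }
    /* Map the unsigned bit pattern back to a signed value portably. */
    return (acc < 0x8000u) ? (int16_t)acc
                           : (int16_t)((int32_t)acc - 65536);
}
```

For example, summing 30000, 10000 and -20000 passes through 40000, which does not fit in a signed 16-bit word, yet the returned total is 20000.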

Other answers

I know this is an old question, but a library I wrote includes some notes on fixed point math.

Overview of Using Fixed Point Math

Key ideas that come up in the implementation of quantized or DSP type math areas include how to represent fractional units and what to do when math operations cause integer registers to overflow.

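Floating point (the float or double types in C/C++) is most often used to represent fractional units, but it is not always available (no hardware support, or no library support on low-end microcontrollers). If hardware support is not available, floating point can be provided by software libraries, but these libraries are generally an order of magnitude slower than hardware implementations. If the programmer is clever, integer math can be used instead of software floating point, resulting in much faster code. In these cases fractional units can be held in integer registers (short, int, long) if a scaling factor is used.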

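Scaling and Wrap Around

Two common issues that come up in creating scaled-integer representations of numbers are:

Scaling - the programmer must keep track of the fractional scaling factor by hand, whereas with floating point numbers the compiler and the floating point library do this for the programmer.

Overflow and wrap-around - if two large numbers are added, the result may be larger than the integer representation can hold, causing a wrap-around or overflow error. Worse yet, these errors may go unnoticed depending on compiler warning settings or the type of operation. For example, take two 8-bit numbers (usually a char in C/C++):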

char a = 34, b = 3, c;

// and compute

c = a * b;

// Multiply them together and the result is 102, which still fits in an 8-bit result.
// But what happens if b = 5?

c = a * b; // the real answer is 170, but the stored result is -86 (assuming char is a signed 8-bit type)

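This type of error is called an overflow or wrap-around error, and it can be dealt with in several ways. We could use larger integer types such as shorts or ints, or we could test the numbers beforehand to see whether the result would overflow. Which is better? It depends on the circumstances. If we are already using the largest natural type (such as a 32-bit integer on a 32-bit CPU), then we may have to test the values before the operation is performed, even if this incurs some runtime performance penalty.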
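Both approaches can be sketched in a few lines of C. The function names mul_i8_wide and add_would_overflow_i32 are hypothetical and only meant to illustrate the two options described above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Option 1: use a larger type. The product of two 8-bit values
   always fits in 16 bits, so nothing can wrap. */
static int16_t mul_i8_wide(int8_t a, int8_t b)
{
    return (int16_t)((int16_t)a * b);
}

/* Option 2: test beforehand. Returns true if a + b would not fit
   in a signed 32-bit integer; the test itself never overflows. */
static bool add_would_overflow_i32(int32_t a, int32_t b)
{
    if (b > 0) {
        return a > INT32_MAX - b;
    }
    if (b < 0) {
        return a < INT32_MIN - b;
    }
    return false;
}
```

With the widening version, 34 * 5 comes out as 170 as expected. The pre-test costs a compare and a branch per operation, which is the runtime penalty mentioned above.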

Loss of Precision

True floating point representations can maintain a relatively arbitrary set of precision across a range of operations. When using fixed point (scaled) math, however, the process of scaling means we end up dedicating a portion of the available bits to the precision we desire. This means that mathematical information smaller than the scaling factor will be lost, causing quantization error.

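A simple scaling example

Let's try to represent the number 10.25 with an integer. We could do something simple like multiply the value by 100 and store the result in an integer variable. So we have:

10.25 * 100 ==> 1025

If we want to add another number to it, say 0.55, we take 0.55 and scale it up by 100 as well. So

0.55 * 100 ==> 55

Now to add them together we add the integerized numbers:

1025 + 55 ==> 1080

But let's look at this with some code:

#include <stdio.h>

int main(void)
{
    int myIntegerizedNumber1 = 1025;  /* 10.25 scaled by 100 */
    int myIntegerizedNumber2 =   55;  /*  0.55 scaled by 100 */
    int myIntegerizedNumber3;

    myIntegerizedNumber3 = myIntegerizedNumber1 + myIntegerizedNumber2;

    printf("%d + %d = %d\n", myIntegerizedNumber1, myIntegerizedNumber2, myIntegerizedNumber3);
    return 0;
}

But now comes the first of a few challenges. How do we deal with the integer and fractional parts? How do we display the results? There is no compiler support, so we as programmers must separate the integer and fractional results ourselves. The program above prints the result as 1080, not 10.80, because the compiler only knows about the integer variables we defined, not our intended scaling.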
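One way to display such a value is to split it into whole and fractional parts by hand. The sketch below assumes a scale factor of 100; the names SCALE and format_scaled are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

#define SCALE 100  /* values are stored multiplied by 100 */

/* Write a scaled integer as "whole.frac" into buf and return the
   number of characters snprintf reports. The sign is emitted by hand
   for values between -1 and 0, where the whole part is zero. */
static int format_scaled(char *buf, size_t size, int value)
{
    int whole = value / SCALE;
    int frac = abs(value % SCALE);
    const char *sign = (value < 0 && whole == 0) ? "-" : "";
    return snprintf(buf, size, "%s%d.%02d", sign, whole, frac);
}
```

Calling format_scaled(buf, sizeof buf, 1080) leaves the string 10.80 in buf. The %02d matters: without the zero padding, 1005 would come out as 10.5 rather than 10.05.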

Thinking in powers of 2 (radixes)

In the previous example we used base-10 math, which while useful for humans is not an optimal use of bits, as all the numerics in the machine will be using binary math. If we use powers of two we can specify the precision in terms of bits instead of base 10 for the fractional and integer parts, and we get several other advantages:

Ease of notation - for example, with a 16-bit signed integer (typically a short in C/C++), we can say the number is "s11.4", which means it is a signed number with 11 bits of integer and 4 bits of fractional precision. Strictly speaking no bit is set aside as a sign flag, because the number is stored in two's complement format, but in terms of the precision represented, one bit is effectively used for the sign. If a number is unsigned, then we can say it is u12.4: the same register now has 12 integer bits of precision and 4 bits of fractional representation. If we were to use base-10 math no such simple mapping would be possible (I won't go into all the base-10 issues that would come up). Worse yet, many divide-by-10 operations would need to be performed, which are slow and can result in precision loss.

Ease of changing radix precision - using base-2 radixes allows us to use simple shifts (<< and >>) to change from integer to fixed point or to change between different fixed point representations. Many programmers assume that radixes should be a byte multiple like 16 bits or 8 bits, but using just enough precision and no more (like 4 or 5 bits for small graphics apps) can allow much larger headroom, because the integer portion gets the remaining bits.

Suppose we need a range of +/-800 and a precision of 0.05 (or 1/20). We can fit this into a 16-bit integer as follows. First, one bit is allocated to the sign. This leaves 15 bits of resolution. Now we need 800 counts of range. log2(800) = 9.64..., so we need 10 bits for the integer dynamic range. For the fractional precision we need log2(1/0.05) = 4.32 bits, which rounded up is 5 bits. So in this application we could use a fixed radix of s10.5, that is, a signed number with 10 integer bits and 5 fractional bits. Better yet, it still fits in a 16-bit integer.

Now there are some issues: while the fractional precision is 5 bits (1/32, or 0.03125), which is better than the requirement of 0.05, it is not identical, and so with accumulated operations we will get quantization errors. This can be greatly reduced by moving to a larger integer and radix (e.g. a 32-bit integer with more fractional bits), but often this is not necessary, and manipulating 16-bit integers is much more efficient in both compute and memory on smaller processors. The choice of these integer sizes should be made with care.

A few notes on fixed point precision

When adding fixed point numbers together it is important to align their radix points (e.g. you must add 12.4 and 12.4 numbers; even 18.4 + 12.4 + 24.4 will work, where the integer portion represents the number of bits in use, not the physical size of the integer register declared). Now, and here begins some of the trickiness, the result of adding two 12.4 numbers is, strictly speaking, a 13.4 number! But this becomes 17 bits, so you must be careful to watch for overflow. One way is to put the result in a larger, 32-bit wide register of 28.4 precision. Another way is to test whether the high-order bits of either operand are actually set; if not, the numbers can be safely added despite the register width limitation. Another solution is to use saturating math: if the result is larger than can be put in the register, we set the result to the maximum possible value. Also be sure to watch out for sign. Adding two negative numbers can cause wrap-around just as easily as adding two positive numbers.
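The s10.5 layout and the saturating option can be sketched in C as follows. FRAC_BITS, s10_5_from_int and add_sat_i16 are hypothetical names chosen for this illustration, not part of any particular library.

```c
#include <stdint.h>

#define FRAC_BITS 5  /* s10.5: 1 sign bit, 10 integer bits, 5 fractional bits */

/* Convert a whole number in -1024..1023 to s10.5. A multiply is used
   instead of << so that negative inputs are well defined in C. */
static int16_t s10_5_from_int(int16_t whole)
{
    return (int16_t)(whole * (1 << FRAC_BITS));
}

/* Saturating 16-bit add: form the sum in 32 bits, then clamp it to
   the int16_t range instead of letting it wrap around. */
static int16_t add_sat_i16(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + b;
    if (sum > INT16_MAX) {
        return INT16_MAX;
    }
    if (sum < INT16_MIN) {
        return INT16_MIN;
    }
    return (int16_t)sum;
}
```

Here 800 becomes 25600, and adding 800 + 800 in s10.5 clamps to 32767 (just under 1024.0) rather than wrapping to a negative number. The clamp handles both signs, which covers the negative wrap-around case as well.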

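An interesting problem in designing fixed-radix pipelines is keeping track of how much of the available precision is actually in use. This is especially true for operations like Fourier transforms, which can leave some array cells with relatively large values while others have close to zero mathematical energy.

Some rules:

Adding two M-bit numbers gives an M+1 bit precision result (without testing for overflow).

Multiplying an N-bit number by an M-bit number gives an N+M bit precision result (without testing for overflow).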
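These two growth rules are easy to check numerically for unsigned magnitudes. The helper below, bits_needed, is a hypothetical name; it simply counts how many bits a value occupies.

```c
#include <stdint.h>

/* Number of bits needed to hold the unsigned magnitude v
   (0 needs no bits, 255 needs 8, 256 needs 9). */
static int bits_needed(uint32_t v)
{
    int n = 0;
    while (v != 0) {
        n++;
        v >>= 1;
    }
    return n;
}
```

The largest 8-bit value is 255. The sum 255 + 255 = 510 needs 9 bits (M+1), and the product 255 * 255 = 65025 needs 16 bits (N+M). An 8-bit by 4-bit product such as 255 * 15 = 3825 needs 12 bits.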

Saturation can be useful in some circumstances but may result in performance hits or loss of data.

Adding...

When adding or subtracting fixed-radix numbers the radix points must be aligned beforehand. For example, suppose A is an s11.4 number and B is an s9.6 number. We need to make some choices. We could move them to larger registers first, say 32-bit registers, resulting in A2 being an s27.4 number and B2 being an s25.6 number. Now we can safely shift A2 up by two bits, so that A2 = A2<<2. A2 is now an s25.6 number, but we haven't lost any data, because the upper bits of the former A can be held in the larger register without precision loss.

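Now we can add them and get the result C = A2 + B2. The result is an s25.6 number, but the precision actually in use is s12.6 (we take the larger integer part, which is the 11 bits from A, and the larger fractional part, which is the 6 bits from B, plus 1 bit for the addition). So this s12.6 number has 18+1 bits of precision, with no accuracy lost. But if we need to convert it back to a 16-bit number, we have to make choices about how much fractional precision to keep. The simplest way is to keep all the integer bits, so we shift until the sign and integer bits fit in the 16-bit register. So C = C>>3 gives an s12.3 number. As long as we keep track of the radix point, we will have maintained accuracy. Now, if we had tested A and B beforehand, we might have learned that we could keep more of the fractional bits.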
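The alignment and the final reduction can be sketched in C like this. The function names are hypothetical, and the left shift by 2 is written as a multiply by 4, since shifting a negative signed value left is not well defined in C.

```c
#include <stdint.h>

/* a is s11.4, b is s9.6. Both are widened to 32 bits, a is aligned
   to 6 fractional bits, and the s12.6 sum is returned. */
static int32_t add_s11_4_s9_6(int16_t a, int16_t b)
{
    int32_t a2 = (int32_t)a * 4;  /* same effect as a2 << 2 */
    int32_t b2 = (int32_t)b;
    return a2 + b2;
}

/* Reduce an s12.6 value to s12.3 so it fits in 16 bits: the 3 lowest
   fractional bits are dropped. Assumes >> on a negative value is an
   arithmetic shift, which is implementation-defined in C. */
static int16_t s12_6_to_s12_3(int32_t c)
{
    return (int16_t)(c >> 3);
}
```

As a worked case, 2.5 in s11.4 is 40 and 1.25 in s9.6 is 80. The aligned sum is 160 + 80 = 240, which is 3.75 * 64, and after the shift it is 30, which is 3.75 * 8.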

Multiplying...

Multiplying does not require that the radix points be aligned before the operation is carried out. Let's assume we have the same two numbers as in the adding example. A is an s11.4 precision number, which we move to A2, now an s27.4 large register (but still only s11.4 bits are in use). B is an s9.6 number, which we move to B2, now an s25.6 large register (but still only s9.6 bits are in use). C = A2*B2 results in C being an s20.10 number. Note that C is using the entire 32-bit register for the result.

Now if we want the result squished back into a 16-bit register, we must make some hard choices. Firstly, we already have 20 bits of integer precision, so any attempt (without looking at the number of bits actually used) to fit the result into a 16-bit register must result in some type of truncation. If we take the top 15 bits of the result (+1 bit for sign), we the programmers must remember that the result is scaled up by 20 - 15 = 5 bits of precision. So even though we can fit the top 15 bits in a 16-bit register, we lose the bottom 15 bits of precision (the 10 fractional bits and the 5 lowest integer bits), and we have to remember that the result is scaled by 5 bits (or integer 32).

Interestingly, if we test both A and B beforehand we may find that, while they have the stated incoming precision by rote, they may not actually contain that number of live, set bits (e.g. if A is an s11.4 number by the programmer's convention but its actual value is integer 33, or fixed-radix 2 and 1/16, then we may not have as many bits to truncate in C).
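A minimal C sketch of the multiply and the reduction to 16 bits is shown below. The function names are hypothetical, and the single corner case where both inputs are -32768 (whose reduced product would need 17 bits) is deliberately left unhandled.

```c
#include <stdint.h>

/* a is s11.4, b is s9.6. No alignment is needed: the fractional bit
   counts simply add, so the 32-bit product is s20.10. */
static int32_t mul_s11_4_s9_6(int16_t a, int16_t b)
{
    return (int32_t)a * (int32_t)b;
}

/* Keep the top 15 magnitude bits of an s20.10 value by dropping the
   low 15 bits (10 fractional, 5 integer). The result is an integer
   scaled down by 32. Assumes >> on a negative value is an arithmetic
   shift; the a == b == -32768 product is not handled. */
static int16_t s20_10_top16(int32_t c)
{
    return (int16_t)(c >> 15);
}
```

For example, 1000.0 in s11.4 is 16000 and 300.0 in s9.6 is 19200. Their product is 307200000, which is 300000 * 1024. After the reduction the value is 9375, and 9375 * 32 recovers 300000.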

(The library is also available here: https://github.com/deftio/fr_math)




