English 中文(简体)
计算广场根基的缓慢(多少周期)?
原标题:How slow (how many cycles) is calculating a square root?

计算广场根基的缓慢(多少周期)? 这是在一种分子动力学过程中出现的,即效率很重要,没有必要的当地根源对算法的操作时间产生了明显的影响。

问题回答

,该表将SSE的绩效与FSQRT、外国直接投资V、FMUL和FADD进行比较时,是平等的,但比较快,因为它能够提供80美术。 SSE有近似对等和近似对等。

核心2 45nm,FSQRT和外国直接投资基础迅速发展,而FLDD和FMUL则有变化。 同样,SSE的表现也是如此。

Intel Core 2 (Merom, 65nm)

Instruction Operands Latency Reciprocal
throughput
FSQRT 6 - 69
FADD(P) r 3 1
FMUL(P) r 5 2
FDIV(R)(P) r 6 - 38 d 5 - 37 d
ADDSS/D xmm, xmm 3 1
ADDPS/D xmm, xmm 3 1
MULSS xmm, xmm 4 1
MULSD xmm, xmm 5 1
MULPS xmm, xmm 4 1
MULPD xmm, xmm 5 1
DIVSS xmm, xmm 6 - 18 d 5 - 17 d
DIVSD xmm, xmm 6 - 32 d 5 - 31 d
DIVPS xmm, xmm 6 - 18 d 5 - 17 d
DIVPD xmm, xmm 6 - 32 d 5 - 31 d
SQRTSS/PS xmm, xmm 6 - 29 6 - 29
SQRTSD/PD xmm, xmm 6 - 58 6 - 58
RSQRTSS/PS xmm, xmm 3 2

Intel Core 2 (Wolfdale, 45nm)

Instruction Operands Latency Reciprocal
throughput
FSQRT 6 - 20
FADD(P) r 3 1
FMUL(P) r 5 2
FDIV(R)(P) r 6 - 21 d 5 - 20 d
ADDSS/D xmm, xmm 3 1
ADDPS/D xmm, xmm 3 1
MULSS xmm, xmm 4 1
MULSD xmm, xmm 5 1
MULPS xmm, xmm 4 1
MULPD xmm, xmm 5 1
DIVSS xmm, xmm 6 - 13 d 5 - 12 d
DIVSD xmm, xmm 6 - 21 d 5 - 20 d
DIVPS xmm, xmm 6 - 13 d 5 - 12 d
DIVPD xmm, xmm 6 - 21 d 5 - 20 d
SQRTSS/PS xmm, xmm 6 - 13 5 - 12
SQRTSD/PD xmm, xmm 6 - 20 5 - 19
RSQRTSS/PS xmm, xmm 3 2

The figures in the instruction tables represent the results of my measurements rather than the official values published by microprocessor vendors. Some values in my tables are higher or lower than the values published elsewhere.

必要性:这是指示在依赖链中产生的拖延。 这些数字是最低值。 砍刀、误导和例外可能会大大增加24小时的计数。 点歌剧算是正常的。 登机号码、NAN和无限数量大大增加了拖延,但XMM运动、sh子和Bolean指示除外。 流动点过多、流入不足、衰减或净捐助国的结果也造成了类似的拖延。 所用时间单位是核心锁周期,而不是时间标记柜的参照锁周期。

互惠产出: 每一教学的核心24小时周期平均数量相同。

d Round divisors or low precision give low values.

广场根基比使用<代码>-O2的添加速度要低4倍左右,如果不使用<代码>-O2,则要低13倍左右。 在净额的其他地方,我发现50-100个周期的估计数可能是真实的,但并不是一个非常有用的相对成本衡量标准,因此,我对以下代码进行了重新表述,以便进行相对测量。 让我知道,你是否看到了试验守则的任何问题。

下面的代码是视窗7操作系统下的英特尔核心i3,并在DevC++(使用海合会)汇编。 你的里程可能有所不同。

#include <cstdlib>
#include <iostream>
#include <cmath>

/*
Output using -O2:

1 billion square roots running time: 14738ms

1 billion additions running time   : 3719ms

Press any key to continue . . .

Output without -O2:

10 million square roots running time: 870ms

10 million additions running time   : 66ms

Press any key to continue . . .

Results:

Square root is about 4 times slower than addition using -O2,
            or about 13 times slower without using -O2
*/

int main(int argc, char *argv[]) {

    const int cycles = 100000;
    const int subcycles = 10000;

    double squares[cycles];

    for ( int i = 0; i < cycles; ++i ) {
        squares[i] = rand();
    }

    std::clock_t start = std::clock();

    for ( int i = 0; i < cycles; ++i ) {
        for ( int j = 0; j < subcycles; ++j ) {
            squares[i] = sqrt(squares[i]);
        }
    }

    double time_ms = ( ( std::clock() - start ) / (double) CLOCKS_PER_SEC ) * 1000;

    std::cout << "1 billion square roots running time: " << time_ms << "ms" << std::endl;

    start = std::clock();

    for ( int i = 0; i < cycles; ++i ) {
        for ( int j = 0; j < subcycles; ++j ) {
            squares[i] = squares[i] + squares[i];
        }
    }

    time_ms = ( ( std::clock() - start ) / (double) CLOCKS_PER_SEC ) * 1000;

    std::cout << "1 billion additions running time   : " << time_ms << "ms" << std::endl;

    system("PAUSE");
    return EXIT_SUCCESS;
}




相关问题
What to look for in performance analyzer in VS 2008

What to look for in performance analyzer in VS 2008 I am using VS Team system and got the performance wizard and reports going. What benchmarks/process do I use? There is a lot of stuff in the ...

SQL Table Size And Query Performance

We have a number of items coming in from a web service; each item containing an unknown number of properties. We are storing them in a database with the following Schema. Items - ItemID - ...

How to speed up Visual Studio 2008? Add more resources?

I m using Visual Studio 2008 (with the latest service pack) I also have ReSharper 4.5 installed. ReSharper Code analysis/ scan is turned off. OS: Windows 7 Enterprise Edition It takes me a long time ...

Manually implementing high performance algorithms in .NET

As a learning experience I recently tried implementing Quicksort with 3 way partitioning in C#. Apart from needing to add an extra range check on the left/right variables before the recursive call, ...

How do I profile `paster serve` s startup time?

Python s paster serve app.ini is taking longer than I would like to be ready for the first request. I know how to profile requests with middleware, but how do I profile the initialization time? I ...

热门标签