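Square root is about 4 times slower than addition when compiled with `-O2`, or about 13 times slower without `-O2`. Elsewhere on the net I found estimates of 50-100 cycles, which may be true, but a cycle count is not a very useful relative measure of cost, so I threw together the code below to make a relative measurement. Let me know if you see any problems with the test code.
The code below was run on an Intel Core i3 under Windows 7 and was compiled in DevC++ (which uses GCC). Your mileage may vary.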
#include <cstdlib>
#include <iostream>
#include <cmath>
#include <ctime>   // std::clock, CLOCKS_PER_SEC
/*
Output using -O2:
1 billion square roots running time: 14738ms
1 billion additions running time : 3719ms
Press any key to continue . . .
Output without -O2:
10 million square roots running time: 870ms
10 million additions running time : 66ms
Press any key to continue . . .
Results:
Square root is about 4 times slower than addition using -O2,
or about 13 times slower without using -O2
*/
int main(int argc, char *argv[]) {
    const int cycles = 100000;
    const int subcycles = 10000;
    double squares[cycles];

    // Seed every slot with a pseudo-random starting value.
    for ( int i = 0; i < cycles; ++i ) {
        squares[i] = rand();
    }

    // Time cycles * subcycles square roots.
    std::clock_t start = std::clock();
    for ( int i = 0; i < cycles; ++i ) {
        for ( int j = 0; j < subcycles; ++j ) {
            squares[i] = sqrt(squares[i]);
        }
    }
    double time_ms = ( ( std::clock() - start ) / (double) CLOCKS_PER_SEC ) * 1000;
    std::cout << "1 billion square roots running time: " << time_ms << "ms" << std::endl;

    // Time the same number of additions.
    start = std::clock();
    for ( int i = 0; i < cycles; ++i ) {
        for ( int j = 0; j < subcycles; ++j ) {
            squares[i] = squares[i] + squares[i];
        }
    }
    time_ms = ( ( std::clock() - start ) / (double) CLOCKS_PER_SEC ) * 1000;
    std::cout << "1 billion additions running time : " << time_ms << "ms" << std::endl;

    system("PAUSE");
    return EXIT_SUCCESS;
}
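
Since I asked about problems with the test code, here is one possible variation, offered only as a rough sketch and not as a better measurement. In the loops above, repeated `sqrt` on the same element quickly converges to 1 and repeated doubling eventually overflows to infinity, and the results are never read afterwards, so an optimizer could in principle discard work. The sketch below reads fresh inputs from a buffer, returns a sum so the work stays observable, and times with `std::chrono::steady_clock`. The names `measure_ms`, `make_inputs`, `sum_sqrt` and `sum_add` are hypothetical helpers made up for this illustration.

```cpp
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: runs a callable once and returns the elapsed
// wall-clock time in milliseconds, using a monotonic clock.
template <typename F>
double measure_ms(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Hypothetical helper: fills a buffer with positive, finite inputs
// (1, 2, ..., 1000, repeating) so no operand drifts to 1 or to infinity.
std::vector<double> make_inputs(std::size_t n) {
    std::vector<double> values(n);
    for (std::size_t i = 0; i < n; ++i) {
        values[i] = 1.0 + static_cast<double>(i % 1000);
    }
    return values;
}

// Sums sqrt(x) over the buffer `passes` times. Returning the sum gives the
// caller a result to use, so the loop cannot simply be thrown away.
double sum_sqrt(const std::vector<double>& in, int passes) {
    double sum = 0.0;
    for (int p = 0; p < passes; ++p) {
        for (double x : in) {
            sum += std::sqrt(x);
        }
    }
    return sum;
}

// Same loop shape with an addition in place of the square root.
double sum_add(const std::vector<double>& in, int passes) {
    double sum = 0.0;
    for (int p = 0; p < passes; ++p) {
        for (double x : in) {
            sum += x + x;
        }
    }
    return sum;
}
```

To use it, call something like `measure_ms([&]{ sink = sum_sqrt(inputs, passes); })` from `main`, with `sink` declared as a `volatile double` (or printed afterwards), do the same for `sum_add`, and compare the two times. Both loops still include the accumulating addition, and the compiler may vectorize them differently, so the ratio remains a rough relative figure and will vary with hardware and flags.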