x64 performance compared to x86

I wrote this small program in C++ to check the CPU load.

#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <windows.h>
#include <time.h>
int main()
{

    double x = 1;
    int t1 = GetTickCount();
    srand(10000);

    for (unsigned long i = 0; i < 10000000; i++)
    {
        int r = rand();
        double l = sqrt((double)r);
        x *= log(l/3) * pow(x, r);
    }

    int t2 = GetTickCount();
    printf("Time: %d\n", t2-t1);
    getchar();
}

I compiled it both for x86 and for x64 on win7 x64.
For some reason when I ran the x64 version it finished running in about 3 seconds
but when I tried it with the x86 version it took 48 (!!!) seconds.
I tried it many times and always got similar results.
What could cause this difference?

Best answer

Looking at the assembly output with /Ox (maximum optimization) reveals the source of the speed difference between x86 and x64:

; cl /Ox /Fa tick.cpp
; x86 Line 17: x *= log(l/3) * pow(x, r)
fld     QWORD PTR _x$[esp+32]
mov     eax, esi
test    esi, esi
; ...

We see that x87 instructions are used for this computation. Contrast that with the x64 build:

; cl /Ox /Fa tick.cpp
; x64 Line 17: x *= log(l/3) * pow(x, r)
movapd  xmm1, xmm8
mov     ecx, ebx
movapd  xmm5, xmm0
test    ebx, ebx
; ...

Now we see SSE instructions being used instead.

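You can pass /arch:SSE2 to get Visual Studio 2010 to produce similar instructions for x86, but the 64-bit compiler still appears to produce better, faster assembly for the task at hand.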

Finally, if you relax the floating-point model (/fp:fast), the x86 and x64 builds perform nearly identically.

Timings, measured informally:

  • x86, /Ox: 22704 ticks
  • x64, /Ox: 822 ticks
  • x86, /Ox /arch:SSE2: 3432 ticks
  • x64, /Ox /favor:INTEL64: 1014 ticks
  • x86, /Ox /arch:SSE2 /fp:fast: 834 ticks

Other answers

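The reason is indeed related to SSE. In VS, 64-bit release builds generate SSE2 instructions by default, but for 32-bit builds you have to use the /arch:SSE2 switch to enable SSE2. When you do that, you get comparable run times for the 32-bit and 64-bit builds.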

Many of the possibilities here have little or nothing to do with x86 vs. x64 as such. One obvious possibility is that most (all?) x64 compilers use SSE for floating point, whereas most normally use 8087-style instructions in x86 mode. Since your code is heavy on floating point, this could make a significant difference.

Another possibility is that in the process of rewriting for x64, they noticed and fixed some problems in their code generator that let it produce substantially better code, at least under certain circumstances.

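Although it doesn't look like it applies here, some code also benefits substantially from the increased size and/or number of registers available in 64-bit mode.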

Part of it is definitely the SSE, but there's a big reason why x64 uses SSE by default: all AMD64 CPUs are required to have SSE2. Another part could be the increased register count.




