Benchmarks for Intel C++ compiler and GCC

I have an AMD Opteron server running CentOS 5, and I need a compiler for a fairly large, Boost-based C++ program. Which compiler should I choose?

Best answer

I hope this helps more than hurts :)

I did a little compiler shootout sometime over a year ago, and I am going off memory.

  1. GCC 4.2 (Apple)
  2. Intel 10
  3. GCC 4.2 (Apple) + LLVM

I tested multiple template-heavy audio signal processing programs that I'd written.

Compilation times: The Intel compiler was by far the slowest compiler - more than 2x slower, as another poster cited.

GCC handled deep templates very well in comparison to Intel.

The Intel compiler generated huge object files.

GCC+LLVM yielded the smallest binary.

The quality of the generated code may vary significantly depending on how the program is constructed and where SIMD can be used.

For the way I write, I found that GCC + LLVM generated the best code. For programs which I'd written before I took optimization seriously (as I wrote), Intel was generally better.

Intel's results varied; it handled some programs far better, and some programs far worse. It handled raw processing very well, but I give GCC+LLVM the cake because when put into the context of a larger (normal) program... it did better.

Intel won for out-of-the-box number crunching on huge data sets.

GCC alone generated the slowest code, though it can be as fast with measurement and nano-optimizations. I prefer to avoid those because the wind may change direction with the next compiler release, so to speak.

I never measured poorly written programs in this test (i.e. results outperformed distributions of popular performance libraries).

Finally, the programs were written over several years, using GCC as the primary compiler in that time.

Update: I was also enabling optimizations/extensions for Core2Duo. The programs were clean enough to enable strict aliasing.
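
To make "clean enough to enable strict aliasing" concrete, here is a minimal sketch of the kind of code that stays well defined when the compiler assumes strict aliasing. The helper names are hypothetical and are not taken from the programs in the shootout; the self-check assumes IEEE 754 single-precision floats.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: returns the raw bit pattern of a float.
// Copying through std::memcpy is well defined under strict aliasing, whereas
// *reinterpret_cast<std::uint32_t*>(&value) reads a float object through an
// unrelated type and is undefined behaviour.
inline std::uint32_t float_bits(float value) {
    static_assert(sizeof(std::uint32_t) == sizeof(float),
                  "this sketch assumes a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}

// Small self-check, assuming IEEE 754 single precision.
inline void float_bits_self_check() {
    assert(float_bits(1.0f) == 0x3F800000u);
}
```

Compilers typically turn the fixed-size memcpy into a plain register move, so the aliasing-safe version should not cost anything at the usual optimization levels.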

Answers

There is an interesting PDF here which compares a number of compilers.

The MySQL team once posted that icc gave them about a 10% performance boost over gcc. I'll try to find the link.

In general, I've found that the native compilers perform better than gcc on their respective platforms.

edit: I was a little off. Typical gains were 20-30%, not 10%. Some narrow edge cases got a doubling of performance. http://www.mysqlperformanceblog.com/files/presentations/LinuxWorld2004-Intel.pdf

I suppose it varies depending on the code, but with the codebase I am working on now, ICC 11.035 gives an almost 2x improvement over gcc 4.4.0 on a Xeon 5504.

icc options: -O2 -fno-alias
gcc options: -O3 -msse3 -mfpmath=sse -fargument-noalias-global

The options are specific to just the file containing the compute-intensive code, where I know there is no aliasing. Single-threaded code with a 5-level nested loop.
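
As an aside, the same "no aliasing" promise can be made per pointer rather than per file. The sketch below is a hypothetical kernel, not the 5-level loop described above, and it relies on `__restrict__`, which is a compiler extension (accepted by GCC, Clang and ICC) and not standard C++.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical kernel for illustration only.
// `__restrict__` promises the compiler that out, a and b never overlap, so it
// does not have to reload a[i] or b[i] after each store to out[i].
// Breaking that promise at a call site is undefined behaviour.
inline void scaled_add(float* __restrict__ out,
                       const float* __restrict__ a,
                       const float* __restrict__ b,
                       float scale, std::size_t n) {
    assert(n == 0 || (out != nullptr && a != nullptr && b != nullptr));
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a[i] + scale * b[i];
    }
}
```

The advantage over a file-wide flag is that the promise is visible in the source and covers only the pointers it is actually true for.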

Although autovectorization is enabled, neither compiler generates vectorized code (not a fault of the compilers).


Update (2015/02/27): While optimizing some geophysics code (Q2, 2013) to run on Sandy Bridge-E Xeons, I had an opportunity to compare the performance of ICC 11.1 against GCC 4.8.0, and GCC was now generating faster code than ICC. The code made use of AVX intrinsics and did use 8-way vectorized instructions (neither compiler autovectorized the code properly due to certain data layout requirements). In addition, GCC's LTO implementation (with the IR core embedded in the .o files) was much easier to manage than that in ICC. GCC with LTO was running roughly 3 times faster than ICC without LTO. I'm not able to find the numbers right now for GCC without LTO, but I recall it was still faster than ICC. It's by no means a general statement on ICC's performance, but the results were sufficient for us to go ahead with GCC 4.8.*.

Looking forward to GCC 5.0 (http://www.phoronix.com/scan.php?page=article&item=gcc-50-broadwell)!

We use the Intel compiler on our product (DB2), on Linux and Windows IA32/AMD64, and on OS X (i.e. all our Intel platform ports except SunAMD).

I don't know the numbers, but the performance is good enough that we:

  • pay for the compiler, which I'm told is very expensive.
  • live with the 2x slower build times (primarily due to the time it spends acquiring licenses before it allows itself to run).

PHP - Compilation from source, with ICC rather than GCC, should result in a 10% to 20% speed improvement - http://www.papelipe.no/tags/ez_publish/benchmark_of_intel_compiled_icc_apache_php_and_apc

MySQL - Compilation from source, with ICC rather than GCC, should result in a 25% to 50% speed improvement - http://www.mysqlperformanceblog.com/files/presentations/LinuxWorld2005-Intel.pdf

I used to work on a fairly large signal processing system which ran on a large cluster. We used to reckon that, for heavy maths crunching, the Intel compiler gave us about 10% less CPU load than GCC. That's very unscientific, but it was our experience (that was about 18 months ago).

What would have been interesting is if we'd been able to use Intel's math libraries as well, which use their chipset more efficiently.

I used UnixBench (v. 5.1.3) on an openSUSE 12.2 (kernel 3.4.33-2.24-default x86_64), and compiled it first with GCC, and then with Intel's compiler.

With 1 parallel copy, UnixBench compiled with Intel's compiler is about 20% faster than the version compiled with GCC. However, this hides huge differences. Dhrystone is about 25% slower with the Intel compiler, while Whetstone runs 2x faster.

With 4 copies of UnixBench running in parallel, the improvement of the Intel compiler over GCC is only 7%. Again Intel is much better at Whetstone (> 200%), and slower at Dhrystone (about 20%).

Many optimizations that the Intel compiler performs routinely require specific source syntax and the use of -O3 -ffast-math with gcc. Unfortunately, the -funsafe-math-optimizations component of -ffast-math -O3 -march=native has turned out to be incompatible with -fopenmp, so I must split my source files into groups, one per set of options, in the Makefile. Today I ran into a failure where a g++ build using -O3 -ffast-math -fopenmp -march=native was able to write to the screen but not redirect to a file. One of the more egregious differences, in my opinion, is that only icpc optimizes std::max and min, whereas gcc/g++ want fmax|min[f] together with -ffast-math, which changes their meaning away from the standard.
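
For anyone wondering why std::max and fmax are not interchangeable in the first place, a small sketch of the semantic difference follows. The wrapper names are hypothetical, the example does not reproduce any compiler's optimization behaviour, and it must be built without -ffast-math, since that flag lets the compiler assume NaNs never occur.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>

// std::max(a, b) is specified as (a < b) ? b : a. Every comparison with NaN is
// false, so the result with a NaN operand depends on the argument order.
inline double max_via_std(double a, double b) { return std::max(a, b); }

// std::fmax treats a single NaN operand as missing data and returns the other
// operand, whichever position the NaN is in.
inline double max_via_fmax(double a, double b) { return std::fmax(a, b); }

// Small self-check of the asymmetry described above.
inline void max_nan_self_check() {
    const double qnan = std::numeric_limits<double>::quiet_NaN();
    assert(std::isnan(max_via_std(qnan, 1.0)));  // NaN in first position is returned
    assert(max_via_std(1.0, qnan) == 1.0);       // NaN in second position is dropped
    assert(max_via_fmax(qnan, 1.0) == 1.0);
    assert(max_via_fmax(1.0, qnan) == 1.0);
}
```

The extra NaN handling in fmax is what stands in the way of a single max instruction, which is presumably why relaxing the floating-point rules matters here.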




