I am doing some numerical optimization on a scientific application. One thing I noticed is that GCC will optimize the call pow(a,2)
by compiling it into a*a
, but the call pow(a,6)
is not optimized and will actually call the library function pow
, which greatly slows down the performance. (In contrast, Intel C++ Compiler, executable icc
, will eliminate the library call for pow(a,6)
.)
What I am curious about is that when I replaced pow(a,6)
with a*a*a*a*a*a
using GCC 4.5.1 and options "-O3 -lm -funroll-loops -msse4
", it uses 5 mulsd
instructions:
movapd %xmm14, %xmm13
mulsd %xmm14, %xmm13
mulsd %xmm14, %xmm13
mulsd %xmm14, %xmm13
mulsd %xmm14, %xmm13
mulsd %xmm14, %xmm13
如果我写了<代码>(a*a*a)*(a*a*a),则该编码将产生。
movapd %xmm14, %xmm13
mulsd %xmm14, %xmm13
mulsd %xmm14, %xmm13
mulsd %xmm13, %xmm13
将乘数指示数减至3:ic
也有类似行为。
Why do compilers not recognize this optimization trick?