While profiling the code I am working with, I found that one particular hotspot is:
for(int loc = start; loc < end; ++loc)
    y[loc] += a[offset+loc] * x[loc+d];
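The arrays y, a, and x do not overlap. It seems to me that a loop like this should be easy to vectorize. However, when I compile with g++ using the options "-O3 -ftree-vectorize -ftree-vectorizer-verbose=1", I get no indication that this particular loop is vectorized. On the other hand, a loop occurring just before the code above: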
for(int i = 0; i < m; ++i)
    y[i] = 0;
does get vectorized according to the output. Any thoughts on why the first loop is not vectorized, or how I might be able to fix this? (I am not all that educated on the concept of vectorization, so I am likely missing something quite obvious)
As per Oli's suggestion, turning up the verbosity yields the following notes (while I am usually good at reading compiler warnings/errors/output, I have no idea what these mean):
./include/mv_ops.h:89: note: dependence distance = 0.
./include/mv_ops.h:89: note: accesses have the same alignment.
./include/mv_ops.h:89: note: dependence distance modulo vf == 0 between *D.50620_89 and *D.50620_89
./include/mv_ops.h:89: note: not vectorized: can't determine dependence between *D.50623_98 and *D.50620_89
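
My tentative reading of the last note is that GCC could not prove that two of the memory references never alias, which would explain why the zeroing loop (a single array) vectorizes and this one does not. A minimal sketch of one common workaround follows. It assumes the arrays are plain pointers to double, and the function name axpy_window and its parameter list are hypothetical (they are not taken from mv_ops.h). It uses the __restrict__ extension supported by GCC and Clang to promise the compiler that the three arrays do not overlap; whether this actually gets the loop vectorized here is untested.

```cpp
// Hypothetical helper: the name, the double element type and the parameter
// list are assumptions made for illustration only.
// __restrict__ (a GCC/Clang extension) promises that y, a and x never alias,
// so the caller must guarantee the arrays really do not overlap.
void axpy_window(double* __restrict__ y,
                 const double* __restrict__ a,
                 const double* __restrict__ x,
                 int start, int end, int offset, int d)
{
    // Same loop as in the question, unchanged apart from the qualifiers.
    for (int loc = start; loc < end; ++loc)
        y[loc] += a[offset + loc] * x[loc + d];
}
```

If the promise is broken (for example, if y and x point into the same buffer), the behavior is undefined, so this only makes sense when the no-overlap condition stated above truly holds.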