I have a function that uses inline assembly:
vec8w x86_sse_ldvwu(const vec8w* m) {
    vec8w rd;
    /* "xm" lets the compiler place *m in either an xmm register or memory */
    asm("movdqu %[m],%[rd]" : [rd] "=x" (rd) : [m] "xm" (*m));
    return rd;
}
It compiles to the following assembly:
sub $0x1c,%esp
mov 0x24(%esp),%eax
movdqa (%eax),%xmm0
movdqu %xmm0,%xmm0
movdqa %xmm0,(%esp)
movdqa (%esp),%xmm0
add $0x1c,%esp
ret
The code isn't terribly efficient, but that isn't my concern. As you can see, the compiler inserts a movdqa instruction that copies from the address in %eax into %xmm0 before my instruction runs. The problem is that the pointer vec8w* m is not 16-byte aligned, so I get a segfault when the movdqa executes.

My question is whether there is a way to instruct the inline assembler to use movdqu instead of movdqa (which it uses by default). I tried to look for a workaround using the SSE intrinsic functions for g++, but I cannot find a movdqu intrinsic in xmmintrin.h (where I supposed it would be declared). Unfortunately, I cannot modify the code so that the function is always called with an aligned argument m.
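For reference, this is roughly the intrinsic-based version I was hoping to write instead. It is only a sketch: it assumes vec8w is an ordinary 16-byte GCC vector type and that _mm_loadu_si128 is the intrinsic that maps to movdqu (which seems to be declared in emmintrin.h rather than xmmintrin.h, possibly why I could not find it):

#include <emmintrin.h>   /* assumed header for _mm_loadu_si128 (SSE2) */

/* assumed definition, my real vec8w may differ:
   typedef short vec8w __attribute__((vector_size(16))); */

vec8w x86_sse_ldvwu_intrin(const vec8w* m) {
    /* _mm_loadu_si128 performs an unaligned 128-bit load (movdqu);
       the cast only reinterprets the same 16 bytes as vec8w */
    return (vec8w)_mm_loadu_si128((const __m128i*)m);
}

Even if something like this works, I would still like to know how to express the unaligned load through the inline-asm constraints themselves.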