Dot product of 3D vectors in WebAssembly
Answers

The dot product in wasm must be interpreted with i16x8 as the input and i32x4 as the output. This corresponds to Intel's pmaddwd, a pair-wise multiply-add of words into double words. The intrinsic is also implementation specific, as (-32768)**2 * 2 = 2**31 overflows int32_t.

To compute the dot product of two 3-element vectors, one can simply unroll it as a[0]*b[0] + a[1]*b[1] + a[2]*b[2]. Allocating a full v128 for just those three values might help the JIT in optimising.
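The unrolled form can be sketched in plain C as follows. This is only an illustration of the arithmetic, not WebAssembly code; the function name dot3_scalar is hypothetical, and it assumes the products fit in a 32-bit integer.

```c
#include <stdint.h>

/* Unrolled 3-element dot product on plain scalars (no SIMD).
   Assumption: the inputs are small enough that the sum fits in int32_t. */
int32_t dot3_scalar(const int32_t a[3], const int32_t b[3])
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}
```

With only three multiplications and two additions, there is little for a SIMD instruction to save here, which is why the scalar form is a reasonable baseline.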

As @harold commented and @Aki mentioned, the WebAssembly instruction i32x4.dot_i16x8_s appears to correspond to pmaddwd.

I did find this description of the instruction:

Integer dot product

i32x4.dot_i16x8_s(a: v128, b: v128) -> v128

Lane-wise multiply signed 16-bit integers in the two input vectors and add adjacent pairs of the full 32-bit results.

While the instruction cannot be used directly as a "dot product" operator as I had hoped, it can still help in such an implementation.

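The description of pmaddwd is:

Multiplies the individual signed words of the destination operand (first operand) by the corresponding signed words of the source operand (second operand), producing temporary signed, doubleword results. The adjacent doubleword results are then summed and stored in the destination operand.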

And this illustration is helpful: [image: pmaddwd]

Since I want to use 3-element vectors, I can pad the input vectors with 0s, then extract the sums from the lanes and finally add them.
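The padding idea can be checked with a small scalar model in C before writing any WebAssembly. This is a sketch under stated assumptions: dot_i16x8_s_model and dot3_padded are hypothetical names, and the model only mimics the lane semantics described above (pairwise multiply, then add adjacent products). It is not the instruction itself.

```c
#include <stdint.h>

/* Scalar model (hypothetical helper) of the lane semantics described above:
   multiply eight signed 16-bit lanes pairwise, then add adjacent products
   into four 32-bit lanes. */
void dot_i16x8_s_model(const int16_t a[8], const int16_t b[8], int32_t out[4])
{
    for (int i = 0; i < 4; i++) {
        int32_t lo = (int32_t)a[2 * i] * b[2 * i];
        int32_t hi = (int32_t)a[2 * i + 1] * b[2 * i + 1];
        /* Add as unsigned so the (-32768)^2 * 2 corner case is not
           undefined behaviour in C. */
        out[i] = (int32_t)((uint32_t)lo + (uint32_t)hi);
    }
}

/* 3-element dot product: pad both inputs to 8 lanes with zeros,
   then add result lanes 0 and 1 (lanes 2 and 3 are zero). */
int32_t dot3_padded(const int16_t a[3], const int16_t b[3])
{
    int16_t va[8] = { a[0], a[1], a[2] }; /* remaining lanes are zero */
    int16_t vb[8] = { b[0], b[1], b[2] };
    int32_t lanes[4];

    dot_i16x8_s_model(va, vb, lanes);
    return lanes[0] + lanes[1];
}
```

Lane 0 holds a[0]*b[0] + a[1]*b[1] and lane 1 holds a[2]*b[2] + 0, so only those two lanes need to be extracted and added.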

Dot product implementation for 3D vectors

Considering the example from the Wikipedia article:

[1 3 -5] dot [4 -2 -1] = 3

I implemented a function that calculates the dot product of these vectors:

(module
  (func (export "calc_dot") (result i32)
    i32.const 1
    i32.const 3
    i32.const -5

    i32.const 4
    i32.const -2
    i32.const -1

    call $dot3
    return
  )

  (func $dot3 (param i32) (param i32) (param i32) (param i32) (param i32) (param i32) (result i32)
    (local v128)

    ;; create vector from first 3 params
    (i16x8.splat (i32.const 0))
    (i16x8.replace_lane 0 (local.get 0))
    (i16x8.replace_lane 1 (local.get 1))
    (i16x8.replace_lane 2 (local.get 2))

    ;; create vector from last 3 params
    (i16x8.splat (i32.const 0))
    (i16x8.replace_lane 0 (local.get 3))
    (i16x8.replace_lane 1 (local.get 4))
    (i16x8.replace_lane 2 (local.get 5))

    ;; integer dot product
    (local.set 6 (i32x4.dot_i16x8_s))

    (i32x4.extract_lane 0 (local.get 6))
    (i32x4.extract_lane 1 (local.get 6))
    i32.add

    return
  )
)



