How do I reorder vector data using ARM Neon intrinsics?

This is specifically related to ARM Neon SIMD coding. I am using ARM Neon intrinsics for a certain module in a video decoder. I have vectorized data as follows:

There are four 32-bit elements in a Neon register, say Q0, which is 128 bits wide.

3B 3A 1B 1A

There are another four 32-bit elements in another Neon register, say Q1, which is also 128 bits wide.

3D 3C 1D 1C

I want the final data to be in order as shown below:

1D 1C 1B 1A
3D 3C 3B 3A

Which Neon intrinsics can achieve the desired data order?

Best answer

How about something like this:

  #include <arm_neon.h>

  int32x4_t q0, q1;  /* assumed to already hold the two input vectors */

  /* split into 64 bit vectors */
  int32x2_t q0_hi = vget_high_s32 (q0);
  int32x2_t q1_hi = vget_high_s32 (q1);
  int32x2_t q0_lo = vget_low_s32 (q0);
  int32x2_t q1_lo = vget_low_s32 (q1);

  /* recombine into 128 bit vectors */
  q0 = vcombine_s32 (q0_lo, q1_lo);
  q1 = vcombine_s32 (q0_hi, q1_hi);

In theory this should compile to just two move instructions, because vget_high and vget_low merely reinterpret a 128-bit Q register as two 64-bit D registers. vcombine, on the other hand, compiles to one or two moves (depending on register allocation).

Oh, and the order of the integers in the output could be exactly the wrong way around. If so, just swap the arguments to vcombine_s32.
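
As a hedged sketch of the approach above, the function below wraps the same intrinsics in a small routine that loads both vectors from memory, regroups the halves, and stores the results, so the lane order can be checked. The name regroup_halves and its array-based interface are made up for illustration. It assumes the question lists lanes from highest to lowest, so that 1A sits in lane 0 of Q0. The #else branch is a plain C stand-in with the same lane semantics, included only so the snippet also builds on hosts without NEON.

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>

/* Hypothetical helper: lane 0 is the lowest-addressed element of each array. */
void regroup_halves(const int32_t in0[4], const int32_t in1[4],
                    int32_t out0[4], int32_t out1[4])
{
    int32x4_t q0 = vld1q_s32(in0);
    int32x4_t q1 = vld1q_s32(in1);

    /* Low halves of both inputs form the first result,
       high halves of both inputs form the second. */
    int32x4_t r0 = vcombine_s32(vget_low_s32(q0),  vget_low_s32(q1));
    int32x4_t r1 = vcombine_s32(vget_high_s32(q0), vget_high_s32(q1));

    vst1q_s32(out0, r0);
    vst1q_s32(out1, r1);
}

#else

/* Scalar stand-in with the same lane semantics, for hosts without NEON. */
void regroup_halves(const int32_t in0[4], const int32_t in1[4],
                    int32_t out0[4], int32_t out1[4])
{
    /* Copy the inputs first so the outputs may alias them. */
    int32_t a[4] = { in0[0], in0[1], in0[2], in0[3] };
    int32_t b[4] = { in1[0], in1[1], in1[2], in1[3] };

    out0[0] = a[0]; out0[1] = a[1]; out0[2] = b[0]; out0[3] = b[1];
    out1[0] = a[2]; out1[1] = a[3]; out1[2] = b[2]; out1[3] = b[3];
}

#endif
```

With in0 = {1A, 1B, 3A, 3B} and in1 = {1C, 1D, 3C, 3D} (lane 0 first), out0 holds {1A, 1B, 1C, 1D} and out1 holds {3A, 3B, 3C, 3D}. If your data turns out to be laid out the other way around, swap the arguments to vcombine_s32 as noted above.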

Other answers

Remember that each q register is made up of two d registers; for instance, the low part of q0 is d0 and the high part is d1. So in fact, this operation is just swapping d0 and d3 (or d1 and d2; it is not entirely clear from your data presentation). There is even a swap instruction to do it in one instruction!

Disclaimer: I don't know Neon intrinsics (I code directly in assembly), though I'd be surprised if this couldn't be done using intrinsics.

It looks like you should be able to use the VTRN instruction (e.g. vtrnq_u32) for this.

Pierre is right.

vswp d0, d3

That will do.

@Pierre: I read the post about NEON on your blog several months ago. I was pleasantly surprised that there was someone like me, writing hand-optimized assembly code for both ARM and NEON. Nice to see you.




