Cortex A9 NEON vs VFP usage confusion

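I'm trying to build a library for a Cortex A9 ARM processor (an OMAP4, to be more specific) and I'm a little confused about when to use NEON vs VFP in the context of floating-point operations and SIMD. Note that I know the difference between the two hardware coprocessor units (as also outlined here on SO); I just have some misunderstanding regarding their proper usage.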

Related to this, I'm using the following compilation flags:

GCC
-O3 -mcpu=cortex-a9 -mfpu=neon -mfloat-abi=softfp
-O3 -mcpu=cortex-a9 -mfpu=vfpv3 -mfloat-abi=softfp
ARMCC
--cpu=Cortex-A9 --apcs=/softfp
--cpu=Cortex-A9 --fpu=VFPv3 --apcs=/softfp

I've read through the ARM documentation, a lot of wikis (like this one), forum and blog posts, and everybody seems to agree that using NEON is better than using VFP, or at least that mixing NEON (e.g. using the intrinsics to implement some algorithms in SIMD) and VFP is not such a good idea. I'm not 100% sure yet whether this applies to the entire application/library or just to specific places (functions) in the code.

So I'm using neon as the FPU for my application, as I also want to use the intrinsics. As a result I'm in a bit of trouble, and my confusion about how best to use these features (NEON vs VFP) on the Cortex A9 deepens instead of clearing up. I have some code that does benchmarking for my app and uses some custom-made timer classes in which the calculations are based on double-precision floating point. Using NEON as the FPU gives completely inappropriate results (trying to print those values prints mostly inf and NaN; the same code works without a hitch when built for x86). So I changed my calculations to use single-precision floating point, as it is documented that NEON does not handle double-precision floating point. My benchmarks still don't give the proper results (and what's worse, the code now no longer works on x86; I think it's because of the loss of precision, but I'm not sure). So I'm almost completely lost: on one hand I want to use NEON for the SIMD capabilities, and using it as the FPU does not provide the proper results; on the other hand, mixing it with the VFP does not seem a very good idea. Any advice in this area will be greatly appreciated!
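The suspected loss of precision is easy to reproduce in isolation. The sketch below is only an illustration, not the timer code from the question: the helper names and the nanosecond timestamps are assumptions. A float has a 24-bit significand, so near 1e9 adjacent representable values are 64 apart and a 20 ns interval disappears entirely.

```c
#include <stdint.h>

/* Hypothetical timer helpers (names are illustrative only).
   Both take raw nanosecond timestamps and return the elapsed time. */

/* double has a 53-bit significand, so timestamps around 1e9 ns
   are represented exactly and the difference is exact. */
double elapsed_ns_double(uint64_t start_ns, uint64_t end_ns)
{
    return (double)end_ns - (double)start_ns;
}

/* float has a 24-bit significand: near 1e9 the spacing between
   representable values is 64, so both timestamps are rounded
   before the subtraction and short intervals are lost. */
float elapsed_ns_float(uint64_t start_ns, uint64_t end_ns)
{
    float s = (float)start_ns;
    float e = (float)end_ns;
    return e - s;
}
```

One way around this while staying in single precision is to subtract the integer timestamps first and convert only the (small) difference to float.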

In the article mentioned above I noticed a summary of how floating point should be optimized with NEON:

"...

  • Only use single precision floating point
  • Use NEON intrinsics / ASM whenever you find a bottlenecking FP function. You can do better than the compiler.
  • Minimize Conditional Branches
  • Enable RunFast mode

For softfp:

  • Inline floating point code (unless it's very large)
  • Pass FP arguments via pointers instead of by value and do integer work in between function calls.

..."
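For reference, "RunFast mode" in the list above refers to a VFP configuration in which the FPSCR has flush-to-zero (FZ, bit 24) and default NaN (DN, bit 25) set and all exception trap enables cleared. A minimal sketch of how that could be set from C follows; the function names are made up for illustration, and the inline assembly is only compiled for 32-bit ARM targets with hardware floating point (elsewhere enable_runfast does nothing).

```c
#include <stdint.h>

#define FPSCR_FZ           (UINT32_C(1) << 24)  /* flush-to-zero */
#define FPSCR_DN           (UINT32_C(1) << 25)  /* default NaN */
#define FPSCR_TRAP_ENABLES UINT32_C(0x9F00)     /* bits 8-12 and 15 */

/* Pure helper: compute the FPSCR value for RunFast mode
   from the current register value. */
uint32_t runfast_fpscr(uint32_t fpscr)
{
    return (fpscr | FPSCR_FZ | FPSCR_DN) & ~FPSCR_TRAP_ENABLES;
}

/* Hypothetical wrapper: read-modify-write the FPSCR.
   Compiled to a no-op on anything but 32-bit ARM with an FPU. */
void enable_runfast(void)
{
#if defined(__arm__) && defined(__ARM_FP)
    uint32_t fpscr;
    __asm__ volatile("vmrs %0, fpscr" : "=r"(fpscr));
    fpscr = runfast_fpscr(fpscr);
    __asm__ volatile("vmsr fpscr, %0" : : "r"(fpscr));
#endif
}
```

Note that this trades IEEE 754 compliance (denormals and NaN payloads) for speed, so it is worth checking that the results are still acceptable for the application.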

I cannot use hard for the float ABI, as I cannot link with the libraries I have available. Most of the recommendations make sense to me (except the "runfast mode", where I don't understand exactly what it is supposed to do, and the claim that at this point in time I could do better than the compiler), but I keep getting inconsistent results and I'm not sure of anything right now.

Could anyone shed some light on how to properly use floating point and NEON on the Cortex A9/A8, and on which compilation flags I should use?

Answers

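... forum and blog posts and everybody seems to agree that using NEON is better than using VFP or at least mixing NEON (e.g. using the intrinsics to implement some algos in SIMD) and VFP is not such a good idea.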

I'm not sure that's correct. According to ARM, in Introducing NEON Development Article, section "NEON registers":

The NEON register bank consists of 32 64-bit registers. If both Advanced SIMD and VFPv3 are implemented, they share this register bank. In this case, VFPv3 is implemented in the VFPv3-D32 form that supports 32 double-precision floating-point registers. This integration simplifies implementing context switching support, because the same routines that save and restore VFP context also save and restore NEON context.

The NEON unit can view the same register bank as:

  • sixteen 128-bit quadword registers, Q0-Q15
  • thirty-two 64-bit doubleword registers, D0-D31.

The NEON D0-D31 registers are the same as the VFPv3 D0-D31 registers and each of the Q0-Q15 registers map onto a pair of D registers. Figure 1.3 shows the different views of the shared NEON and VFP register bank. All of these views are accessible at any time. Software does not have to explicitly switch between them, because the instruction used determines the appropriate view.

The registers do not compete; rather, they coexist as views of the same register bank. There is no way to separate the NEON and FP hardware.
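To make the "views" idea concrete, here is a small software model of the register bank described in the ARM quote. It is purely illustrative (the type and function names are made up, and real code never accesses registers this way): the bank is 32 x 64 bits of storage, a D view indexes it in 8-byte units, and a Q view indexes the same bytes in 16-byte units, so Qn covers D(2n) and D(2n+1).

```c
#include <stdint.h>
#include <string.h>

/* Model of the shared NEON/VFPv3-D32 register bank: 32 x 64 bits. */
typedef struct {
    uint8_t bytes[32 * 8];
} reg_bank;

/* D view: register n occupies bytes [8n, 8n + 8). */
void write_d(reg_bank *rb, unsigned n, uint64_t value)
{
    memcpy(rb->bytes + 8u * n, &value, sizeof value);
}

uint64_t read_d(const reg_bank *rb, unsigned n)
{
    uint64_t value;
    memcpy(&value, rb->bytes + 8u * n, sizeof value);
    return value;
}

/* Q view: register n occupies bytes [16n, 16n + 16),
   i.e. out[0] is D(2n) and out[1] is D(2n + 1). */
void read_q(const reg_bank *rb, unsigned n, uint64_t out[2])
{
    memcpy(out, rb->bytes + 16u * n, 16);
}
```

Writing D2 and D3 and then reading Q1 returns the same two values, which is what "the instruction used determines the appropriate view" means in practice.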


Related to this, I'm using the following compilation flags:

-O3 -mcpu=cortex-a9 -mfpu=neon -mfloat-abi=softfp
-O3 -mcpu=cortex-a9 -mfpu=vfpv3 -mfloat-abi=softfp

Here's what I do; your mileage may vary. It's derived from a mash-up of information gathered from the platform and the compiler.

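The platforms below use hard floats (the gnueabihf target), which can speed up procedure calls. If in doubt, use softfp: it still uses the FPU but keeps the soft-float calling convention, so it links with soft-float libraries, though not with hard-float ones.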

BeagleBone Black:

$ gcc -v 2>&1 | grep Target          
Target: arm-linux-gnueabihf

$ cat /proc/cpuinfo
model name  : ARMv7 Processor rev 2 (v7l)
Features    : half thumb fastmult vfp edsp thumbee neon vfpv3 tls vfpd32 
...

So the BeagleBone uses:

-march=armv7-a -mtune=cortex-a8 -mfpu=neon -mfloat-abi=hard

CubieTruck v5:

$ gcc -v 2>&1 | grep Target 
Target: arm-linux-gnueabihf

$ cat /proc/cpuinfo
Processor   : ARMv7 Processor rev 5 (v7l)
Features    : swp half thumb fastmult vfp edsp thumbee neon vfpv3 tls vfpv4 

So the CubieTruck uses:

-march=armv7-a -mtune=cortex-a7 -mfpu=neon-vfpv4 -mfloat-abi=hard

Banana Pi Pro:

$ gcc -v 2>&1 | grep Target 
Target: arm-linux-gnueabihf

$ cat /proc/cpuinfo
Processor   : ARMv7 Processor rev 4 (v7l)
Features    : swp half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt

So the Banana Pi uses:

-march=armv7-a -mtune=cortex-a7 -mfpu=neon-vfpv4 -mfloat-abi=hard

Raspberry Pi 3:

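The RPI3 is unique in that it is ARMv8 but runs a 32-bit OS. That means it is effectively 32-bit ARM, or Aarch32. There is a little more to 32-bit ARM vs Aarch32, but this will show you the Aarch32 flags.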

Also, the RPI3 uses an A53 SoC, and it has NEON and the optional CRC32 instructions, but lacks the optional Crypto extensions.

$ gcc -v 2>&1 | grep Target 
Target: arm-linux-gnueabihf

$ cat /proc/cpuinfo 
model name  : ARMv7 Processor rev 4 (v7l)
Features    : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32
...

So the Raspberry Pi can use:

-march=armv8-a+crc -mtune=cortex-a53 -mfpu=neon-fp-armv8 -mfloat-abi=hard

Or it can use (I don't know what to use for -mtune):

-march=armv7-a -mfpu=neon-vfpv4 -mfloat-abi=hard 

ODROID C2:

The ODROID C2 uses an Amlogic A53 SoC, but it runs a 64-bit OS. The ODROID C2 has NEON and the optional CRC32 instructions, but lacks the optional Crypto extensions (similar to the RPI3).

$ gcc -v 2>&1 | grep Target 
Target: aarch64-linux-gnu

$ cat /proc/cpuinfo 
Features    : fp asimd evtstrm crc32

So the ODROID uses:

-march=armv8-a+crc -mtune=cortex-a53
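The pattern in the recipes can be roughly sketched in code: look at the Features line from /proc/cpuinfo and pick an -mfpu value from it. This is only a heuristic that mirrors the examples here, not an authoritative mapping; the function names are made up, and it ignores details such as VFPv3-D16 parts and the ARMv8 neon-fp-armv8 option.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if `name` appears as a whole whitespace-delimited token
   in `features` (so "vfp" does not match inside "vfpv3"). */
static int has_feature(const char *features, const char *name)
{
    size_t len = strlen(name);
    const char *p = features;

    while ((p = strstr(p, name)) != NULL) {
        int starts = (p == features) || p[-1] == ' ' ||
                     p[-1] == '\t' || p[-1] == ':';
        char end = p[len];
        int ends = (end == '\0') || end == ' ' ||
                   end == '\t' || end == '\n';
        if (starts && ends)
            return 1;
        p += len;
    }
    return 0;
}

/* Suggest an -mfpu value from a Features line.
   Returns "" when no -mfpu is needed (AArch64 reports asimd)
   and NULL when no FPU feature is recognised. */
const char *suggest_mfpu(const char *features)
{
    int neon;

    if (has_feature(features, "asimd"))
        return "";
    neon = has_feature(features, "neon");
    if (neon && has_feature(features, "vfpv4"))
        return "neon-vfpv4";
    if (neon)
        return "neon";
    if (has_feature(features, "vfpv4"))
        return "vfpv4";
    if (has_feature(features, "vfpv3"))
        return "vfpv3";
    return NULL;
}
```

Fed the Features lines shown above, this gives neon for the BeagleBone Black, neon-vfpv4 for the CubieTruck and Banana Pi, and an empty string for the ODROID C2.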

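In the recipes above, I determined the ARM processor (such as Cortex A9 or A53) by inspecting data sheets. According to this answer on Unix and Linux Stack Exchange, which deciphers the output of /proc/cpuinfo: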

CPU: Part number. 0xd03 indicated Cortex-A53 processor.

So we may be able to look the value up in a database. I don't know whether one exists or where it is located.
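In the absence of such a database, a few part numbers can simply be hard-coded. The sketch below assumes the "CPU part" line format that ARM Linux kernels print in /proc/cpuinfo and an ARM Ltd. implementer (0x41); the function names are made up, and the table only covers the cores mentioned in this thread.

```c
#include <stdio.h>

/* Parse a line such as "CPU part\t: 0xd03".
   Returns 1 and stores the part number on success, 0 otherwise. */
int parse_cpu_part(const char *line, unsigned *part)
{
    return sscanf(line, "CPU part : %x", part) == 1;
}

/* Map an ARM Ltd. part number to a core name.
   Deliberately incomplete: only cores discussed here. */
const char *cortex_name(unsigned part)
{
    switch (part) {
    case 0xc07: return "Cortex-A7";
    case 0xc08: return "Cortex-A8";
    case 0xc09: return "Cortex-A9";
    case 0xd03: return "Cortex-A53";
    default:    return "unknown";
    }
}
```

The result could then be used to choose the -mcpu or -mtune value, for example cortex-a53 for part 0xd03.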

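I think this question should be split up into several questions, with some code examples added and details given about the target platform and the toolchain versions used.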

But to cover one part of confusion: The recommendation to "use NEON as the FPU" sounds like a misunderstanding. NEON is a SIMD engine, the VFP is an FPU. You can use NEON for single-precision floating-point operations on up to 4 single-precision values in parallel, which (when possible) is good for performance.
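As a minimal sketch of what "4 single-precision values in parallel" looks like with the intrinsics from arm_neon.h (the function name add_f32 is made up for this example): the NEON path is only compiled when the compiler defines __ARM_NEON, for instance with -mfpu=neon, and a plain scalar loop handles the tail and any non-NEON build.

```c
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* dst[i] = a[i] + b[i] for i in [0, n). */
void add_f32(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* Process four floats per iteration in one Q register. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(dst + i, vaddq_f32(va, vb));
    }
#endif

    /* Scalar tail (or the whole array when NEON is unavailable). */
    for (; i < n; ++i)
        dst[i] = a[i] + b[i];
}
```

Any double-precision code elsewhere in the same program is still compiled to VFP instructions; the two coexist without any explicit switching.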

-mfpu=neon can be viewed as shorthand for -mfpu=neon-vfpv3.

See http://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html for more information.

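I'd stay away from VFP. It's just like Thumb mode: it's meant for compilers. There's no point in optimizing for it.

It might sound rude, but I really don't see any point in NEON intrinsics either. They are more trouble than help, if they help at all.

Just invest two or three days in basic ARM assembly: you only need to learn a few instructions for loop control and termination.

Then you can start writing native NEON code without worrying about the compiler doing something strange and spitting out tons of errors and warnings.

Learning the NEON instructions is less demanding than learning all those intrinsics macros. And above all, the results are so much better.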

Fully optimized native NEON code usually runs more than twice as fast as well-written intrinsics counterparts.

Just compare the OP's version with mine in the link below, and you'll know what I mean.

Optimizing RGBA8888 to RGB565 conversion with NEON
