Question 1

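...forum and blog posts, and everyone seems to agree that using NEON is better than using VFP, or at least that mixing NEON (e.g. using the intrinsics to implement some algorithms in SIMD) and VFP is not such a good idea.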

Answer

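...forum and blog posts, and everyone seems to agree that using NEON is better than using VFP, or at least that mixing NEON (e.g. using the intrinsics to implement some algorithms in SIMD) and VFP is not such a good idea.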

I don't believe this is correct. According to ARM's Introducing NEON Development Article (Registers section):

The NEON register bank consists of 32 64-bit registers. If both Advanced SIMD and VFPv3 are implemented, they share this register bank. In this case, VFPv3 is implemented in the VFPv3-D32 form that supports 32 double-precision floating-point registers. This integration simplifies implementing context switching support, because the same routines that save and restore VFP context also save and restore NEON context.

The NEON unit can view the same register bank as:

sixteen 128-bit quadword registers, Q0-Q15

thirty-two 64-bit doubleword registers, D0-D31.

The NEON D0-D31 registers are the same as the VFPv3 D0-D31 registers and each of the Q0-Q15 registers map onto a pair of D registers. Figure 1.3 shows the different views of the shared NEON and VFP register bank. All of these views are accessible at any time. Software does not have to explicitly switch between them, because the instruction used determines the appropriate view.

The registers are not in competition; rather, they coexist as views of the register bank. There is no way to separate the NEON and VFP hardware.
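To make the "views of one bank" point concrete, here is a small C model of the aliasing. The reg_bank union is a hypothetical illustration, not real hardware access: it overlays the Q0-Q15, D0-D31 and S0-S31 views on the same 256 bytes, where Qn is the pair D(2n+1):D(2n) and, in VFPv3, S0-S31 alias D0-D15. The S mapping assumes a little-endian host, as on typical ARM Linux.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the shared NEON/VFPv3-D32 register bank.
 * All three views overlay the same 256 bytes:
 *   Qn = D(2n+1):D(2n)                      (lo half at the lower address)
 *   Dn = S(2n+1):S(2n) for n < 16           (assumes little-endian host) */
typedef union {
    struct { uint64_t lo, hi; } q[16]; /* Q0-Q15, 128-bit quadword view  */
    uint64_t d[32];                    /* D0-D31, 64-bit doubleword view */
    float    s[32];                    /* S0-S31, alias D0-D15 only      */
} reg_bank;

/* Write a 64-bit value through the D view. */
static void write_d(reg_bank *rb, unsigned n, uint64_t v) {
    assert(n < 32);
    rb->d[n] = v;
}

/* Read the high half of Qn, which is the same storage as D(2n+1). */
static uint64_t read_q_hi(const reg_bank *rb, unsigned n) {
    assert(n < 16);
    return rb->q[n].hi;
}
```

Writing D3 and reading the high half of Q1 returns the same value, which is exactly why no "switching" between NEON and VFP is needed: the instruction simply picks the view.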

Related to this, I'm using the following compile flags:

-O3 -mcpu=cortex-a9 -mfpu=neon -mfloat-abi=softfp
-O3 -mcpu=cortex-a9 -mfpu=vfpv3 -mfloat-abi=softfp

Here's what I do; your mileage may vary. It's derived from a mashup of information gathered from the platform and the compiler.

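The gnueabihf in the compiler's target triplet tells me the platform uses hard floats, which can speed up procedure calls. If in doubt, keep in mind that softfp still uses the FPU but keeps the soft-float calling convention, so it links with soft-float code, not with hard-float (gnueabihf) libraries.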
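Since the choice of -mfloat-abi and -mfpu is the crux here, a quick way to confirm what a given set of flags actually selected is to test the compiler's predefined macros. This is a sketch assuming GCC or Clang: __ARM_PCS_VFP is defined for hard-float, __SOFTFP__ for pure soft-float, and __ARM_NEON (ACLE) or the older __ARM_NEON__ when NEON code generation is enabled. The function names are hypothetical.

```c
#include <assert.h>

/* Which float ABI was this translation unit compiled for? */
static const char *float_abi(void) {
#if defined(__aarch64__)
    return "aarch64";  /* AArch64 has no -mfloat-abi; FP/SIMD always present */
#elif defined(__arm__)
#  if defined(__ARM_PCS_VFP)
    return "hard";     /* FP arguments passed in VFP registers      */
#  elif defined(__SOFTFP__)
    return "soft";     /* no FP instructions generated at all       */
#  else
    return "softfp";   /* FP instructions, soft-float calling conv. */
#  endif
#else
    return "not-arm";
#endif
}

/* 1 if the compiler may emit NEON (Advanced SIMD) instructions. */
static int neon_enabled(void) {
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    return 1;
#else
    return 0;
#endif
}
```

Compiling the same file once with each of the flag sets from the question and printing both results shows the difference directly.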

BeagleBone Black:

$ gcc -v 2>&1 | grep Target          
Target: arm-linux-gnueabihf

$ cat /proc/cpuinfo
model name  : ARMv7 Processor rev 2 (v7l)
Features    : half thumb fastmult vfp edsp thumbee neon vfpv3 tls vfpd32 
...

So the BeagleBone uses:

-march=armv7-a -mtune=cortex-a8 -mfpu=neon -mfloat-abi=hard

CubieTruck v5:

$ gcc -v 2>&1 | grep Target 
Target: arm-linux-gnueabihf

$ cat /proc/cpuinfo
Processor   : ARMv7 Processor rev 5 (v7l)
Features    : swp half thumb fastmult vfp edsp thumbee neon vfpv3 tls vfpv4

So the CubieTruck uses:

-march=armv7-a -mtune=cortex-a7 -mfpu=neon-vfpv4 -mfloat-abi=hard

Banana Pi Pro:

$ gcc -v 2>&1 | grep Target 
Target: arm-linux-gnueabihf

$ cat /proc/cpuinfo
Processor   : ARMv7 Processor rev 4 (v7l)
Features    : swp half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt

So the Banana Pi uses:

-march=armv7-a -mtune=cortex-a7 -mfpu=neon-vfpv4 -mfloat-abi=hard
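The pattern behind the BeagleBone, CubieTruck and Banana Pi choices can be sketched as a tiny lookup on the Features line of /proc/cpuinfo. suggest_mfpu is a hypothetical helper and only a heuristic for 32-bit ARMv7 boards, not an exhaustive mapping; it does not pick -march or -mtune.

```c
#include <assert.h>
#include <string.h>

/* 1 if `word` is a whole space-separated token in the Features text. */
static int has_feature(const char *features, const char *word) {
    size_t n = strlen(word);
    const char *p = features;
    while ((p = strstr(p, word)) != NULL) {
        int start_ok = (p == features) || p[-1] == ' ';
        int end_ok = p[n] == '\0' || p[n] == ' ' || p[n] == '\n';
        if (start_ok && end_ok)
            return 1;
        p += n;
    }
    return 0;
}

/* Hypothetical heuristic: Features line -> GCC -mfpu value (ARMv7 only). */
static const char *suggest_mfpu(const char *features) {
    assert(features != NULL);
    int neon = has_feature(features, "neon");
    if (neon && has_feature(features, "vfpv4"))
        return "neon-vfpv4";
    if (neon)
        return "neon";        /* GCC treats this as neon-vfpv3 */
    if (has_feature(features, "vfpv4"))
        return "vfpv4";
    if (has_feature(features, "vfpv3"))
        return "vfpv3";
    if (has_feature(features, "vfp"))
        return "vfp";
    return NULL;              /* no FPU listed: fall back to soft float */
}
```

Feeding it the three Features lines above gives neon, neon-vfpv4 and neon-vfpv4, matching the flags chosen for each board.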

Raspberry Pi 3:

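The RPI3 is unique in that it's ARMv8 but runs a 32-bit OS. That means it's effectively 32-bit ARM, or AArch32. There's a little more to 32-bit ARM vs. AArch32, but this will show you the AArch32 flags.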

Additionally, the RPI3 uses an A53-based SoC, and it has NEON and the optional CRC32 instructions, but lacks the optional Crypto extensions.

$ gcc -v 2>&1 | grep Target 
Target: arm-linux-gnueabihf

$ cat /proc/cpuinfo 
model name  : ARMv7 Processor rev 4 (v7l)
Features    : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32
...

So the Raspberry Pi can use:

-march=armv8-a+crc -mtune=cortex-a53 -mfpu=neon-fp-armv8 -mfloat-abi=hard

Or it can use (I don't know what to use for -mtune):

-march=armv7-a -mfpu=neon-vfpv4 -mfloat-abi=hard

ODROID C2:

The ODROID C2 uses an Amlogic A53-based SoC, but runs a 64-bit OS. Like the RPI3, it has NEON and the optional CRC32 instructions, but lacks the optional Crypto extensions.

$ gcc -v 2>&1 | grep Target 
Target: aarch64-linux-gnu

$ cat /proc/cpuinfo 
Features    : fp asimd evtstrm crc32

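So the ODROID uses the following (on AArch64, GCC has no -mfpu or -mfloat-abi options; NEON/Advanced SIMD is always available):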

-march=armv8-a+crc -mtune=cortex-a53
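The +crc in the RPI3 and ODROID flags only helps if code actually uses the CRC32 instructions. A minimal sketch, assuming GCC or Clang: with +crc the compiler defines the ACLE macro __ARM_FEATURE_CRC32 and the intrinsic __crc32b from arm_acle.h is available; otherwise a portable bitwise loop with the same (reflected, zlib-style) polynomial is used. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_FEATURE_CRC32)
#include <arm_acle.h>
#endif

/* Update a CRC-32 (polynomial 0xEDB88320, reflected) over n bytes. */
static uint32_t crc32_update(uint32_t crc, const uint8_t *p, size_t n) {
    for (size_t i = 0; i < n; ++i) {
#if defined(__ARM_FEATURE_CRC32)
        crc = __crc32b(crc, p[i]);           /* one CRC32B instruction */
#else
        crc ^= p[i];                         /* software fallback      */
        for (int k = 0; k < 8; ++k)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
#endif
    }
    return crc;
}

/* Standard CRC-32: initial value and final XOR of 0xFFFFFFFF. */
static uint32_t crc32_ieee(const void *data, size_t n) {
    return ~crc32_update(0xFFFFFFFFu, (const uint8_t *)data, n);
}
```

Both paths should produce the usual check value 0xCBF43926 for the string "123456789", which makes it easy to verify the +crc build against a plain build.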

In the recipes above, I learned the ARM processor (like Cortex-A9 or A53) by inspecting data sheets. According to an answer on Unix & Linux Stack Exchange that deciphers the output of /proc/cpuinfo:

CPU part: 0xd03 indicates a Cortex-A53 processor.

So we may be able to look the value up in a database. I don't know whether one exists or where it's located.
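Lacking a database, a small hand-built table goes a long way. As far as I know, the part numbers come from the MIDR register and are documented in each core's Technical Reference Manual; they are only meaningful when "CPU implementer" is 0x41 (ARM Ltd.). The table below is deliberately incomplete, and both helpers are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* A few ARM Ltd. (implementer 0x41) "CPU part" values. Not exhaustive. */
static const struct { unsigned part; const char *name; } arm_parts[] = {
    {0xc05, "Cortex-A5"},  {0xc07, "Cortex-A7"},  {0xc08, "Cortex-A8"},
    {0xc09, "Cortex-A9"},  {0xc0f, "Cortex-A15"}, {0xd03, "Cortex-A53"},
    {0xd07, "Cortex-A57"}, {0xd08, "Cortex-A72"},
};

/* Map a part number to a core name, or NULL if it isn't in the table. */
static const char *arm_part_name(unsigned part) {
    for (size_t i = 0; i < sizeof arm_parts / sizeof arm_parts[0]; ++i)
        if (arm_parts[i].part == part)
            return arm_parts[i].name;
    return NULL;
}

/* Parse a /proc/cpuinfo line like "CPU part\t: 0xd03"; -1 if not one. */
static long parse_cpu_part(const char *line) {
    unsigned part;
    if (strncmp(line, "CPU part", 8) != 0)
        return -1;
    const char *colon = strchr(line, ':');
    if (colon == NULL || sscanf(colon + 1, " %x", &part) != 1)
        return -1;
    return (long)part;
}
```

Reading /proc/cpuinfo line by line with fgets and passing each line to parse_cpu_part, then to arm_part_name, gives the core name without consulting data sheets.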

Question 2

Answer

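I think this question should be split into several questions, adding some code examples and detailing the target platform and the versions of the toolchains used.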

But to cover one part of confusion: The recommendation to "use NEON as the FPU" sounds like a misunderstanding. NEON is a SIMD engine, the VFP is an FPU. You can use NEON for single-precision floating-point operations on up to 4 single-precision values in parallel, which (when possible) is good for performance.
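To illustrate "up to 4 single-precision values in parallel", here is a sketch using the standard arm_neon.h intrinsics vld1q_f32, vaddq_f32 and vst1q_f32 when NEON is enabled, with a scalar loop for the remainder and for non-NEON builds. add_f32 is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* dst[i] = a[i] + b[i] for i in [0, n). */
static void add_f32(float *dst, const float *a, const float *b, size_t n) {
    size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* One Q register holds 4 x float32, so each iteration adds 4 lanes. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(dst + i, vaddq_f32(va, vb));
    }
#endif
    /* Scalar tail (and the whole loop when NEON is not enabled). */
    for (; i < n; ++i)
        dst[i] = a[i] + b[i];
}
```

One caveat, per GCC's ARM-Options documentation for -mfpu: on 32-bit ARM, NEON floating point is not fully IEEE 754 (denormals are flushed to zero), so GCC's auto-vectorizer will not emit NEON float code unless -funsafe-math-optimizations is also given. Explicit intrinsics like these bypass that restriction, so the precision trade-off becomes your decision.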

-mfpu=neon can be seen as shorthand for -mfpu=neon-vfpv3.

See http://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html for more information.

Question 3

Answer

I'd stay away from VFP. It's just like Thumb mode: it's meant for compilers. There's no point in optimizing for them.

It might sound rude, but I really don't see any point in NEON intrinsics either. They're more trouble than help, if they help at all.

Just invest two or three days in basic ARM assembly: you only need to learn a few instructions for loop control and termination.

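Then you can start writing native NEON code without worrying about the compiler doing something weird and spitting out tons of errors and warnings.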

Learning the NEON instructions is less demanding than learning all those intrinsic macros. Above all, the results are much better.

Fully optimized native NEON code usually runs more than twice as fast as its well-written intrinsics counterpart.

Just compare the OP's version with mine in the link below, and you'll see what I mean.

Optimizing RGBA8888 to RGB565 conversion with NEON
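The linked code isn't reproduced here, but when comparing an intrinsics version against hand-written NEON it helps to have a plain scalar reference to check both against. This sketch assumes source pixels are stored as bytes in memory order R, G, B, A and simply drops alpha; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack 8-bit R, G, B into RGB565: RRRRRGGG GGGBBBBB. */
static uint16_t pack_rgb565(uint8_t r, uint8_t g, uint8_t b) {
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Scalar reference: convert `pixels` RGBA8888 pixels to RGB565. */
static void rgba8888_to_rgb565(uint16_t *dst, const uint8_t *src, size_t pixels) {
    for (size_t i = 0; i < pixels; ++i) {
        const uint8_t *px = src + 4 * i;   /* px[3] (alpha) is ignored */
        dst[i] = pack_rgb565(px[0], px[1], px[2]);
    }
}
```

Running the NEON version and this loop over the same buffer and comparing outputs catches lane-ordering and shift mistakes before any timing is worth doing.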
