iPhone ARMv6 VFP asm latency, throughput and hazards

In this document: http://infocenter.arm.com/help/topic/com.arm.doc.ddi0301g/DDI0301G_arm1176jzfs_r0p7_trm.pdf

On page 21-25 (PDF page 875), the throughput and latency timings are given for the assembly instructions of the VFP unit.

Are those numbers independent of vector size?

1: Let's take FMULS, which has a throughput of 1 and a latency of 8. Does that mean I can start a new FMULS operation every cycle, as long as I don't use a register that is still being calculated by a previous instruction? For example:

FMULS s8, s16, s20
FMULS s12, s21, s25

Will those execute right after each other?

2: What happens if I have two FMULS instructions one after the other, where one argument depends on the previous computation?

FMULS s8, s16, s20
FMULS s12, s21, s8

Will the VFP wait for 8 cycles before starting to process the second instruction?

3: What if we are in vector mode with 4 elements and, for the second FMULS instruction, all input registers but one are available? What will happen?

4: Sqrt and division: will a sqrt or division operation prevent any subsequent operation from being started for 19 cycles?

Thanks!

Answers

Your questions are all answered in the document that you linked. You should read it carefully.

Are those numbers independent of vector size?

No. See, for example, Table 21-15 in the document you linked. Note the latency of the short vector FADDS.

Does it mean that I can start a new FMULS operation every cycle if it doesn't depend on an earlier result that isn't available yet?

Yes, that's the definition of throughput.

What happens if I have two FMULS instructions one after the other, where one argument depends on the previous computation?

Execution will stall until the result of the first FMULS is available. See 21.6 "Operation of the scoreboards" for more detail.
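
The stall can be pictured with a toy issue model. This is only a sketch, not a cycle-accurate simulation of the VFP11 scoreboard. It assumes in-order issue of at most one instruction per cycle, takes the FMULS latency of 8 from the question, and treats a result as readable exactly that many cycles after its producer issues. The function name `issue_cycles` and the tuple encoding of instructions are made up for illustration.

```python
# Toy in-order issue model (an illustration, not the real VFP11 scoreboard).
# Assumptions: at most one instruction issues per cycle, and a destination
# register becomes readable `latency` cycles after its producer issues.

FMULS_LATENCY = 8  # latency figure quoted in the question


def issue_cycles(program, latency=FMULS_LATENCY):
    """Return the cycle at which each (dest, src1, src2) instruction issues."""
    ready_at = {}   # register name -> first cycle its value can be read
    next_slot = 0   # earliest cycle the next instruction could issue
    issued = []
    for dest, src1, src2 in program:
        # Stall until the issue slot is free and both sources are available.
        cycle = max(next_slot, ready_at.get(src1, 0), ready_at.get(src2, 0))
        issued.append(cycle)
        ready_at[dest] = cycle + latency
        next_slot = cycle + 1
    return issued


# The two sequences from the question.
independent = [("s8", "s16", "s20"), ("s12", "s21", "s25")]
dependent = [("s8", "s16", "s20"), ("s12", "s21", "s8")]

print(issue_cycles(independent))  # [0, 1]
print(issue_cycles(dependent))    # [0, 8]
```

Under these assumptions the independent pair issues on consecutive cycles, while the dependent FMULS waits until s8 is ready. The exact stall length on real hardware is governed by the scoreboard rules in section 21.6.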

What if we are in vector mode with 4 elements and, for the second FMULS instruction, all input registers but one are available? What will happen?

It will stall. Same reference.

Sqrt and division: will a sqrt or division operation prevent any subsequent operation from being started for 19 cycles?

No. See section 21.10 "Parallel Execution". An example is given in Table 21-15, in which a non-dependent FADDS executes immediately following FDIVS.
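
A rough way to picture parallel execution is a model with separate pipelines, where an instruction waits only for its own pipeline and for its source registers. The sketch below is hypothetical throughout: the pipeline names, the timing table `OPS`, and the choice to keep the divide pipeline busy for the full 19 cycles mentioned in the question are assumptions for illustration, not figures taken from the TRM.

```python
# Toy model of independent execution pipelines (illustrative only).
# All numbers and pipeline names below are assumptions; the real values
# are in the timing tables of the TRM.

OPS = {
    # op: (pipeline, cycles the pipeline stays busy, result latency)
    "FDIVS": ("divide", 19, 19),
    "FADDS": ("add", 1, 8),
}


def issue_cycles(program):
    """Return the issue cycle of each (op, dest, sources) instruction."""
    pipe_free_at = {}  # pipeline name -> first cycle it accepts a new op
    ready_at = {}      # register name -> first cycle its value can be read
    next_slot = 0      # in-order issue, at most one instruction per cycle
    issued = []
    for op, dest, sources in program:
        pipe, busy, latency = OPS[op]
        # Wait for the issue slot, the target pipeline, and every source.
        waits = [next_slot, pipe_free_at.get(pipe, 0)]
        waits += [ready_at.get(src, 0) for src in sources]
        cycle = max(waits)
        issued.append(cycle)
        pipe_free_at[pipe] = cycle + busy
        ready_at[dest] = cycle + latency
        next_slot = cycle + 1
    return issued


div = ("FDIVS", "s0", ("s1", "s2"))
print(issue_cycles([div, ("FADDS", "s4", ("s5", "s6"))]))  # [0, 1]
print(issue_cycles([div, ("FADDS", "s4", ("s0", "s6"))]))  # [0, 19]
print(issue_cycles([div, ("FDIVS", "s8", ("s9", "s10"))]))  # [0, 19]
```

In this model a non-dependent FADDS issues on the cycle after FDIVS, while an FADDS that reads the quotient, or a second FDIVS, has to wait.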

Note that it can be a bit of a challenge (though not impossible) to write short-vector VFP code that performs substantially faster than scalar code for many types of computation. Even if you learn how to do it, it will be of questionable value since the NEON unit seems to be the new model for vector computation on ARM. You may be better served in the long run by ignoring the short-vector operation for now and focusing on learning NEON for the future.




