What's the difference between PTX and CUBIN w.r.t. the NVCC compiler?

I have CUDA 4.0 installed and two devices (GeForce GTX 460 cards).

What is the difference between a cubin and PTX?

I think the cubin is native code for the GPU, so it is specific to a particular micro-architecture, while PTX is an intermediate language that runs on Fermi devices (e.g. a GeForce GTX 460) thanks to JIT compilation. When I compile a .cu source file I can choose between a PTX or a cubin target with the -code option: one value of -code if I want a cubin file, another if I want a PTX file.

Is that correct?

Answers

You are mixing up the options that select a compilation stage (-ptx and -cubin) with the options that control which devices to target (-code), so you should revisit the documentation.

NVCC is the NVIDIA compiler driver. The -ptx and -cubin options are used to select specific phases of compilation; by default, without any phase-specific option, nvcc will try to produce an executable from the inputs. Most people use the -c option so that nvcc produces an object file which is then linked by the default platform linker; the -ptx and -cubin options are only really useful if you are using the driver API. For more information on the intermediate phases, check the nvcc manual that is installed with the CUDA Toolkit.
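As a concrete illustration (kernel.cu and the output file names here are hypothetical examples, not taken from the question), the stages map onto invocations like these:

    nvcc kernel.cu -o app                  # default: compile and link an executable
    nvcc -c kernel.cu -o kernel.o          # stop at an object file; the host linker finishes the job
    nvcc -ptx kernel.cu -o kernel.ptx      # stop after generating PTX (plain text)
    nvcc -cubin kernel.cu -o kernel.cubin  # stop after generating a cubin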

  • The output from -ptx is a plain-text PTX file. PTX is an intermediate assembly language for NVIDIA GPUs which has not yet been fully optimised and will later be assembled into device-specific code (different devices have different register counts, for example, so fully optimising PTX would be wrong).
  • The output from -cubin is a fat binary which may contain one or more device-specific binary images as well as (optionally) PTX.
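To make the driver API point above concrete, here is a minimal sketch (not from the original answer) of host code that loads the PTX produced by nvcc -ptx and launches a kernel from it. It assumes kernel.cu defines extern "C" __global__ void my_kernel(void); the file and function names are illustrative only.

    #include <stdio.h>
    #include <cuda.h>              /* CUDA driver API; link with -lcuda */

    int main(void)
    {
        CUdevice   dev;
        CUcontext  ctx;
        CUmodule   mod;
        CUfunction fn;

        cuInit(0);
        cuDeviceGet(&dev, 0);
        cuCtxCreate(&ctx, 0, dev);

        /* The driver JIT-compiles the PTX for whatever device is present.
           A .cubin from "nvcc -cubin" loads the same way, but only runs on
           the architecture it was built for. */
        if (cuModuleLoad(&mod, "kernel.ptx") != CUDA_SUCCESS) {
            fprintf(stderr, "could not load kernel.ptx\n");
            return 1;
        }
        cuModuleGetFunction(&fn, mod, "my_kernel");

        /* One block of 128 threads, no arguments, default stream. */
        cuLaunchKernel(fn, 1, 1, 1, 128, 1, 1, 0, NULL, NULL, NULL);
        cuCtxSynchronize();

        cuModuleUnload(mod);
        cuCtxDestroy(ctx);
        return 0;
    }

The point is simply that with the driver API you hand the .ptx or .cubin file to the driver yourself, which is why the -ptx and -cubin stages matter there.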

The -code argument you refer to has a different purpose entirely. I'd encourage you to check out the nvcc documentation, which contains several examples. In general I would advise using the -gencode option instead, since it gives you more control and allows you to target multiple devices in one binary. As a quick example:

  • -gencode arch=compute_xx,code="compute_xx,sm_yy,sm_zz" causes nvcc to target all devices with compute capability xx (that's the arch= bit) and to embed PTX (code=compute_xx) as well as device-specific binaries for sm_yy and sm_zz into the final fat binary.
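For instance, a hedged example (not from the original answer) targeting the asker's GTX 460, a compute capability 2.1 Fermi part, could look like this; repeating -gencode is equivalent to listing several code= values in one option:

    nvcc -c kernel.cu -o kernel.o \
         -gencode arch=compute_20,code=compute_20 \
         -gencode arch=compute_20,code=sm_20 \
         -gencode arch=compute_20,code=sm_21

This embeds compute_20 PTX (so the driver can JIT for newer devices) plus sm_20 and sm_21 binaries into the resulting fat binary.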



