English 中文(简体)
CUDA Memory Allocation accessible for both host and device
原标题:

I m trying to figure out a way to allocate a block of memory that is accessible by both the host (CPU) and device (GPU). Other than using cudaHostAlloc() function to allocate page-locked memory that is accessible to both the CPU and GPU, are there any other ways of allocating such blocks of memory? Thanks in advance for your comments.

最佳回答

The only way for the host and the device to "share" memory is using the newer zero-copy functionality. This is available on the GT200 architecture cards and some newer laptop cards. This memory must be, as you note, allocated with cudaHostAlloc so that it is page locked. There is no alternative, and even this functionality is not available on older CUDA capable cards.

If you re just looking for an easy (possibly non-performant) way to manage host to device transfers, check out the Thrust library. It has a vector class that lets you allocate memory on the device, but read and write to it from host code as if it were on the host.

Another alternative is to write your own wrapper that manages the transfers for you.

问题回答

There is no way to allocate a buffer that is accessible by both the GPU and the CPU unless you use cudaHostAlloc(). This is because not only must you allocate the pinned memory on the CPU (which you could do outside of CUDA), but also you must map the memory into the GPU s (or more specifically, the context s) virtual memory.

It s true that on a discrete GPU zero-copy does incur a bus transfer. However if your access is nicely coalesced and you only consume the data once it can still be efficient, since the alternative is to transfer the data to the device and then read it into the multiprocessors in two stages.

No there is no "Automatic Way" of uploading buffers on the GPU memory.





相关问题
Windows Mobile 6 Emulator change storage?

How do i change the size of the Windows Mobile 6 Emulator. Its fixed at 32mb. I read this post: Increasing Windows Mobile 5 Emulator Storage But it only helps for the 5.0 version. Isnt there any way ...

CUDA Memory Allocation accessible for both host and device

I m trying to figure out a way to allocate a block of memory that is accessible by both the host (CPU) and device (GPU). Other than using cudaHostAlloc() function to allocate page-locked memory that ...

RAM memory reallocation - Windows and Linux

I am working on a project involving optimizing energy consumption within a system. Part of that project consists in allocating RAM memory based on locality, that is allocating memory segments for a ...

Should I send retain or autorelease before returning objects?

I thought I was doing the right thing here but I get several warnings from the Build and Analyze so now I m not so sure. My assumption is (a) that an object I get from a function (dateFromComponents: ...

Java Library Size

If I m given two Java Libraries in Jar format, 1 having no bells and whistles, and the other having lots of them that will mostly go unused.... my question is: How will the larger, mostly unused ...

doubts regarding Memory management in .net

I m learning about Memory management in C# from the book "Professional C#" The presence of the garbage collector means that you will usually not worry about objects that you no longer need; ...

Objective-C returning alloc d memory in a function == bad?

This is on the iPhone. So what if I have a function like - (SomeObject*)buildObject; Do I need to pass in a variable that I have already alloc d outside like - (void)assignObject(SomeObject** out);...

热门标签