OpenCL Texture Memory

I'm fairly new to OpenCL, so please bear with me.

In the first iteration of my code, I used basic memory buffers for large datasets and declared them global. Now that I'm looking to improve the timing, I want to use texture memory instead. In the CUDA version, we use cudaBindTexture and tex1Dfetch to read the data of a large 1D float array. From my understanding of the specification, texture memory is the same thing as image memory. However, since there are only 2D and 3D image objects, each with a maximum height and width, I run into some issues. My array is larger than the max height/width, but not larger than max height * max width. Must I convert my 1D array into 2D? Or is there a better way to do it?

Or am I completely off?

I did read http://forums.nvidia.com/index.php?showtopic=151743 and http://forums.nvidia.com/index.php?showtopic=150454, but they weren't exactly conclusive on whether the texture memory referred to in the Best Practices and Programming Guide is in fact image objects.

Thanks and any help/suggestions are greatly welcome!

Best Answer

I found the best answer as a reply to my post on NVIDIA's forum here.

Answers

My array is larger than the max height/width, but not larger than max height * max width. Must I convert my 1D array into 2D?

Yes, the texture hardware has constraints on the maximum index values. If you exceed these values, you'll need to switch to using multiple index values, for example a 2D coordinate instead of a single 1D index.

That said, I'm not implying that converting to texture access is going to speed up your program.
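The 1D-to-2D conversion described above boils down to index arithmetic. Here is a minimal sketch in plain C, assuming a row-major layout where each image row holds `width` elements. The type `coord2d` and the helpers `rows_needed`, `index_to_coord`, and `coord_to_index` are hypothetical names chosen for illustration; they are not part of OpenCL or CUDA.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type: a 2D coordinate into an image. */
typedef struct {
    size_t x; /* column within the row */
    size_t y; /* row */
} coord2d;

/* Number of rows needed to hold n elements when each row holds
   `width` elements (rounds up, so the last row may be partly unused). */
static size_t rows_needed(size_t n, size_t width)
{
    assert(width > 0);
    return (n + width - 1) / width;
}

/* Map a linear index i to (x, y), assuming row-major layout. */
static coord2d index_to_coord(size_t i, size_t width)
{
    assert(width > 0);
    coord2d c = { i % width, i / width };
    return c;
}

/* Inverse mapping: (x, y) back to the linear index. */
static size_t coord_to_index(coord2d c, size_t width)
{
    return c.y * width + c.x;
}
```

On the host, `width` would be chosen no larger than the device's maximum 2D image width, and the image height would be `rows_needed(n, width)`. Inside a kernel, the same `i % width` and `i / width` arithmetic would produce the integer coordinate pair passed to `read_imagef`. Whether this is faster than a plain global buffer is something to measure rather than assume.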

OpenCL 1.2 supports 1D textures. The issue is that NVIDIA only supports OpenCL 1.1, unlike AMD or Intel...




