How to calculate average value by masked value/texture on GPU?

Background (if helpful)

I'm working on a GPU-based real-time lightmap preview on Windows with NVIDIA RTX cards.

我写如下:

  • A 1920x1080 RWTexture2D<float3> TexColor, holding the result for each geometry surface at a limited-precision lightmap resolution.

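  • A MaskedID, Integrat(objectId, uint2(lightmapUV.x, lightmapUV.y)), which uniquely identifies a lightmap texel, written to a 1920x1080 RWTexture2D<uint> TexMasked. It is used to filter TexColor so that the result can be computed at the specific lightmap resolution.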

Finally, I apply the results to the scene to provide a lightmap preview.

What I want to do is:

RWTexture2D<float3> TexColor;  // RGBA32
Texture2D<uint>     TexMasked; // R32

void get_masked_mean(uint2 pixelPos)
{
    // ? How to do this on the GPU?
    // TexColor[pixelPos] = average of TexColor[pixel] over all
    // pixels satisfying (TexMasked[pixel] == TexMasked[pixelPos])
}

I tried reading TexColor and TexMasked back to the CPU, but it is very slow:


std::unordered_map<uint, float4> resultMap;  // xyz = running mean, w = sample count
std::vector<std::vector<float3>> colorData  = TexColor.readbackToCPU();  // very slow
std::vector<std::vector<uint>>   maskedData = TexMasked.readbackToCPU();

for (int r = 0; r < rowCount; ++r)     // 1920
{
    for (int c = 0; c < colCount; ++c) // 1080
    {
       uint key = maskedData[r][c];
       uint oldCount = resultMap[key].w;
       uint newCount = oldCount + 1;

       // very slow
       resultMap[key] = float4((colorData[r][c] + oldCount * resultMap[key].xyz) / newCount, newCount);
    }
}

for (int r = 0; r < rowCount; ++r)
{
    for (int c = 0; c < colCount; ++c)
    {
       uint key = maskedData[r][c];
       colorData[r][c] = resultMap[key].xyz; // very slow
    }
}

TexColor.uploadToGPU(colorData);

Reading the data back from the GPU to the CPU, plus about 1,000,000 hash-map queries, is very slow. Is there more efficient CPU code for this? Or GPU-based code (a compute shader)?

Answer

There are many ways to improve this code.

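If you want to stick with the CPU-based approach (which is much easier, but will always be significantly slower or out of sync with your other draw calls), the first thing I would do is find a faster hash map implementation. You look up roughly 4,000,000 hash table entries, so even a small improvement in lookup speed can make a measurable difference. It won't be a huge gain, but it is very low-hanging fruit.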
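As one illustration of a faster hash map, here is a minimal sketch of an open-addressing (linear probing) table that accumulates a (sum, count) pair per mask ID and remembers which slot each pixel landed in, so the write-back pass needs no second lookup. It assumes the two readback textures have been flattened row by row into one `std::vector` each; `Float3`, `MaskAccumulator`, and `maskedMean` are hypothetical names, not part of any engine API. Accumulating a sum and dividing once at the end also avoids the per-pixel division of the running-mean update in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the question's float3 type.
struct Float3 { float x, y, z; };

// Open-addressing (linear probing) map from a 32-bit mask ID to a running
// (sum, count). Sketch only: fixed capacity, no erase.
class MaskAccumulator {
public:
    explicit MaskAccumulator(std::size_t maxKeys) {
        std::size_t capacity = 16;
        while (capacity < maxKeys * 2) capacity <<= 1;  // power of two, load factor <= 0.5
        slots_.resize(capacity);
    }

    // Returns the slot holding `key`, claiming an empty slot if the key is new.
    std::size_t slotFor(std::uint32_t key) {
        const std::size_t mask = slots_.size() - 1;
        std::size_t i = mix(key) & mask;
        while (slots_[i].used && slots_[i].key != key) i = (i + 1) & mask;
        if (!slots_[i].used) { slots_[i].used = true; slots_[i].key = key; }
        return i;
    }

    void add(std::size_t slot, const Float3& c) {
        Slot& s = slots_[slot];
        s.sum.x += c.x; s.sum.y += c.y; s.sum.z += c.z;
        s.count += 1;
    }

    Float3 mean(std::size_t slot) const {
        const Slot& s = slots_[slot];
        const float n = static_cast<float>(s.count);
        return {s.sum.x / n, s.sum.y / n, s.sum.z / n};
    }

private:
    struct Slot {
        std::uint32_t key = 0;
        bool used = false;
        Float3 sum{0.0f, 0.0f, 0.0f};
        std::uint32_t count = 0;
    };

    // MurmurHash3 32-bit finalizer: spreads nearby IDs across the table.
    static std::uint32_t mix(std::uint32_t k) {
        k ^= k >> 16; k *= 0x85ebca6bu;
        k ^= k >> 13; k *= 0xc2b2ae35u;
        k ^= k >> 16;
        return k;
    }

    std::vector<Slot> slots_;
};

// Replaces every color with the mean of all colors that share its mask ID.
void maskedMean(std::vector<Float3>& colors, const std::vector<std::uint32_t>& keys) {
    assert(colors.size() == keys.size());
    MaskAccumulator table(keys.size());           // worst case: every pixel unique
    std::vector<std::size_t> slotOfPixel(keys.size());
    for (std::size_t i = 0; i < keys.size(); ++i) {
        slotOfPixel[i] = table.slotFor(keys[i]);  // one probe sequence per pixel
        table.add(slotOfPixel[i], colors[i]);
    }
    for (std::size_t i = 0; i < keys.size(); ++i)
        colors[i] = table.mean(slotOfPixel[i]);   // reuse the slot: no second lookup
}
```

The table is sized for the worst case where every pixel has its own ID; if you know roughly how many lightmap texels are visible, passing that number instead keeps the table smaller and more cache-friendly. A well-tested third-party flat hash map would serve the same purpose.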

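If you perform this calculation repeatedly (e.g. once per frame in an interactive application), you can also pipeline the work: instead of reading back the texture memory immediately, you always read back the memory from a few frames ago, and the GPU in turn only receives the uploaded results a few frames later. This costs a lot of additional memory (since you have to keep multiple frames in flight at all times), and the GPU will always only see results computed a few frames earlier, but if you keep enough frames in flight, it completely hides the performance cost of reading back the texture and uploading the result. Obviously, this is not an option if this is a one-time computation, or if delayed results on the GPU are not acceptable.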
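A minimal sketch of the pipelining idea, reduced to its bookkeeping. It models only the readback half on the CPU side: `ReadbackRing` is a hypothetical class, and assigning to a slot stands in for recording a GPU-to-staging copy. It assumes that a copy recorded N - 1 frames ago has finished by the time it is read; real code would confirm this with a fence before mapping the staging buffer.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <optional>

// N staging slots used round-robin. Frame f writes slot f % N and reads the
// slot written N - 1 frames earlier, so readback latency is N - 1 frames.
template <typename T, std::size_t N>
class ReadbackRing {
    static_assert(N > 0, "need at least one slot");

public:
    // Stores this frame's result and returns the result from N - 1 frames ago,
    // or nothing while the pipeline is still filling up.
    std::optional<T> submit(std::size_t frame, const T& result) {
        slots_[frame % N] = result;              // stand-in for "copy GPU texture to staging slot"
        if (frame + 1 < N) return std::nullopt;  // no slot is old enough yet
        return slots_[(frame + 1) % N];          // oldest slot; it is overwritten next frame
    }

private:
    std::array<T, N> slots_{};
};
```

With N = 3, frame 2 receives frame 0's result, frame 3 receives frame 1's, and so on. Memory grows linearly with N, which is the trade-off described above: more slots hide more latency but cost more staging memory.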

You could then further optimize the code by using multiple threads, though that work would probably not be worth it.
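If you do try threads, one lock-free layout is a private map per thread followed by a serial merge. This is a sketch under stated assumptions: the textures are flattened into one `std::vector` each, and `Float3`, `Acc`, and `maskedMeanParallel` are hypothetical names. It uses C++17 (structured bindings).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for the question's float3 type.
struct Float3 { float x, y, z; };
struct Acc { Float3 sum{0.0f, 0.0f, 0.0f}; std::uint32_t count = 0; };
using AccMap = std::unordered_map<std::uint32_t, Acc>;

// Each thread accumulates (sum, count) for a contiguous band of pixels into its
// own map, so no locks are needed; the partial maps are merged serially.
void maskedMeanParallel(std::vector<Float3>& colors,
                        const std::vector<std::uint32_t>& keys,
                        unsigned threadCount) {
    assert(colors.size() == keys.size() && threadCount > 0);
    const std::size_t n = keys.size();
    std::vector<AccMap> partial(threadCount);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threadCount; ++t) {
        workers.emplace_back([&, t] {
            const std::size_t begin = n * t / threadCount;
            const std::size_t end = n * (t + 1) / threadCount;
            for (std::size_t i = begin; i < end; ++i) {
                Acc& a = partial[t][keys[i]];
                a.sum.x += colors[i].x; a.sum.y += colors[i].y; a.sum.z += colors[i].z;
                a.count += 1;
            }
        });
    }
    for (std::thread& w : workers) w.join();

    // Merge: serial, and its cost grows with the number of unique IDs per band.
    AccMap total = std::move(partial[0]);
    for (unsigned t = 1; t < threadCount; ++t) {
        for (const auto& [key, a] : partial[t]) {
            Acc& dst = total[key];
            dst.sum.x += a.sum.x; dst.sum.y += a.sum.y; dst.sum.z += a.sum.z;
            dst.count += a.count;
        }
    }

    // Write-back: lookups only, so this loop could also be split across threads.
    for (std::size_t i = 0; i < n; ++i) {
        const Acc& a = total.at(keys[i]);
        const float c = static_cast<float>(a.count);
        colors[i] = {a.sum.x / c, a.sum.y / c, a.sum.z / c};
    }
}
```

The serial merge is one reason the gain may be small: when IDs are scattered across the whole image, every band's map contains most of the keys and the merge does nearly as much hashing as the original loop.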

Doing this computation on the GPU is possible, but much more involved. Since GPUs are highly parallel, implementing a hash map on the GPU is very challenging. Instead, I would use multiple shaders to reorganize the data until you don't need a hash table anymore. The basic idea is to sort your data by key (storing the original position of each entry alongside it), then use a prefix sum to map all consecutive entries with the same key to a single index, which you can then use to schedule work for each unique key. The exact implementation depends on a few details: for example, you might use indirect execution to schedule one thread (or one thread group, etc.) per unique key, or you might rewrite your math so that it can be done atomically and then run one thread per pixel (note that atomic operations on floats are only supported through some vendor-specific extensions and are not available on all hardware). At the end, using the stored original position of each entry and its position in the sorted data, you can write the result back into each pixel.

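The whole process requires multiple compute shader passes and extra buffers, and writing the shaders and the overall pipeline so that they run quickly and efficiently takes real effort. But if you put in the work and time, you can make this much faster than the CPU approach (on good hardware, I'd guess under a millisecond).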
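The sort and prefix-sum pipeline described above can be simulated serially on the CPU to check the data flow before writing any shaders. This is a sketch, not a GPU implementation: `std::sort` stands in for a GPU radix sort, `std::inclusive_scan` (C++17) for a parallel prefix sum, and the reduction uses the (sum, count) form, which is the kind of rewrite that allows the atomic variant. `Float3` and `maskedMeanSortScan` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical stand-in for the question's float3 type.
struct Float3 { float x, y, z; };

// Serial CPU simulation of the sort / prefix-sum pipeline. Each numbered pass
// would be its own compute dispatch on the GPU; here it is a loop or std algorithm.
void maskedMeanSortScan(std::vector<Float3>& colors, const std::vector<std::uint32_t>& keys) {
    assert(colors.size() == keys.size());
    const std::size_t n = keys.size();
    if (n == 0) return;

    // Pass 1: sort pixel indices by mask ID (a GPU radix sort in practice).
    // order[i] is the original position of the i-th entry in sorted order.
    std::vector<std::uint32_t> order(n);
    std::iota(order.begin(), order.end(), 0u);
    std::sort(order.begin(), order.end(),
              [&](std::uint32_t a, std::uint32_t b) { return keys[a] < keys[b]; });

    // Pass 2: flag the first entry of every run of equal keys.
    std::vector<std::uint32_t> isHead(n);
    for (std::size_t i = 0; i < n; ++i)
        isHead[i] = (i == 0 || keys[order[i]] != keys[order[i - 1]]) ? 1u : 0u;

    // Pass 3: an inclusive prefix sum of the flags gives every entry a 1-based
    // segment id; the last value is the number of unique keys.
    std::vector<std::uint32_t> segment(n);
    std::inclusive_scan(isHead.begin(), isHead.end(), segment.begin());
    const std::uint32_t uniqueCount = segment.back();

    // Pass 4: reduce (sum, count) per segment. On the GPU this could be one thread
    // per segment via indirect dispatch, or one thread per entry with atomics.
    std::vector<Float3> sum(uniqueCount, Float3{0.0f, 0.0f, 0.0f});
    std::vector<std::uint32_t> count(uniqueCount, 0u);
    for (std::size_t i = 0; i < n; ++i) {
        const std::uint32_t s = segment[i] - 1;
        const Float3& c = colors[order[i]];
        sum[s].x += c.x; sum[s].y += c.y; sum[s].z += c.z;
        count[s] += 1;
    }

    // Pass 5: scatter each segment's mean back to the entries' original positions.
    for (std::size_t i = 0; i < n; ++i) {
        const std::uint32_t s = segment[i] - 1;
        const float k = static_cast<float>(count[s]);
        colors[order[i]] = {sum[s].x / k, sum[s].y / k, sum[s].z / k};
    }
}
```

Because every pass is a simple per-entry map, a scan, or a sort, each one has a well-known parallel GPU counterpart; the hash table disappears entirely, which is the point of reorganizing the data this way.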
