HAAR wavelet transform in CUDA

I am trying to perform a HAAR wavelet transform on a 1D array in CUDA.

ALGORITHM

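The input array has 8 indices.

With the condition if(x_index>=o_width/2 || y_index>=o_height/2) I will have 4 threads, which should be 0, 2, 4, 6, and I plan to handle two input indices with each thread.

I calculate the avg. E.g.: if my thread id is 0, then avg is (input[0]+input[1])/2, and at the same time I get the diff as input[0]-avg, and likewise for the remaining threads.

Now the important part is the placement of the output. I created a separate thread id for the output, because using the indices 0, 2, 4, 6 made it difficult to place the output at the correct indices.

My avgs should be placed in the first 4 indices, i.e. 0,1,2,3 of the output, and o_thread_id should be 0,1,2,3. Similarly, to place the differences at 4,5,6,7, I have incremented 0,1,2,3 by 4, as shown in the code.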
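
For checking the GPU result, the scheme described above can be sketched as a plain CPU reference. This is only a minimal sketch: `haar_step_reference` is a hypothetical helper name, not something from the post, and it assumes an even-length input. It divides by `2.0f`, because `(input[a]+input[b])/2` with `int` operands is an integer division and would truncate 1.5 to 1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU reference for one level of the averaging/differencing step.
// Hypothetical helper for comparison with the kernel output.
std::vector<float> haar_step_reference(const std::vector<int>& input) {
    assert(input.size() % 2 == 0);  // samples are processed in pairs
    const std::size_t half = input.size() / 2;
    std::vector<float> output(input.size());
    for (std::size_t t = 0; t < half; ++t) {
        // Divide by 2.0f so the sum is not truncated by integer division.
        const float avg = (input[2 * t] + input[2 * t + 1]) / 2.0f;
        const float diff = input[2 * t] - avg;
        output[t] = avg;          // averages fill the first half
        output[half + t] = diff;  // differences fill the second half
    }
    return output;
}
```

For the input 1..8 this yields the averages 1.5, 3.5, 5.5, 7.5 in positions 0 to 3 and the difference -0.5 in each of positions 4 to 7.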

PROBLEM

No matter what I change, I keep getting this (all zeros in the output).

CODE

__global__ void cal_haar(int input[],float output [],int i_widthstep,int o_widthstep,int o_width,int o_height)
{

    int x_index=blockIdx.x*blockDim.x+threadIdx.x;
    int y_index=blockIdx.y*blockDim.y+threadIdx.y;

    if(x_index>=o_width/2 || y_index>=o_height/2) return;

    int i_thread_id=y_index*i_widthstep+(2*x_index);
    int o_thread_id=y_index*o_widthstep+x_index;

    float avg=(input[i_thread_id]+input[i_thread_id+1])/2;
    float diff=input[i_thread_id]-avg;
    output[o_thread_id]=avg;
    output[o_thread_id+4]=diff;

}

void haar(int input[],float output [],int i_widthstep,int o_widthstep,int o_width,int o_height)
{

    int * d_input;
    float * d_output;

    cudaMalloc(&d_input,i_widthstep*o_height);
    cudaMalloc(&d_output,o_widthstep*o_height);

    cudaMemcpy(d_input,input,i_widthstep*o_height,cudaMemcpyHostToDevice);

    dim3 blocksize(16,16);
    dim3 gridsize;
    gridsize.x=(o_width+blocksize.x-1)/blocksize.x;
    gridsize.y=(o_height+blocksize.y-1)/blocksize.y;

    cal_haar<<<gridsize,blocksize>>>(d_input,d_output,i_widthstep,o_widthstep,o_width,o_height);


    cudaMemcpy(output,d_output,o_widthstep*o_height,cudaMemcpyDeviceToHost);

    cudaFree(d_input);
    cudaFree(d_output);

}

My main function is as follows:

void main()
{
    int in_arr[8]={1,2,3,4,5,6,7,8};
    float out_arr[8];
    int i_widthstep=8*sizeof(int);
    int o_widthstep=8*sizeof(float);
    haar(in_arr,out_arr,i_widthstep,o_widthstep,8,1);

    for(int c=0;c<=7;c++)
    {cout<<out_arr[c]<<endl;}
    cvWaitKey();

}

Can you tell me where I am going wrong that it gives me zeros as output? Thank you.

Best answer

Your code has the following problem:

if(x_index>=o_width/2 || y_index>=o_height/2) return;

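Given that o_height = 1, we have o_height/2 = 0 (o_height is an int, so this is integer division, which rounds down), so no thread performs any work. To achieve what you want, you can either do floating-point arithmetic here or use (o_height+1)/2 and (o_width+1)/2, which rounds the division up. The condition then becomes x_index >= (8+1)/2 /* = 4 */ || y_index >= (1+1)/2 /* = 1 */.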
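
The effect of the guard can be checked on the host without launching anything. The sketch below uses a hypothetical helper, `count_active_threads`, that mimics the kernel's bounds check and counts how many (x, y) indices get past it, with and without the rounded-up division.

```cpp
// Counts how many (x, y) thread indices survive the early-return guard.
// Hypothetical host-side helper that mimics the kernel's bounds check.
int count_active_threads(int o_width, int o_height, bool round_up) {
    // Integer division truncates: 1 / 2 == 0, while (1 + 1) / 2 == 1.
    const int x_limit = round_up ? (o_width + 1) / 2 : o_width / 2;
    const int y_limit = round_up ? (o_height + 1) / 2 : o_height / 2;
    int active = 0;
    for (int y = 0; y < o_height; ++y) {
        for (int x = 0; x < o_width; ++x) {
            // Same condition as the kernel: a thread returns early if it is true.
            if (!(x >= x_limit || y >= y_limit)) ++active;
        }
    }
    return active;
}
```

With o_width = 8 and o_height = 1, the original guard lets no index through, while the rounded-up version lets four through, one per input pair.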

Besides, there is an addressing issue once you have more than one thread in the Y-dimension, because your i_thread_id and o_thread_id calculations are then wrong.
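
One thing visible in the posted code that affects these index calculations: main passes i_widthstep = 8*sizeof(int) and o_widthstep = 8*sizeof(float), which are sizes in bytes, while the kernel multiplies y_index by them as if they were element counts. That is harmless while y_index is 0, but not for later rows. A minimal sketch of the difference, using hypothetical helper names and assuming the byte stride is a multiple of the element size:

```cpp
#include <cstddef>

// Index as the posted kernel computes it: the byte stride is used directly.
std::size_t naive_index(std::size_t x, std::size_t y,
                        std::size_t widthstep_bytes) {
    return y * widthstep_bytes + x;
}

// Index with the byte stride converted to an element stride first.
// Assumes widthstep_bytes is a multiple of elem_size.
std::size_t element_index(std::size_t x, std::size_t y,
                          std::size_t widthstep_bytes, std::size_t elem_size) {
    return y * (widthstep_bytes / elem_size) + x;
}
```

For a row of 8 four-byte elements (a 32-byte stride), the first element of row 1 is element 8, whereas the naive calculation points at element 32, well past the end of a 2-row array.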
