Combining OpenMP and SSE instructions
  • Date: 2012-05-24 06:47:39
  • Tags:
  • c
  • openmp
  • sse

The original code looks like this:

for(i=0;i<20;i++){
    if(/* condition omitted */){
        /* other work, omitted */
    }
    else{

        num2 = _mm_set_pd(Phasor.imaginary, Phasor.real);

        for(int k=0; k<SamplesIneachPeriodCeil[iterationIndex]; k++) 
        {
            /*SamplesIneachPeriodCeil[iterationIndex] is in range of 175000*/

            num1 = _mm_loaddup_pd(&OutSymbol[k].real);
            num3 = _mm_mul_pd(num2, num1);
            num1 = _mm_loaddup_pd(&OutSymbol[k].imaginary);
            num2 = _mm_shuffle_pd(num2, num2, 1);
            num4 = _mm_mul_pd(num2, num1);
            num3 = _mm_addsub_pd(num3, num4);
            num2 = _mm_shuffle_pd(num2, num2, 1);
            num5 = _mm_set_pd(InSymbolInt8[k],InSymbolInt8[k] );
            num6 = _mm_mul_pd(num3, num5);
            num7 = _mm_set_pd(Out[k].imaginary,Out[k].real);
            num8 = _mm_add_pd(num7,num6);
            _mm_storeu_pd((double *)&Out[k], num8);

        }
        Out = Out + SamplesIneachPeriodCeil[iterationIndex];
    }
}

This code runs in 15 ms.
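For reference, the SSE3 sequence in the inner loop computes a complex multiply-accumulate, Out[k] += InSymbolInt8[k] * (OutSymbol[k] * Phasor). The scalar C sketch below expresses the same arithmetic and could serve as a correctness baseline for the vectorized and OpenMP versions. The `Complex` struct, the `int8_t` element type assumed for `InSymbolInt8`, and the function name `complex_mac_scalar` are assumptions, since the question does not show those definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical complex type: the field names match the question,
   but the real struct definition is not shown there. */
typedef struct {
    double real;
    double imaginary;
} Complex;

/* Scalar equivalent of the SSE3 loop body (hypothetical helper):
   out[k] += inSymbol[k] * (outSymbol[k] * phasor), complex product. */
void complex_mac_scalar(Complex *out, const Complex *outSymbol,
                        const int8_t *inSymbol, Complex phasor, size_t n)
{
    for (size_t k = 0; k < n; k++) {
        /* (a.r + i*a.i) * (p.r + i*p.i)
           = (a.r*p.r - a.i*p.i) + i*(a.r*p.i + a.i*p.r).
           _mm_addsub_pd subtracts in the low (real) lane and adds in the
           high (imaginary) lane, which produces these two terms. */
        double re = outSymbol[k].real * phasor.real
                  - outSymbol[k].imaginary * phasor.imaginary;
        double im = outSymbol[k].real * phasor.imaginary
                  + outSymbol[k].imaginary * phasor.real;

        /* Scale by the int8 sample and accumulate into the output. */
        out[k].real      += re * inSymbol[k];
        out[k].imaginary += im * inSymbol[k];
    }
}
```

Running the SSE3 loop and this function on copies of the same inputs and comparing the outputs (with a small tolerance) is one way to check that a parallel rewrite has not changed the results.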

When I modify the code to use OpenMP:

Note: only the else branch is shown here.

else{
    int size = SamplesIneachPeriodCeil[iterationIndex];

#pragma omp parallel num_threads(2) shared(size)
    {
        int start,end,tindex,tno,no_of_iteration;
        tindex = omp_get_thread_num();
        tno = omp_get_num_threads();
        start = tindex * size / tno;
        end = (1+ tindex)* size / tno ;
        num2 = _mm_set_pd(Phasor.imaginary, Phasor.real);
        int k;
        for(k = start ; k < end; k++){


            num1 = _mm_loaddup_pd(&OutSymbol[k].real);
            num3 = _mm_mul_pd(num2, num1);
            num1 = _mm_loaddup_pd(&OutSymbol[k].imaginary);
            num2 = _mm_shuffle_pd(num2, num2, 1);
            num4 = _mm_mul_pd(num2, num1);
            num3 = _mm_addsub_pd(num3, num4);
            //_mm_storeu_pd((double *)&newSymbol, num3);
            num2 = _mm_shuffle_pd(num2, num2, 1);
            num5 = _mm_set_pd(InSymbolInt8[k],InSymbolInt8[k] );
            num6 = _mm_mul_pd(num3, num5);
            num7 = _mm_set_pd(Out[k].imaginary,Out[k].real);
            num8 = _mm_add_pd(num7,num6);
            _mm_storeu_pd((double *)&Out[k], num8);


        }
    }
    Out = Out + size;
}

This version runs in 30 ms.

So I am wondering whether I am doing something wrong.

Answers

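You are not doing anything to distribute the loop execution between the two threads. You are just creating a parallel region with two threads, and those threads execute exactly the same code. What you probably want is to move the parallel region so that it encloses only the `for` loop, and to use a worksharing construct: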

int k;
#pragma omp parallel for num_threads(2) ...
for(k = 0; k < size; k++){
   ...
}
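A minimal sketch of that suggestion, assuming a hypothetical `Complex` type built from the question's field names: the `parallel for` construct gives each thread a share of the iterations 0..size-1, so the manual start/end arithmetic is no longer needed, and the temporaries are declared inside the loop body so that each thread gets its own copy. The body is written in scalar C for portability; the question's SSE3 body could go in its place, as long as `num1` to `num8` are declared inside the loop. The function name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical complex type matching the field names in the question. */
typedef struct {
    double real;
    double imaginary;
} Complex;

/* Worksharing version of the inner loop (hypothetical helper).
   The iterations 0..size-1 are divided among the threads of the team;
   the loop variable k is made private automatically. */
void complex_mac_parallel_for(Complex *out, const Complex *outSymbol,
                              const int8_t *inSymbol, Complex phasor,
                              int size)
{
    int k;
    #pragma omp parallel for num_threads(2)
    for (k = 0; k < size; k++) {
        /* Declared inside the loop body, so private to each thread. */
        double re = outSymbol[k].real * phasor.real
                  - outSymbol[k].imaginary * phasor.imaginary;
        double im = outSymbol[k].real * phasor.imaginary
                  + outSymbol[k].imaginary * phasor.real;
        out[k].real      += re * inSymbol[k];
        out[k].imaginary += im * inSymbol[k];
    }
}
```

Compiled without OpenMP support (for example without `-fopenmp` in GCC), the pragma is ignored and the loop simply runs serially with the same result.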

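Thanks to Tudor for the correction: your code is correctly parallelized, but you have a parallel region inside a loop. Entering and exiting a parallel region carries some overhead. This is usually called the "fork/join model": a team of threads is created when the region is entered, and all threads join the master thread when it is exited. Most OpenMP runtimes use various thread-pooling techniques to reduce this overhead, but it is still there.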

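Your loop runs in 15 ms. That is already short enough that the OpenMP overhead, by comparison, becomes noticeable. Consider moving the parallel region out to the outer loop: that reduces the overhead by up to 20 times (depending on how often the `else` branch is taken), but you may still not see any improvement in computation time.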

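Parallelizing a program only pays off if the problem is large enough that the communication or synchronization overhead is negligible, or at least small, compared with the computation time.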
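OpenMP's `if` clause is one way to apply that rule inside a single code path: when its expression is false, the region is executed by a team of only one thread, so small inputs avoid most of the parallel overhead. In this sketch the kernel is a simple stand-in for the complex multiply-accumulate, and `PARALLEL_THRESHOLD` is a hypothetical cut-off that would have to be found by measuring on the target machine.

```c
/* Hypothetical cut-off; the right value depends on the machine and on
   the cost of each iteration, and should be found by measurement. */
#define PARALLEL_THRESHOLD 100000

/* out[k] += factor * in[k]: a simple stand-in kernel (hypothetical). */
void scale_add(double *out, const double *in, double factor, int n)
{
    int k;
    /* When n <= PARALLEL_THRESHOLD the if clause is false and the loop is
       executed by a single thread instead of a full team. */
    #pragma omp parallel for if(n > PARALLEL_THRESHOLD)
    for (k = 0; k < n; k++)
        out[k] += factor * in[k];
}
```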

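You should start your parallel region outside the outer loop (over `i`) and use `omp for` for the inner loop. All variables used inside the loop (`num1`, `num2`, ...) are best declared inside it, so that they are automatically `private` (in fact, most of them could be reused, but the compiler should work that out anyway).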
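A minimal sketch of that structure, under several assumptions: a hypothetical `Complex` type built from the question's field names, a hypothetical `samplesPerPeriod` array indexed by the outer loop variable, a constant phasor, and every outer iteration taking the `else` branch (the `if` branch is omitted). The parallel region is entered once; inside it every thread runs the outer loop, `omp for` shares out the inner iterations, and the implicit barrier at the end of `omp for` keeps the threads in step from one period to the next. The function and parameter names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical complex type matching the field names in the question. */
typedef struct {
    double real;
    double imaginary;
} Complex;

/* Hoisted version (hypothetical helper): one parallel region around the
   whole outer loop instead of one per else branch. Assumes every period
   takes the else branch and that the phasor stays constant. */
void process_periods(Complex *out, const Complex *outSymbol,
                     const int8_t *inSymbol, Complex phasor,
                     const int *samplesPerPeriod, int periods)
{
    #pragma omp parallel num_threads(2)
    {
        /* Declared inside the region, so each thread has its own copy of
           the output cursor; all threads advance it identically. */
        Complex *cursor = out;

        for (int i = 0; i < periods; i++) {
            int size = samplesPerPeriod[i];

            /* Every thread reaches this omp for with the same bounds;
               the iterations are divided among them. */
            #pragma omp for
            for (int k = 0; k < size; k++) {
                /* Temporaries declared here are private automatically. */
                double re = outSymbol[k].real * phasor.real
                          - outSymbol[k].imaginary * phasor.imaginary;
                double im = outSymbol[k].real * phasor.imaginary
                          + outSymbol[k].imaginary * phasor.real;
                cursor[k].real      += re * inSymbol[k];
                cursor[k].imaginary += im * inSymbol[k];
            }
            /* Implicit barrier at the end of omp for: all threads finish
               period i before any of them moves on to period i + 1. */

            cursor += size;  /* corresponds to Out = Out + size */
        }
    }
}
```

Compared with the question's version, the team is forked and joined once per call rather than once per `else` branch; whether that is enough to beat the serial 15 ms still has to be measured.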
