Why does the speed of this SOR solver depend on the input?

Related to my other question, I have now modified the sparse matrix solver to use the SOR (Successive Over-Relaxation) method. The current code is as follows:

void SORSolver::step() {
    float const omega = 1.0f;
    float const
        *b = &d_b(1, 1),
        *w = &d_w(1, 1), *e = &d_e(1, 1), *s = &d_s(1, 1), *n = &d_n(1, 1),
        *xw = &d_x(0, 1), *xe = &d_x(2, 1), *xs = &d_x(1, 0), *xn = &d_x(1, 2);
    float *xc = &d_x(1, 1);
    for (size_t y = 1; y < d_ny - 1; ++y) {
        for (size_t x = 1; x < d_nx - 1; ++x) {
            float diff = *b
                - *xc
                - *e * *xe
                - *s * *xs - *n * *xn
                - *w * *xw;
            *xc += omega * diff;
            ++b;
            ++w; ++e; ++s; ++n;
            ++xw; ++xe; ++xs; ++xn;
            ++xc;
        }
        b += 2;
        w += 2; e += 2; s += 2; n += 2;
        xw += 2; xe += 2; xs += 2; xn += 2;
        xc += 2;
    }
}
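Written out as a formula, each inner-loop iteration appears to apply the update below. This is only a reading of the code above: the index convention (i along x, j along y) is an assumption, and the diagonal coefficient is taken to be 1 because `*xc` is subtracted without a factor.

```latex
x_{i,j} \leftarrow x_{i,j} + \omega \left( b_{i,j} - x_{i,j}
    - w_{i,j}\, x_{i-1,j} - e_{i,j}\, x_{i+1,j}
    - s_{i,j}\, x_{i,j-1} - n_{i,j}\, x_{i,j+1} \right)
```

With \omega = 1 this reduces to a plain Gauss-Seidel sweep; \omega > 1 over-relaxes each correction.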

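Now the weird thing is: if I increase omega (the relaxation factor), the execution speed starts to depend dramatically on the values inside the various arrays!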

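With omega = 1.0f, the execution time stays roughly constant. With omega = 1.8, executing step() 10 times typically takes about 5 ms at first, but this gradually increases to nearly 100 ms over the course of the simulation. If I set omega = 1.0001f, I see a correspondingly slight increase in execution time; the higher omega is, the faster the execution time grows during the simulation.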

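Since all of this is embedded inside the fluid solver, it is hard to come up with a standalone example. But I saved the initial state and, at every time step, reran the solver on that saved state in addition to solving for the actual time step. For the initial state it was fast; for the subsequent time steps it became incrementally slower. Since everything else is equal, this proves that the execution speed of this code depends on the values in those six arrays.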

This is reproducible on Ubuntu with g++, as well as on 64-bit Windows 7 when compiling for 32 bits with VS2008.

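I have heard that NaN and Inf values can make floating-point calculations slower, but no NaNs or Infs are present. Is it possible that the speed of float computations otherwise depends on the values of the input numbers?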

Best answer

celion's answer turned out to be the correct one. The post he links to says to turn on flush-to-zero for denormalized values:

#include <float.h>
// _controlfp(newValue, mask): the new value comes first, the mask second.
_controlfp(_DN_FLUSH, _MCW_DN);

However, this is Windows-only. On Linux with gcc, I used some inline assembly to do the same thing:

int mxcsr;
__asm__("stmxcsr %0" : "=m"(mxcsr) : :);
mxcsr |= (1 << 15); // set bit 15: flush-to-zero mode
__asm__("ldmxcsr %0" : : "m"(mxcsr) :);

This is my first x86 assembly ever, so it can probably be improved upon. But it does the trick for me.
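As a possible alternative to the inline assembly, the same MXCSR bit can be set through the SSE intrinsics in <xmmintrin.h>, which both gcc and MSVC provide on x86. The sketch below is not part of the original answer: the helper names enable_flush_to_zero and count_subnormals are hypothetical, and it assumes the floating-point math is actually done with SSE instructions (the default on x86-64). The second helper is a simple diagnostic for checking whether an array really contains denormalized (subnormal) values.

```cpp
#include <cmath>        // std::fpclassify, FP_SUBNORMAL
#include <cstddef>      // std::size_t
#include <xmmintrin.h>  // _MM_SET_FLUSH_ZERO_MODE, _MM_FLUSH_ZERO_ON (SSE)

// Hypothetical helper: sets the flush-to-zero bit (bit 15) of MXCSR,
// the same bit the inline assembly above sets. With it enabled, SSE
// operations whose result would be subnormal return zero instead.
inline void enable_flush_to_zero() {
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
}

// Hypothetical diagnostic: counts how many entries of a float array
// are subnormal, i.e. nonzero but smaller in magnitude than FLT_MIN.
inline std::size_t count_subnormals(const float* data, std::size_t n) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (std::fpclassify(data[i]) == FP_SUBNORMAL) {
            ++count;
        }
    }
    return count;
}
```

MXCSR is per-thread state, so a call such as enable_flush_to_zero() would have to be made on every thread that runs the solver, before the first call to step(). Running count_subnormals over d_x between time steps is one way to confirm that denormalized values are what slows the loop down.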




