Question

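I have recently done quite a bit of reading on IEEE 754 and the x87 architecture. I was thinking of using NaN as a "missing value" in some numeric calculation code, and I was hoping that a signaling NaN would let me catch a floating-point exception in the cases where I don't want to proceed with "missing values". Conversely, I would use a quiet NaN to let the "missing value" propagate through a computation. However, signaling NaNs don't work the way I expected based on the (very limited) documentation that exists on them.

Here is a summary of what I know (all of this using x87 and VC++):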

_EM_INVALID (the IEEE "invalid" exception) controls the behavior of the x87 when encountering NaNs
If _EM_INVALID is masked (the exception is disabled), no exception is generated and operations can return quiet NaN. An operation involving a signaling NaN will not cause an exception to be thrown; the value is converted to a quiet NaN instead.
If _EM_INVALID is unmasked (exception enabled), an invalid operation (e.g., sqrt(-1)) causes an invalid exception to be thrown.
The x87 never generates signaling NaN.
If _EM_INVALID is unmasked, any use of a signaling NaN (even initializing a variable with it) causes an invalid exception to be thrown.
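The masked case described in the list above can be observed with the standard <cfenv> interface rather than the VC++-specific control word. This is only a sketch: the function name is made up for illustration, and it assumes the "invalid" exception is masked (the usual default) and that the compiler does not fold the sqrt call away, hence the volatile variables.

```cpp
#include <cfenv>
#include <cmath>

// Returns true if sqrt(-1) produced a NaN and raised the "invalid" status
// flag while the exception itself stayed masked (assumed default state).
bool invalid_flag_set_by_sqrt_of_negative() {
    std::feclearexcept(FE_ALL_EXCEPT);
    volatile double input = -1.0;               // volatile: keep the call at run time
    volatile double result = std::sqrt(input);  // invalid operation
    const bool flagged = std::fetestexcept(FE_INVALID) != 0;
    return flagged && std::isnan(result);
}
```

With the exception masked, the only trace of the invalid operation is the sticky status flag and the NaN result; nothing is thrown.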

The Standard Library provides a way to access the NaN values:

std::numeric_limits<double>::signaling_NaN();

and

std::numeric_limits<double>::quiet_NaN();

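The problem is that I see no use whatsoever for the signaling NaN. If _EM_INVALID is masked, it behaves exactly the same as a quiet NaN. Since no NaN compares equal to any other NaN, there is no logical difference.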
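To make the comparison point concrete, here is a small sketch (the helper names are illustrative, not from any library). It assumes IEEE 754 comparison semantics, so it would not hold under options such as -ffast-math.

```cpp
#include <cmath>
#include <limits>

// True when the value is a NaN of either kind.
bool is_missing(double value) {
    return std::isnan(value);
}

// Compares a quiet NaN with itself.
bool nan_equals_itself() {
    const double q = std::numeric_limits<double>::quiet_NaN();
    return q == q;   // false under IEEE 754 comparison rules
}
```

Because equality never holds for a NaN, a "missing value" stored as NaN has to be detected with std::isnan (or by inspecting the bits), not with ==.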

If _EM_INVALID is not masked (exception is enabled), then one cannot even initialize a variable with a signaling NaN: double dVal = std::numeric_limits<double>::signaling_NaN(); because this throws an exception (the signaling NaN value is loaded into an x87 register to store it to the memory address).

You might think of the following, as I did:

Mask _EM_INVALID.
Initialize the variable with signaling NaN.
Unmask _EM_INVALID.

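However, step 2 causes the signaling NaN to be converted to a quiet NaN, so subsequent uses of it will not cause exceptions to be thrown! So WTF?!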

Is there any utility or purpose whatsoever to a signaling NaN? I understand that one of the original intents was to initialize memory with it so that use of an uninitialized floating-point value could be caught.

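Can someone tell me if I am missing something here?

EDIT:

To further illustrate what I had hoped to do, here is an example:

Consider performing mathematical operations on a vector of data (doubles). For some operations, I want to allow the vector to contain a "missing value" (pretend this corresponds to a spreadsheet column in which some of the cells have no value, but their existence is significant). For other operations, I do not want the vector to contain a "missing value". Perhaps I want to take a different course of action if a "missing value" is present in the set, for example by performing a different operation (so this is not an invalid state to be in).

The brute-force code would look something like this: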

const double MISSING_VALUE = 1.3579246e123;
using std::vector;

vector<double> missingAllowed(1000000, MISSING_VALUE);
vector<double> missingNotAllowed(1000000, MISSING_VALUE);

// ... populate missingAllowed and missingNotAllowed with (user) data...

for (vector<double>::iterator it = missingAllowed.begin(); it != missingAllowed.end(); ++it) {
    if (*it != MISSING_VALUE) *it = sqrt(*it); // sqrt() could be any operation
}

for (vector<double>::iterator it = missingNotAllowed.begin(); it != missingNotAllowed.end(); ++it) {
    if (*it != MISSING_VALUE) *it = sqrt(*it);
    else *it = 0;
}

Note that the check for the "missing value" must be performed on every loop iteration. While I understand that in most cases the sqrt function (or any other mathematical operation) will likely overshadow this check, there are cases where the operation is minimal (perhaps just an addition) and the check is costly. Not to mention that the "missing value" takes a legal input value out of play and could cause bugs if a calculation legitimately arrives at that value (unlikely though it may be). Also, to be technically correct, the user input data should be checked against that value and an appropriate course of action taken. I find this solution inelegant and less than optimal performance-wise. This is performance-critical code, and we definitely do not have the luxury of parallel data structures or data element objects of some sort.

The NaN version would look like this:

using std::vector;

vector<double> missingAllowed(1000000, std::numeric_limits<double>::quiet_NaN());
vector<double> missingNotAllowed(1000000, std::numeric_limits<double>::signaling_NaN());

// ... populate missingAllowed and missingNotAllowed with (user) data...

for (vector<double>::iterator it = missingAllowed.begin(); it != missingAllowed.end(); ++it) {
    *it = sqrt(*it); // if *it == QNaN then sqrt(*it) == QNaN
}

for (vector<double>::iterator it = missingNotAllowed.begin(); it != missingNotAllowed.end(); ++it) {
    try {
        *it = sqrt(*it);
    } catch (FPInvalidException&) { // assuming _seh_translator set up
        *it = 0;
    }
}

Now the explicit check is eliminated and performance should be improved. I think this would all work if I could initialize the vector without touching the FPU registers...

Furthermore, I would imagine any self-respecting sqrt implementation checks for NaN and returns NaN immediately.

Answer 1

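As I understand it, the purpose of the signaling NaN is to initialize data structures. But, of course, runtime initialization in C runs the risk of loading the NaN into a floating-point register as part of the initialization, thereby triggering the signal, because the compiler does not know that this floating-point value needs to be copied through an integer register.

I would hope that you could initialize a static value with a signaling NaN, but even that would require some special handling by the compiler to avoid having it converted to a quiet NaN. You could perhaps use a bit of casting magic during initialization to avoid having it treated as a floating-point value.

If you were writing in ASM, this would not be an issue. But in C, and especially in C++, I think you will have to subvert the type system to initialize a variable with a signaling NaN. I suggest using memcpy.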
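A minimal sketch of the memcpy approach follows. The constant and function names are hypothetical, and the sketch assumes a 64-bit IEEE 754 double. Whether the compiler lowers the copy to integer or plain memory moves (so that no x87 load happens) is typical but not something the language guarantees.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// One signaling NaN encoding for an IEEE 754 double:
// exponent all ones, top mantissa bit clear, mantissa nonzero.
const std::uint64_t kSignalingNaNBits = 0x7FF0000000000001ULL;

// Fills every element with the signaling NaN bit pattern by copying raw
// bytes, so the source never treats the pattern as a floating-point value.
void fill_with_signaling_nan(std::vector<double>& data) {
    static_assert(sizeof(double) == sizeof(std::uint64_t), "expects 64-bit double");
    for (std::size_t i = 0; i < data.size(); ++i) {
        std::memcpy(&data[i], &kSignalingNaNBits, sizeof(double));
    }
}

// Reads an element's bits back, again through a byte copy.
std::uint64_t bits_of(const std::vector<double>& data, std::size_t index) {
    std::uint64_t bits = 0;
    std::memcpy(&bits, &data[index], sizeof(bits));
    return bits;
}
```

Note that constructing the vector with a fill value of signaling_NaN(), as in the question, passes the value around as a double, which is exactly the step this sketch avoids.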

Answer 2

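Using special values (even NULL) can make your data a lot muddier and your code a lot messier. It would be impossible to distinguish between a QNaN result and a QNaN "special" value.

You might be better off maintaining a parallel data structure to track validity, or keeping your FP data in a different (sparse) data structure that holds only the valid data.

This is fairly general advice; special values are very useful in certain cases (e.g. really tight memory or performance constraints), but as the context grows larger they can cause more difficulty than they are worth.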
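As a rough illustration of the parallel-structure idea, the sketch below pairs the values with a validity mask. The MaskedColumn type and the function name are hypothetical, chosen only for this example.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical container: values plus a parallel validity mask.
struct MaskedColumn {
    std::vector<double> values;
    std::vector<bool> valid;   // valid[i] is false when values[i] is missing
};

// Applies sqrt to the valid entries only; missing entries are left untouched.
void sqrt_valid_entries(MaskedColumn& column) {
    for (std::size_t i = 0; i < column.values.size(); ++i) {
        if (column.valid[i]) {
            column.values[i] = std::sqrt(column.values[i]);
        }
    }
}
```

The cost is the extra storage and a branch per element, which is the trade-off the question is trying to avoid.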

Answer 3

Here are the bit patterns of the different double NaNs:

A signalling NaN is represented by any bit pattern between 7FF0000000000001 and 7FF7FFFFFFFFFFFF or between FFF0000000000001 and FFF7FFFFFFFFFFFF

A quiet NaN is represented by any bit pattern between 7FF8000000000000 and 7FFFFFFFFFFFFFFF or between FFF8000000000000 and FFFFFFFFFFFFFFFF

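Source: https://www.doc.ic.ac.uk/~eedwards/compsys/float/nan.html

Disclaimer: as others have pointed out, casting magic is potentially dangerous and may cause undefined behavior. Using memcpy has been suggested as a safer alternative.

That being said, for academic purposes, or if you know it is safe on the intended hardware:

In theory, it seems that simply setting the bits to those of a signaling NaN should work. As long as it is handled as an integer type, a signaling NaN is no different from any other integer. Then, barring architecture-specific issues, you could perhaps write it to the desired location through a pointer cast. If it works as intended, it may even be faster than memcpy. It may also be useful on some embedded systems.

Example: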
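The ranges above can be checked with integer arithmetic alone. The sketch below uses made-up helper names and assumes the x86 convention, in which the top mantissa bit (bit 51) distinguishes quiet from signaling NaNs.

```cpp
#include <cstdint>

const std::uint64_t kExponentMask = 0x7FF0000000000000ULL;
const std::uint64_t kMantissaMask = 0x000FFFFFFFFFFFFFULL;
const std::uint64_t kQuietBit     = 0x0008000000000000ULL;  // top mantissa bit

// NaN: exponent all ones and a nonzero mantissa (a zero mantissa is infinity).
bool is_nan_bits(std::uint64_t bits) {
    return (bits & kExponentMask) == kExponentMask && (bits & kMantissaMask) != 0;
}

// Signaling NaN: a NaN whose top mantissa bit is clear.
bool is_signaling_nan_bits(std::uint64_t bits) {
    return is_nan_bits(bits) && (bits & kQuietBit) == 0;
}

// Quiet NaN: a NaN whose top mantissa bit is set.
bool is_quiet_nan_bits(std::uint64_t bits) {
    return is_nan_bits(bits) && (bits & kQuietBit) != 0;
}
```

The sign bit is ignored, which is why each kind of NaN covers two ranges, one starting with 7FF and one with FFF.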

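const uint64_t sNan = 0xFFF7FFFFFFFFFFFFULL; // a signaling NaN bit pattern
double myData[N];                            // N: array size, not given in the original
...
// The pointer cast breaks strict aliasing; memcpy is the portable route.
uint64_t* copier = (uint64_t*) &myData[index];
*copier = sNan & ~myErrorFlags; // keep at least one mantissa bit set, or the result is an infinity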
