Question

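I want to evaluate how queries against my Windows Azure Table storage scale. For this purpose I set up a simple test environment in which I can increase the amount of data in my table and measure the execution time of the queries. Based on these times I want to define a cost function that can be used to estimate the performance of future queries.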
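A minimal sketch of such a cost function, assuming execution time grows roughly linearly with the number of entries (which the measurements below suggest for the scan-based queries): fit time ≈ intercept + slope × entries by ordinary least squares. The names fit_linear_cost and predict_cost are hypothetical, and a linear model will not fit the near-constant point-query case well.

```python
from typing import Sequence, Tuple


def fit_linear_cost(sizes: Sequence[float], times_ms: Sequence[float]) -> Tuple[float, float]:
    """Fit times_ms ~ intercept + slope * size by ordinary least squares."""
    if len(sizes) != len(times_ms) or len(sizes) < 2:
        raise ValueError("need at least two (size, time) pairs of equal length")
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(times_ms) / n
    # Sum of squared deviations of the sizes; zero means no spread to fit against.
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    if sxx == 0:
        raise ValueError("sizes must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, times_ms))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict_cost(intercept: float, slope: float, size: float) -> float:
    """Estimated execution time (ms) for a table holding `size` entries."""
    return intercept + slope * size
```

Feed it the (table size, measured time) pairs from one query type, then call predict_cost with a future table size; keep one fitted model per query pattern, since their slopes differ.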

I evaluated the following queries:

1. Query with PartitionKey and RowKey
2. Query with PartitionKey and an attribute
3. Query with PartitionKey and two RowKeys
4. Query with PartitionKey and two attributes

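For the last two queries I tested the following two patterns (query x.1 uses the first pattern, query x.2 the second):

1. PartitionKey == "..." && (RowKey == "..." || RowKey == "...")
2. (PartitionKey == "..." && RowKey == "...") || (PartitionKey == "..." && RowKey == "...")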

To minimize transfer latency, I ran the tests from an Azure instance. From the measurements I can see that:

query 1 (not surprisingly, as the table is indexed on those fields) is extremely fast: about 10-15 ms with about 150,000 entries in the table.
query 2 requires a partition scan, so its execution time increases linearly with the amount of stored data.
query 3.1 performs almost exactly like query 2. So this query is also executed with a full partition scan, which seems a bit odd to me.
query 4.1 is a bit more than twice as slow as query 3.1, so it seems to be evaluated with two partition scans.
and finally, queries 3.2 and 4.2 perform almost exactly four times slower than query 2.

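Can you explain the internals of the query/filter translation? Even if we accept that query 3.1 needs a partition scan, query 4.1 could be evaluated with the same logic (and in the same time). Queries 3.2 and 4.2 are a mystery to me. Any pointers?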

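Obviously, the whole point is that I want to retrieve different entities in a single query to minimize cost without losing performance. But it seems that issuing a separate query for each entity (in parallel, using the Task Parallel Library) is the only really fast solution. What is the accepted way to do this?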

Answer 1

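In queries like 3.2 and 4.2 there will be a full partition scan for each key/attribute condition, one after the other. Even if the partitions are on two separate machines, the scans will not run concurrently. That is why you see such long execution times. Windows Azure does not perform this kind of query optimization; it is the responsibility of your code to be written so that the queries can run in parallel.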

You are right: if you want faster performance, you need to run the queries in parallel using the Task Parallel Library.
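The answer's advice is .NET-specific (Task Parallel Library); below is a minimal Python sketch of the same idea, using a thread pool to issue one indexed point query per (PartitionKey, RowKey) instead of one OR filter. The fetch_entity callable is a hypothetical stand-in for whatever performs a single point lookup in your client; the pool size of 8 is an arbitrary assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Optional, Sequence, Tuple

Key = Tuple[str, str]  # (PartitionKey, RowKey)


def fetch_many(
    keys: Sequence[Key],
    fetch_entity: Callable[[str, str], Optional[dict]],
    max_workers: int = 8,
) -> Dict[Key, Optional[dict]]:
    """Run one point query per key concurrently; None marks a missing entity."""
    unique_keys = list(dict.fromkeys(keys))  # drop duplicate keys, keep order
    if not unique_keys:
        return {}
    workers = min(max_workers, len(unique_keys))
    # Threads suit this I/O-bound work: each call mostly waits on the network.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda k: fetch_entity(k[0], k[1]), unique_keys)
        return dict(zip(unique_keys, results))
```

With the azure-data-tables package, fetch_entity could wrap TableClient.get_entity and translate its not-found exception into None. Because each call is a point query like query 1, total latency should stay close to a single round trip as long as there are enough workers, rather than growing with partition size.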

Answer 2

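Because the details of Table storage's internal implementation are not public, if you want to evaluate the performance of future queries, I suggest you check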