Parallel matrix product

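I need to compute the product of two matrices A and B (each of dimension n x m) in parallel, under the following restrictions: the server sends each client a number of rows from matrix A and a number of rows from matrix B. This cannot be changed. The clients may exchange information with each other in order to compute the product, but they cannot ask the server to send any other data.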

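This should be done as efficiently as possible, that is, by minimizing the number of messages sent between processes (considered an expensive operation) while doing as much of the computation as possible in small pieces in parallel.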

From what I have researched, the highest number of messages exchanged between the clients is practically n^2, in the case where each process broadcasts its rows to all the others. The problem is that if I minimize the number of messages sent (around log(n) for distributing the input data), the computation would then be done by only one process, or perhaps a few. Either way it is no longer done in parallel, which was the main idea of the problem.

What would be a more efficient algorithm for computing this product?

(I am using MPI, if it makes any difference).

Best answer

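To compute the matrix product `C = A x B` element by element, you simply calculate `C(i,j) = dot_product(A(i,:), B(:,j))`. That is, the (i,j) element of C is the dot product of row i of A and column j of B.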
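As a minimal sketch of that definition, here it is in plain Python with each matrix stored as a list of rows. The function names `dot_product` and `matmul` are illustrative choices, not part of MPI or any library.

```python
def dot_product(u, v):
    """Sum of the element-wise products of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(u, v))


def matmul(A, B):
    """Compute C = A x B one element at a time.

    A is n x k and B is k x m, both stored as lists of rows.
    """
    columns_of_B = list(zip(*B))  # columns_of_B[j] is column j of B
    return [[dot_product(row, col) for col in columns_of_B] for row in A]


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul(A, B))  # [[19, 22], [43, 50]]
```

Note that every element of C needs a full row of A and a full column of B, which is why the way the data is distributed matters so much.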

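If you insist on sending rows of A and rows of B around, then you are going to have a tough time writing a parallel program whose performance exceeds that of a straightforward serial program. What you ought to do instead is send rows of A and columns of B to the processors, which then compute elements of C. If you are constrained to send rows of A and rows of B, then I suggest that you do that, but compute the product on the server. That is, ignore all the worker processors and just perform the calculation serially.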

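One alternative would be to compute partial dot products on the worker processors and accumulate the partial results. This will require some tricky programming. It can be done, but I would be very surprised if, at your first attempt, you could write a program that outperforms (in execution speed) a simple serial program.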
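To make the partial-product idea concrete, here is a rough serial simulation, not an MPI program. It assumes, purely for illustration, that each worker owns the same row indices of A and of B (the `owned` mapping is hypothetical). Each worker keeps its rows of A, and for every block of B rows it adds the contribution `A[i][k] * B[k][j]` for just those k into its own rows of C.

```python
def partial_product(A_rows, B_rows, k_indices):
    """Contribution of the B rows with indices k_indices to C for A_rows.

    A_rows: full-width rows of A held by one worker.
    B_rows: the rows of B whose row indices are k_indices.
    """
    m = len(B_rows[0])
    out = [[0] * m for _ in A_rows]
    for i, a_row in enumerate(A_rows):
        for k, b_row in zip(k_indices, B_rows):
            a_ik = a_row[k]
            for j in range(m):
                out[i][j] += a_ik * b_row[j]
    return out


def add_into(acc, part):
    """Accumulate a partial result into acc in place."""
    for acc_row, part_row in zip(acc, part):
        for j, value in enumerate(part_row):
            acc_row[j] += value


A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
B = [[1, 0, 2], [0, 1, 0], [3, 0, 1], [0, 2, 0]]

# Hypothetical distribution: worker w owns these row indices of A and of B.
owned = {0: [0, 1], 1: [2, 3]}

C = []
for w, rows in owned.items():
    A_rows = [A[i] for i in rows]
    acc = [[0] * len(B[0]) for _ in rows]
    # One partial product per block of B rows. In a real run the remote
    # blocks would arrive as messages from the other workers.
    for k_indices in owned.values():
        B_rows = [B[k] for k in k_indices]
        add_into(acc, partial_product(A_rows, B_rows, k_indices))
    C.extend(acc)

print(C)  # [[10, 10, 5], [26, 22, 17], [42, 34, 29], [58, 46, 41]]
```

In this layout every worker still has to see every block of B rows once, so the communication volume does not go away. It is only spread across the workers.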

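(Yes, there are other approaches to decomposing matrix-matrix products for parallel execution, but they are more complicated than the foregoing.) If you want to investigate these, then Matrix Computations (http://www.amazon.co.uk/Coutils-Hopkins-Studies-Mathematical-Sciences/dp/0801854148/ref=sr_1_1?s=books&ie=UTF8&qid=12918462&sr=1-1) is the place to start reading.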

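You also need to think carefully about your measures of efficiency: the most efficient message-passing program will be the one that passes no messages. If the cost of message passing far outweighs the cost of computation, then the implementation with no message passing will be the most efficient by both measures. Generally, though, the efficiency of a parallel program is measured as the ratio of speedup to number of processors, so an 8x speedup on 8 processors is perfectly efficient (and usually impossible to achieve).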
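The speedup-per-processor measure can be written down in a few lines. This is only a sketch of the usual definition. The timings below are made-up numbers for illustration, not measurements.

```python
def speedup(t_serial, t_parallel):
    """How many times faster the parallel run is than the serial run."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, processors):
    """Speedup divided by the number of processors used."""
    return speedup(t_serial, t_parallel) / processors


# Made-up timings in seconds, on 8 processors.
print(efficiency(80.0, 10.0, 8))  # 1.0, the ideal case
print(efficiency(80.0, 16.0, 8))  # 0.625
```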

As stated, yours is not a sensible problem. Either the problem-setter has mis-specified it, or you have mis-stated (or misunderstood) a correct specification.

Other answers

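Something doesn't seem right: if both matrices have dimensions `n x m`, then they cannot be multiplied together (unless `n = m`). For A*B, A must have as many columns as B has rows. Are you sure the server isn't sending rows of B's transpose? That would be equivalent to sending columns of B, in which case the solution is trivial.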
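Both points are easy to check in a short sketch. The helper names `can_multiply` and `transpose` are illustrative only. Shapes are given as `(rows, columns)` pairs.

```python
def can_multiply(shape_a, shape_b):
    """A x B is defined only when A's column count equals B's row count."""
    return shape_a[1] == shape_b[0]


def transpose(M):
    """Swap rows and columns of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]


print(can_multiply((3, 2), (3, 2)))  # False: two n x m matrices with n != m
print(can_multiply((3, 3), (3, 3)))  # True: the square case

B = [[1, 2, 3],
     [4, 5, 6]]
Bt = transpose(B)
# Row j of B's transpose holds the same numbers as column j of B.
print(Bt[0])  # [1, 4]
```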

Assuming that all of that checks out, and your clients do indeed get rows from A and B: probably the easiest solution would be for each client to send its rows of matrix B to client #0, which reassembles the original matrix B and then sends its columns back out to the other clients. Basically, client #0 would act as a server that actually knows how to decompose the data efficiently. This would take 2*(n-1) messages (not counting the ones used to reunite the product matrix), but considering that you already need n messages to distribute the A and B matrices between the clients, there's no significant performance loss (it's still O(n) messages).
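The gather-then-redistribute step can be simulated serially to check the message count. This sketch is not MPI code. It assumes the clients hold their rows of B in order of client index, and it hands out contiguous blocks of columns, both of which are choices made here for illustration.

```python
def redistribute_rows_as_columns(row_blocks):
    """Simulate client 0 gathering row blocks of B and handing out columns.

    row_blocks[c] is the list of B rows held by client c, in row order.
    Returns (column_blocks, messages), where column_blocks[c] is the list
    of B columns assigned to client c.
    """
    clients = len(row_blocks)
    messages = clients - 1  # gather: every client except 0 sends once
    B = [row for block in row_blocks for row in block]
    columns = [list(col) for col in zip(*B)]
    per_client = -(-len(columns) // clients)  # ceiling division
    column_blocks = [columns[c * per_client:(c + 1) * per_client]
                     for c in range(clients)]
    messages += clients - 1  # scatter: client 0 sends to every other client
    return column_blocks, messages


row_blocks = [[[1, 2, 3, 4]], [[5, 6, 7, 8]]]  # two clients, one row each
column_blocks, messages = redistribute_rows_as_columns(row_blocks)
print(column_blocks)  # [[[1, 5], [2, 6]], [[3, 7], [4, 8]]]
print(messages)       # 2, which is 2*(n-1) for n = 2 clients
```

In a real MPI program the two phases would more naturally be expressed with collective operations than with individual sends, but the count of point-to-point messages above is the one the answer refers to.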

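The biggest bottleneck here is obviously the initial gathering and redistribution of matrix B, which scales terribly, so if you have fairly small matrices and a lot of processes, you might be better off just calculating the product serially on the server.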




