English 中文(简体)
Test MPI_Barrier C++
原标题:

How can I be sure that MPI_Barrier act correctly? What s the method of test for that?
Thank you

问题回答

I think that to be sure that the MPI_Barrier is working correctly you have to write a program which is guaranteed to behave differently for working and non-working barriers.

I don t think that @Neeraj s answer is guaranteed to behave that way. If the barrier is working correctly the processes will all write their first output lines before any writes a second output line. However it is possible that this will happen even in the absence of the barrier (or where the barrier has failed completely if you want to think of it this way). My assertion does not depend on the very short sleep times he suggests (5msrank). Even if you suppose that the processes wait (5srank) it is possible that the statements would appear in the barrier-imposed order in the absence of the barrier. Unlikely I grant you, but not impossible, especially when you have to consider how the o/s buffers multiple writes to stdout -- you might actually be testing that process not the barrier. Oh you cry even the most inaccurate computer clock will result in process 1 waiting enough less time than process 2 to show the correct working of the barrier. Not if the o/s preemptively grabs processor 1 (on which process 1 is trying to run) for 10s it doesn t.

Dependence on the on-board clocks for synchronisation actually makes the program less deterministic. All the processors have their own clocks, and the hardware doesn t make any guarantees that they all tick at exactly the same rate or with exactly the same tick length.

Nor does that test adequately explore all the failure modes of the barrier. At best it only explores the complete failure; what if the implementation is actually a leaky barrier, so that occasionally a process gets through before the last process has reached the barrier ? Off-by-one errors are incredibly common in programs. Or perhaps the barrier code was written 3 years ago and only has enough memory to record the arrival of, say, 2^12==4096 processes and you ve put it on a brand new machine with 2^18 processors; the barrier is more of a weir than a dam.

I haven t thought about this deeply until now, I ve never suspected that any of the MPI implementations I ve used had faulty barriers, so I don t have a good suggestion about how to thoroughly test a barrier. I d be inclined to use a parallel debugger and examine the execution of the program through the barrier, but that s not going to provide a guarantee of correct behaviour.

It s an interesting question though.


#include <mpi.h>

int main (int argc , char *argv[])
{
  int rank;

  MPI_Init (&argc, &argv);      /* starts MPI */
  MPI_Comm_rank (MPI_COMM_WORLD, &rank);        /* get current process id */

  sleep(5*rank); // make sure each process waits for different amount of time
  std::cout << "Synchronization point for:" << rank << std::endl ;
  MPI_Barrier(MPI_COMM_WORLD) ;
  std::cout << "After Synchronization, id:" << rank << std::endl ;

  MPI_Finalize();
  return 0;
}

Allen Downey in his book The Little Book of Semaphores says this (about a reusable barrier algorithm he presents):

Unfortunately, this solution is typical of most non-trivial synchronization code: it is difficult to be sure that a solution is correct. Often there is a subtle way that a particular path through the program can cause an error.

To make matters worse, testing an implementation of a solution is not much help. The error might occur very rarely because the particular path that causes it might require a spectacularly unlucky combination of circumstances. Such errors are almost impossible to reproduce and debug by conventional means.

The only alternative is to examine the code carefully and “prove” that it is correct. I put “prove” in quotation marks because I don’t mean, necessarily, that you have to write a formal proof (although there are zealots who encourage such lunacy).





相关问题
Undefined reference

I m getting this linker error. I know a way around it, but it s bugging me because another part of the project s linking fine and it s designed almost identically. First, I have namespace LCD. Then I ...

C++ Equivalent of Tidy

Is there an equivalent to tidy for HTML code for C++? I have searched on the internet, but I find nothing but C++ wrappers for tidy, etc... I think the keyword tidy is what has me hung up. I am ...

Template Classes in C++ ... a required skill set?

I m new to C++ and am wondering how much time I should invest in learning how to implement template classes. Are they widely used in industry, or is this something I should move through quickly?

Print possible strings created from a Number

Given a 10 digit Telephone Number, we have to print all possible strings created from that. The mapping of the numbers is the one as exactly on a phone s keypad. i.e. for 1,0-> No Letter for 2->...

typedef ing STL wstring

Why is it when i do the following i get errors when relating to with wchar_t? namespace Foo { typedef std::wstring String; } Now i declare all my strings as Foo::String through out the program, ...

C# Marshal / Pinvoke CBitmap?

I cannot figure out how to marshal a C++ CBitmap to a C# Bitmap or Image class. My import looks like this: [DllImport(@"test.dll", CharSet = CharSet.Unicode)] public static extern IntPtr ...

Window iconification status via Xlib

Is it possible to check with the means of pure X11/Xlib only whether the given window is iconified/minimized, and, if it is, how?

热门标签