Error handling strategies for Microsoft HPC tasks

I have a .NET app that will be spawning tasks to run on an MS HPC cluster. We're not using any of that fancy DryadLINQ stuff, just remotely executing an exe on the cluster and passing arguments via the command line. The task will be .NET code, and I'd like the calling app to get an actual Exception object when an error occurs on HPC.

What's the best general technique for accomplishing this?

Let me know if you need any more info.

Thanks!

Best answer

You can't pass the exception back from your executable to the client HPC app when you're using the batch scheduler. If it's good enough to know that one of the tasks or jobs you queued failed, then you can hold onto the SchedulerJob object and add a callback to its OnJobState or OnTaskState event. Whenever your job (or a task in that job) changes state, you'll get the job ID/task ID and state change information in your callback; then you can check whether the state changed to "Failed" and act on that information.
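As a rough sketch of the callback approach (not tested against a real cluster): this assumes the HPC Pack SDK's Microsoft.Hpc.Scheduler assembly is referenced, and that subscribing before SubmitJob works on your HPC Pack version. The head node name "headnode" and the command line "MyTask.exe arg1 arg2" are made up for illustration.

```csharp
using System;
using System.Threading;
using Microsoft.Hpc.Scheduler;            // HPC Pack SDK (assumed to be referenced)
using Microsoft.Hpc.Scheduler.Properties; // JobState, TaskState

static class SubmitAndWatch
{
    static void Main()
    {
        var done = new ManualResetEvent(false);

        IScheduler scheduler = new Scheduler();
        scheduler.Connect("headnode");             // hypothetical head node name

        ISchedulerJob job = scheduler.CreateJob();
        ISchedulerTask task = job.CreateTask();
        task.CommandLine = "MyTask.exe arg1 arg2"; // hypothetical executable and arguments
        job.AddTask(task);

        // Fires on every task state change; we only react to "Failed".
        job.OnTaskState += (sender, e) =>
        {
            if (e.NewState != TaskState.Failed) return;

            // Re-open the task to read its exit code and captured output.
            ISchedulerTask failed = job.OpenTask(e.TaskId);
            Console.Error.WriteLine("Task {0} failed with exit code {1}",
                e.TaskId.JobTaskId, failed.ExitCode);
            Console.Error.WriteLine(failed.Output);
        };

        // Unblock Main once the job reaches a terminal state.
        job.OnJobState += (sender, e) =>
        {
            if (e.NewState == JobState.Finished ||
                e.NewState == JobState.Failed ||
                e.NewState == JobState.Canceled)
            {
                done.Set();
            }
        };

        scheduler.SubmitJob(job, null, null);      // null credentials: use cached ones or prompt
        done.WaitOne();                            // keep the process alive so events can arrive
    }
}
```

The wait handle is only there because the events arrive asynchronously; a long-running client app would keep the job object around instead of blocking.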

To mark a task or job as "Failed", have your executable exit with a non-zero exit code. If you need details on the actual exception, the best you can do is print it to stdout.
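On the executable side, something like the following would do it. This is only a sketch: TaskEntry, RunGuarded and DoWork are made-up names, and DoWork stands in for whatever your task actually does. It catches everything at the top level, prints the full exception to stdout, and returns a non-zero exit code so the scheduler marks the task as "Failed".

```csharp
using System;
using System.IO;

public static class TaskEntry
{
    // Runs the task body. On failure, writes the exception (type, message,
    // stack trace, inner exceptions) to `output` and returns a non-zero code.
    public static int RunGuarded(Action work, TextWriter output)
    {
        try
        {
            work();
            return 0;
        }
        catch (Exception ex)
        {
            output.WriteLine("TASK FAILED: " + ex.GetType().FullName);
            output.WriteLine(ex.ToString());
            output.Flush();
            return 1;
        }
    }

    // The value returned from Main becomes the process exit code.
    public static int Main(string[] args)
    {
        return RunGuarded(() => DoWork(args), Console.Out);
    }

    // Hypothetical placeholder for the real computation.
    static void DoWork(string[] args)
    {
        if (args.Length == 0)
            throw new ArgumentException("expected at least one argument");
        Console.Out.WriteLine("processed " + args.Length + " argument(s)");
    }
}
```

The client still only gets text back, not an Exception object, but ex.ToString() at least preserves the type name, message and stack trace for logging.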

If you really need all the exception details, an alternative might be to use the SOA framework for your computations. Advantages would be:

  • your compute requests look like WCF method calls

  • you get detailed exceptions back when your code throws

  • you can use the SOA debugger extension to Visual Studio to debug your code

Disadvantages would be:

  • More complex to write and deploy your app starting from your existing code base.

Here are some resources to get you started (a search for "Windows HPC SOA" should get you much more):

MSDN SOA documentation
