Condor, Sun Grid Engine, or something else?
I'm trying to work out whether we should try out Condor or Sun Grid Engine at work (or possibly something else).

We often have lots of unused WinXP workstations. The hope is that we could wake them with wake-on-LAN, run all our jobs, and then shut them down automatically. We'd mainly be running Matlab, Java or Python simulations for either Monte Carlo runs or parameter explorations.

With my limited knowledge of Condor, it sounds like using the VM universe might be a convenient way of taking care of snapshots without having to modify existing code.

Is SGE or something else better than Condor for this kind of work?

Best answer

I'd start with Condor. It has good support for Windows, and newer versions have built-in, highly configurable support for sending wake-on-LAN packets to machines when there are jobs that can run on them. It can also shut the machines down based on user-defined policies.
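To make the wake-on-LAN part concrete, here is a minimal Python sketch of what a wake-on-LAN "magic packet" looks like on the wire: six 0xFF bytes followed by the target MAC address repeated 16 times, usually sent as a UDP broadcast. This is only an illustration of the packet itself, not Condor's implementation; the function names, the broadcast address and the port are assumptions chosen for the example.

```python
import socket


def build_magic_packet(mac: str) -> bytes:
    """Build a wake-on-LAN magic packet for a MAC such as '00:11:22:33:44:55'."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"expected 12 hex digits in MAC address, got {mac!r}")
    mac_bytes = bytes.fromhex(hex_digits)
    # Six 0xFF bytes, then the 6-byte MAC address repeated 16 times.
    return b"\xff" * 6 + mac_bytes * 16


def send_magic_packet(mac: str,
                      broadcast: str = "255.255.255.255",  # assumed broadcast address
                      port: int = 9) -> None:              # commonly used port, assumed here
    """Send the magic packet as a UDP broadcast."""
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))
```

The target machine's network card and BIOS must have wake-on-LAN enabled for the packet to have any effect.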

Answers

SGE doesn't really support Windows. On Windows it comes with all kinds of caveats and missing bits.

I've been running Condor pools for many years now, and it is a superb high-throughput computing (HTC) setup for both cycle-stealing and dedicated, always-on hardware, on Linux and Windows machines. The recent addition of the Rooster daemon lets you put machines to sleep between job cycles and wake them up when new work appears in the pool. Condor also has an active and very helpful support community.

Checkpointing is the only Condor feature not available on Windows. Everything else is there. With the addition of the VM universe, checkpointing is getting less and less useful anyway. To use checkpointing successfully you need to be able to relink your entire code stack, so if you're running Matlab jobs, even on Linux, checkpointing isn't going to be possible.

If you have specific questions about getting Condor running on Windows, I'd be happy to answer them and share my experience with it. I run Condor across 4 pools around the globe, with about 1500 dedicated machines in total and some 1000 or so additional desktop machines that are available when users care to donate them.
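For the Monte Carlo and parameter-exploration workloads in the question, a common pattern is to queue one independent vanilla-universe job per run. The sketch below generates a Condor submit description for such a sweep from Python. The helper name, the `simulate.bat` wrapper, the `--seed` argument and the file names are all hypothetical; only the submit commands themselves (`universe`, `executable`, `arguments`, `output`, `error`, `log`, `should_transfer_files`, `when_to_transfer_output`, `queue`) and the `$(Process)` macro come from Condor.

```python
def make_submit_description(executable: str, n_jobs: int) -> str:
    """Return a Condor submit description that queues n_jobs independent runs.

    $(Process) expands to 0 .. n_jobs-1, so each job gets its own seed
    and its own output files.
    """
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    lines = [
        "universe = vanilla",
        f"executable = {executable}",
        "arguments = --seed $(Process)",   # --seed is a hypothetical option of the job script
        "output = run_$(Process).out",
        "error = run_$(Process).err",
        "log = sweep.log",
        "should_transfer_files = YES",
        "when_to_transfer_output = ON_EXIT",
        f"queue {n_jobs}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Write the description to a file that can be passed to condor_submit.
    with open("sweep.sub", "w") as f:
        f.write(make_submit_description("simulate.bat", 100))
```

Running `condor_submit sweep.sub` would then queue the 100 jobs; any requirements specific to your pool (for example restricting jobs to Windows machines) would need to be added to the description.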

After Oracle's takeover of SGE (Sun Grid Engine), the Open Grid Scheduler project continues to offer an open-source Grid Engine.

http://gridscheduler.sourceforge.net/

For dedicated hardware I'd go with Grid Engine.

For scavenging clock cycles on machines which may be in use I'd go with Condor.

For hardware which you have dedicated access to for fixed periods, such as overnight and at weekends, I'd probably still go with Condor but might be able to persuade myself to use Grid Engine.

I had to choose between Condor and SGE for a customer project recently. I was favoring SGE (because I was more familiar with that environment), but Condor won in the end because:

  • the customer infrastructure is Windows-oriented, and the SGE solution requires a Unix or Linux machine for the Central Manager, plus installing Microsoft Services for UNIX on the compute hosts
  • Condor's support and installation process on Windows were much simpler.

However, you cannot use the most interesting features of Condor on Windows: checkpointing is not available, nor is the Condor-specific I/O. I'm not using the VM universe, so I cannot comment on that aspect.
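Where Condor's own checkpointing is unavailable, one workaround is to have the job save its progress itself. The following is a minimal Python sketch of such application-level checkpointing for a Monte Carlo estimate of pi; it is not a Condor feature, and the function name, the JSON checkpoint format and the per-batch seeding are assumptions made for the example. It also assumes the checkpoint file lives somewhere that survives the job being evicted and restarted.

```python
import json
import os
import random


def run_with_checkpoints(total_batches: int, batch_size: int, path: str) -> float:
    """Estimate pi by Monte Carlo, saving progress to `path` after every batch."""
    state = {"next_batch": 0, "hits": 0}
    # Resume from an earlier, interrupted run if a checkpoint exists.
    if os.path.exists(path):
        with open(path) as f:
            state = json.load(f)

    for batch in range(state["next_batch"], total_batches):
        # Seeding each batch by its index makes a resumed run give the
        # same result as an uninterrupted one.
        rng = random.Random(batch)
        for _ in range(batch_size):
            x, y = rng.random(), rng.random()
            if x * x + y * y <= 1.0:
                state["hits"] += 1
        state["next_batch"] = batch + 1

        # Write to a temporary file and rename it, so an interruption
        # mid-write cannot leave a corrupt checkpoint behind.
        tmp_path = path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, path)

    return 4.0 * state["hits"] / (total_batches * batch_size)
```

If the job is killed after some batches, rerunning it with the same arguments picks up at the first unfinished batch instead of starting over.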

I've only tried Condor, and it was a pain to set up. If you need every clock cycle you can put to use, go with Condor.

I'm about to try SGE, and I'll tell you how it goes. However, people at my company have experience setting up SGE, so I'll probably say SGE is easier.

SGE doesn't exist... it's OGE, and it's very expensive. Go with Condor.




