Does a PBS batch system move multiple serial jobs across nodes?
  • Date: 2011-03-28 00:10:34
  • Tags:
  • pbs
  • torque

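If I need to run many serial programs "in parallel" (the problem is simple but time-consuming: I need to run the same program on many different data sets), the solution is simple as long as I use only one node. All I do is keep submitting serial jobs with an ampersand after each command, e.g. in the job script: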

./program1 &
./program2 &
./program3 &
./program4 &
wait    # keep the job script alive until all four background programs have finished

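Each serial program is then naturally run on a different processor. This works well on a login server or a standalone workstation, and of course for a batch job that asks for only one node.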

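But what if I need to run 110 different instances of the same program on 110 different data sets? If I submit a script containing all 110 ./program# commands to multiple nodes (say 14), will the batch system run each job on a different processor across the different nodes, or will it try to run them all on the same 8-core node?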

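I tried using a simple MPI code to read the different data sets, but various errors meant that only about 100 of the 110 processes succeeded while the others crashed. I also considered job arrays, but I am not sure whether my system supports them.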

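I have tested the serial program extensively on the individual data sets: there are no runtime errors, and I do not exceed the memory available on each node.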

Best answer

No, PBS won't automatically distribute the jobs among nodes for you. But this is a common thing to want to do, and you have a few options.

  • The easiest option, and in some ways the most advantageous for you, is to bundle the tasks into 1-node-sized chunks and submit those bundles as individual jobs. This will get your jobs started sooner: a 1-node job will normally be scheduled faster than a (say) 14-node job, simply because there are more one-node-sized holes in the schedule than 14-node-sized ones. This works particularly well if all the tasks take roughly the same amount of time, because then dividing them up is straightforward.

  • If you do want to do it all in one job (say, to simplify the bookkeeping), you may or may not have access to the pbsdsh command; there's a good discussion of it here. It lets you run a single script on all the processors in your job. You then write a script that queries $PBS_VNODENUM to find out which of the nnodes*ppn copies it is, and runs the appropriate task.

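  • If you don't have pbsdsh, GNU parallel is another tool that can greatly simplify these tasks. It's like xargs, if you're familiar with that, but it runs commands in parallel, including across multiple nodes. So you would submit your (say) 14-node job and have the first node run a GNU parallel script. The nice thing is that it does the scheduling for you even if the tasks don't all take the same amount of time. The advice we give our users on using GNU parallel for this sort of thing is here. Note that if GNU parallel isn't installed on your system and, for some reason, your sysadmins won't install it, you can set it up in your home directory; it's not a complicated build.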
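The first option, bundling the tasks into 1-node jobs, is easy to script. Below is a minimal sketch, assuming (hypothetically) that the executables are named ./program1 to ./program110 as in the question, that each node has 8 cores, and that your site accepts Torque-style "#PBS -l nodes=1:ppn=8" requests. It only writes the job scripts into a bundles/ directory (a name chosen for illustration); submitting them is a separate step.

```shell
#!/bin/sh
# Sketch: split NTASKS serial runs into 1-node PBS job scripts of PPN runs each.
# Assumptions (hypothetical): programs are ./program1 .. ./program110,
# nodes have 8 cores, and the generated scripts go into ./bundles.
ntasks=110
ppn=8
outdir=bundles
mkdir -p "$outdir"

first=1
chunk=0
while [ "$first" -le "$ntasks" ]; do
    # Last task in this bundle, capped at the total number of tasks.
    last=$((first + ppn - 1))
    if [ "$last" -gt "$ntasks" ]; then
        last=$ntasks
    fi

    # Write one job script that starts its tasks in the background, then waits.
    {
        echo "#!/bin/bash"
        echo "#PBS -l nodes=1:ppn=$ppn"
        echo 'cd "$PBS_O_WORKDIR"'
        i=$first
        while [ "$i" -le "$last" ]; do
            echo "./program$i &"
            i=$((i + 1))
        done
        echo "wait"
    } > "$outdir/bundle_$chunk.pbs"

    first=$((first + ppn))
    chunk=$((chunk + 1))
done

echo "wrote $chunk job scripts to $outdir/"
```

With 110 tasks and 8 per node this produces 14 scripts, the last holding the remaining 6 tasks. Each script ends with wait so the job does not exit while its background programs are still running. You could then submit them with something like: for f in bundles/*.pbs; do qsub "$f"; done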
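For the pbsdsh route, the per-processor wrapper might look like the sketch below. It assumes a Torque job requesting nodes=14:ppn=8 (112 processor slots for 110 tasks), that pbsdsh gives each copy a 0-based PBS_VNODENUM, and that PBS_O_WORKDIR is visible to the spawned copies; check these against your site's pbsdsh documentation. The file name task.sh and the helper task_for_vnode are hypothetical. pbsdsh generally needs an absolute path to an executable script.

```shell
#!/bin/sh
# task.sh (hypothetical name): pbsdsh starts one copy per allocated processor.
# Example job script (Torque syntax, adjust for your site):
#   #PBS -l nodes=14:ppn=8
#   pbsdsh "$PBS_O_WORKDIR/task.sh"
ntasks=110   # assumption: number of serial runs

# Print the 1-based task number for a 0-based virtual node number,
# or nothing if this processor slot has no task (14 x 8 = 112 > 110).
task_for_vnode() {
    n=$(($1 + 1))
    if [ "$n" -le "$ntasks" ]; then
        echo "$n"
    fi
}

# Only launch a program when actually running under pbsdsh.
if [ -n "${PBS_VNODENUM:-}" ]; then
    cd "$PBS_O_WORKDIR" || exit 1
    task=$(task_for_vnode "$PBS_VNODENUM")
    if [ -n "$task" ]; then
        exec "./program$task"
    fi
fi
```

Keeping the rank-to-task mapping in one small function means that switching to, say, one program with a different data-file argument per slot only touches task_for_vnode.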
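The GNU parallel approach could be sketched as the job script below. It assumes GNU parallel is on the PATH on every node, passwordless ssh between the job's nodes, and a shared filesystem; nodes.txt and make_sshloginfile are hypothetical names. --sshloginfile and --workdir are GNU parallel options, and the "ncores/hostname" line format tells parallel how many tasks to run on each host at once.

```shell
#!/bin/sh
#PBS -l nodes=14:ppn=8
# Sketch: run ./program1 .. ./program110 across every node of one PBS job
# with GNU parallel, which starts a new task whenever a core frees up.

# Convert a PBS node file (one line per allocated core) into GNU parallel's
# sshlogin format: one "ncores/hostname" line per node.
make_sshloginfile() {
    sort "$1" | uniq -c | awk '{ print $1 "/" $2 }'
}

# Only launch tasks when actually running inside a PBS job.
if [ -n "${PBS_NODEFILE:-}" ]; then
    cd "$PBS_O_WORKDIR" || exit 1
    make_sshloginfile "$PBS_NODEFILE" > nodes.txt
    # Assumes passwordless ssh between nodes and a shared filesystem.
    seq 1 110 | parallel --sshloginfile nodes.txt --workdir "$PBS_O_WORKDIR" './program{}'
fi
```

Because parallel hands out tasks dynamically, uneven run times are balanced automatically, which is the advantage over fixed 1-node bundles.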
