English 中文(简体)
R programming - submitting jobs on a multiple node linux cluster using PBS
原标题:

I am running R on a multiple node Linux cluster. I would like to run my analysis on R using scripts or batch mode without using parallel computing software such as MPI or snow.

I know this can be done by dividing the input data such that each node runs different parts of the data.

My question is how do I go about this exactly? I am not sure how I should code my scripts. An example would be very helpful!

I have been running my scripts so far using PBS but it only seems to run on one node as R is a single thread program. Hence, I need to figure out how to adjust my code so it distributes labor to all of the nodes.

Here is what I have been doing so far:

1) command line:

> qsub myjobs.pbs

2) myjobs.pbs:

> #!/bin/sh
> #PBS -l nodes=6:ppn=2
> #PBS -l walltime=00:05:00
> #PBS -l arch=x86_64
> 
> pbsdsh -v $PBS_O_WORKDIR/myscript.sh

3) myscript.sh:

#!/bin/sh
cd $PBS_O_WORKDIR
R CMD BATCH --no-save my_script.R

4) my_script.R:

> library(survival)
> ...
> write.table(test,"TESTER.csv",
> sep=",", row.names=F, quote=F)

Any suggestions will be appreciated! Thank you!

-CC

问题回答

This is rather a PBS question; I usually make an R script (with Rscript path after #!) and make it gather a parameter (using commandArgs function) that controls which "part of the job" this current instance should make. Because I use multicore a lot I usually have to use only 3-4 nodes, so I just submit few jobs calling this R script with each of a possible control argument values.
On the other hand your use of pbsdsh should do its job... Then the value of PBS_TASKNUM can be used as a control parameter.

This was an answer to a related question - but it s an answer to the comment above (as well).

For most of our work we do run multiple R sessions in parallel using qsub (instead).

If it is for multiple files I normally do:

while read infile rest
do
qsub -v infile=$infile call_r.pbs 
done < list_of_infiles.txt

call_r.pbs:

...
R --vanilla -f analyse_file.R $infile
...

analyse_file.R:

args <- commandArgs()
infile=args[5]
outfile=paste(infile,".out",sep="")...

Then I combine all the output afterwards...

This problem seems very well suited for use of GNU parallel. GNU parallel has an excellent tutorial here. I m not familiar with pbsdsh, and I m new to HPC, but to me it looks like pbsdsh serves a similar purpose as GNU parallel. I m also not familiar with launching R from the command line with arguments, but here is my guess at how your PBS file would look:

#!/bin/sh
#PBS -l nodes=6:ppn=2
#PBS -l walltime=00:05:00
#PBS -l arch=x86_64
...
parallel -j2 --env $PBS_O_WORKDIR --sshloginfile $PBS_NODEFILE 
  Rscript myscript.R {} :::: infilelist.txt

where infilelist.txt lists the data files you want to process, e.g.:

inputdata01.dat
inputdata02.dat
...
inputdata12.dat

Your myscript.R would access the command line argument to load and process the specified input file.

My main purpose with this answer is to point out the availability of GNU parallel, which came about after the original question was posted. Hopefully someone else can provide a more tangible example. Also, I am still wobbly with my usage of parallel, for example, I m unsure of the -j2 option. (See my related question.)





相关问题
Signed executables under Linux

For security reasons, it is desirable to check the integrity of code before execution, avoiding tampered software by an attacker. So, my question is How to sign executable code and run only trusted ...

encoding of file shell script

How can I check the file encoding in a shell script? I need to know if a file is encoded in utf-8 or iso-8859-1. Thanks

How to write a Remote DataModule to run on a linux server?

i would like to know if there are any solution to do this. Does anyone? The big picture: I want to access data over the web, using my delphi thin clients. But i´would like to keep my server/service ...

How can I use exit codes to run shell scripts sequentially?

Since cruise control is full of bugs that have wasted my entire week, I have decided the existing shell scripts I have are simpler and thus better. Here is what I have so far svn update /var/www/...

Good, free, easy-to-use C graphics libraries? [closed]

I was wondering if there were any good free graphics libraries for C that are easy to use? It s for plotting 2d and 3d graphs and then saving to a file. It s on a Linux system and there s no gnuplot ...

热门标签