How to make computers work for
you when you are enjoying life.
J. Lakshmi
3/16/2009
Advanced User Education Programme
SERC, IISc.
Agenda
• Conceptual introduction to batch computing.
• Different batch schedulers available in SERC.
• Queue configuration and job-submission
information for different schedulers.
• General guidelines for using batch
schedulers.
• Questions!
Batch Computing – Introduction
• What is Batch Computing?
– A set (a "batch") of commands or jobs, typically as a
file, submitted to a system which then executes them
and returns the results, all without human
intervention. This contrasts with an interactive system
where the user's commands and the computer's
responses are interleaved.
• When to use?
– Tested programs or codes that need to be run multiple
times with different data, and that have execution
times greater than one hour.
Need for Batch Computing
• On a single system or distributed set of machines,
resources are limited.
• On shared resources, submitting many jobs at once
exhausts the limited system resources, which shows
up as variable job execution times. For parallel
programs, it can also cause execution errors.
• Batch computing allows controlled and balanced
use of system resources, leading to predictable
job execution times and system throughput.
Generic Architecture of Batch Schedulers
[Figure: architecture diagram. Labelled components: Submission Client, Job script, Batch Scheduler, Batch Server, Job Queues, Job Scheduler, and three Execution nodes.]
Batch Schedulers at SERC.
• LoadLeveler from IBM.
• PBSPro from Altair Inc.
• LSF from Platform Computing Inc.
LoadLeveler@SERC
• LoadLeveler (LL) is the batch scheduler from IBM.
• LL manages both serial and parallel jobs over a pool
of machines or servers, often referred to as an LL
cluster.
• Jobs are allocated to machines in the cluster by a
scheduler and the allocation of the jobs depends on
the availability of resources within the cluster and
various rules defined by the LL administrator.
• A user submits a job using a job command file which
contains details of the executable, its dependencies and
LL directives.
LL@SERC
• LL is installed on almost all IBM servers and
parallel machines hosted in SERC, namely:
– P690 or IBM-Regatta machines
– P575 machines
– P720 (256 node) IBM Linux cluster
– IBM Blue-Gene/L
– IBM SP3
Sample LL job submission file.
#!/bin/sh
# @ error = job1.$(Host).$(Cluster).$(Process).err
# @ output = job1.$(Host).$(Cluster).$(Process).out
# @ input = inputfile
# @ class = p5task4
# @ job_type = parallel
# @ tasks_per_node = 4
# @ notification = always
# @ requirements = (Arch == "R6000") && (OpSys == "AIX53")
# @ executable = /usr/bin/poe
# @ arguments = executable
# @ queue
Useful LL commands
• llq - Queries information about jobs in the
LoadLeveler queues
• llcancel <jobid> - Cancels one or more jobs
from the LoadLeveler queue.
• llclass - Returns information about classes
• llsubmit - Submits a job to LoadLeveler
• llstatus - Returns status information about
machines in the LoadLeveler cluster
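The commands above combine into a typical submit-and-monitor cycle. The following is a minimal sketch under stated assumptions, not SERC's official procedure: the file name `serial_job.cmd`, the executable `./job1` and the helper `write_ll_job` are hypothetical, and the class `p5task4` is simply the one used in the sample job file. Whether that class accepts a serial job depends on the site configuration. The submission step is skipped when `llsubmit` is not on the PATH.

```shell
#!/bin/sh
# Write a small LoadLeveler job command file.
# The file name, executable and helper name are illustrative assumptions.
write_ll_job() {
    jobfile=$1
    cat > "$jobfile" <<'EOF'
#!/bin/sh
# @ job_type = serial
# @ class = p5task4
# @ output = job1.$(Cluster).$(Process).out
# @ error = job1.$(Cluster).$(Process).err
# @ wall_clock_limit = 04:00:00
# @ queue
./job1
EOF
}

write_ll_job serial_job.cmd

# Submit and inspect only where LoadLeveler is installed.
if command -v llsubmit >/dev/null 2>&1; then
    llsubmit serial_job.cmd     # prints the assigned job identifier
    llq -u "$(id -un)"          # list this user's jobs in the queues
    llclass                     # show the configured classes
fi
```

Because no `executable` keyword is given, the job command file itself is run as the job, so the line after `# @ queue` is what executes on the node.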
LL@P690&P575
• There are four logical P690 and two P575 machines that are
controlled by a single LL manager. All machines host the AIX OS.
• Three of the P690 (regatta1/2/3) accept parallel jobs and one
(regatta4) is for interactive use. Both P575 machines accept parallel
jobs.
• The machine regatta4 is the submission host for this cluster.
• Jobs on this cluster are restricted by job time (the wall-clock limit of the class).
• Queue information for this cluster is:
Class       Wall_clock_limit   Max Processor   Remarks
p5task4     4:00:00            4
p5task8     16:00:00           8
p5gtask16   32:00:00           16              For Gaussian
p5task16    32:00:00           16
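Choosing a class from this table amounts to finding the smallest class whose processor and wall-clock limits both cover the job. A minimal sketch follows, assuming "Max Processor" is an upper bound rather than an exact requirement and that hours are given as whole numbers; the function name `pick_p5_class` is hypothetical, and `p5gtask16` is left out because it is reserved for Gaussian.

```shell
#!/bin/sh
# pick_p5_class NPROCS HOURS
# Prints the smallest general-purpose class that fits, or returns 1 if none does.
# Limits are copied from the queue table for the P690/P575 cluster.
pick_p5_class() {
    nprocs=$1
    hours=$2
    if [ "$nprocs" -le 4 ] && [ "$hours" -le 4 ]; then
        echo p5task4
    elif [ "$nprocs" -le 8 ] && [ "$hours" -le 16 ]; then
        echo p5task8
    elif [ "$nprocs" -le 16 ] && [ "$hours" -le 32 ]; then
        echo p5task16
    else
        echo "no class fits $nprocs processors for $hours hours" >&2
        return 1
    fi
}

pick_p5_class 4 3
pick_p5_class 8 12
```

The result can be placed in the `# @ class` line of the job command file.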
LL@P720 Cluster
• P720 is a Linux cluster and accepts only parallel jobs.
• Jobs are controlled using one LL manager for this
cluster.
• Queue information on this cluster is:
Class      Wall_clock_limit   Max Processor   TotTasks
ptask32    02:00:00           32              32
ptask128   1+08:00:00         128             200 (shared)
ptask64    2+16:00:00         64              200 (shared)
(A total of 200 MPI tasks is shared between ptask128
and ptask64)
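The wall-clock limits of these classes appear in two notations, HH:MM:SS and D+HH:MM:SS. Assuming the second form means days plus hours, minutes and seconds (so 1+08:00:00 is 32 hours), the sketch below converts either form to seconds so that a job's expected run time can be compared with a class limit. The function name `ll_limit_to_seconds` is hypothetical.

```shell
#!/bin/sh
# ll_limit_to_seconds LIMIT
# Converts "HH:MM:SS" or "D+HH:MM:SS" to seconds.
# Assumes the "D+" prefix counts whole days.
ll_limit_to_seconds() {
    echo "$1" | awk -F'[+:]' '{
        if (NF == 4)      print (($1 * 24 + $2) * 60 + $3) * 60 + $4
        else if (NF == 3) print ($1 * 60 + $2) * 60 + $3
        else              exit 1
    }'
}

ll_limit_to_seconds 02:00:00
ll_limit_to_seconds 1+08:00:00
```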
LL@BlueGene/L
• Each node on BlueGene consists of two processors and
LL can allocate these in two different ways:
– VN mode – both processors are allocated for computation.
(beneficial for compute intensive jobs)
– CO mode – one processor is allocated for computation and
another for communication. (beneficial for compute and
communication intensive jobs)
• On BlueGene the LL queues are divided into two
blocks, namely
– Big Block – Default processor allocation is VN mode
– Small Block – Default processor allocation is CO mode but
supports VN mode too.
LL Queues on BlueGene/L
Queue          Wall_clock_limit   No. of jobs   No. of Nodes   No. of MPI Tasks   Allowed Modes
===============================================================================================
pnode32        4:00:00            2             32             32 or 64           Both CO and VN
pnode32-24h    24:00:00           2             32             32 or 64           Both CO and VN
pnode128       16:00:00           2             128            128 or 256         Both CO and VN
pnode128-24h   24:00:00           1             128            128 or 256         Both CO and VN
pnode512       48:00:00           1             512            512 or 1024        Both CO and VN
pnode1024      120:00:00          4             512            1024               Only VN
pnode2048      60:00:00           2             1024           2048               Only VN
pnode4096      48:00:00           1             2048           4096               Only VN
===============================================================================================
Small Block includes: pnode32, pnode32-24h, pnode128, pnode128-24h and pnode512.
Small block will have 2 midplanes. Supports both CO and VN mode.
Big Block includes: pnode1024, pnode2048 and pnode4096. Big block will have six
midplanes. Supports only VN mode.
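The node and MPI-task columns of the queue table follow from the allocation mode: CO mode runs one MPI task per node, VN mode runs two. A minimal sketch of that arithmetic, with a hypothetical function name `bg_mpi_tasks`:

```shell
#!/bin/sh
# bg_mpi_tasks NODES MODE
# Prints the number of MPI tasks for a BlueGene/L allocation.
# CO: one compute processor per node; VN: both processors compute.
bg_mpi_tasks() {
    nodes=$1
    mode=$2
    case "$mode" in
        VN) echo $((nodes * 2)) ;;
        CO) echo "$nodes" ;;
        *)  echo "unknown mode: $mode" >&2
            return 1 ;;
    esac
}

bg_mpi_tasks 128 CO
bg_mpi_tasks 128 VN
```

For example, the 2048 nodes of pnode4096 in VN mode give the 4096 tasks listed for that queue.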
PBSPro@SERC
• PBSPro is the commercial version of
OpenPBS/TORQUE. PBS was initially developed at
NASA, and PBSPro is now sold by Altair.
• It is a flexible workload manager that can
schedule different jobs for different users on a set
of distributed heterogeneous systems.
• Has capabilities to define system/user/software
specific controls on jobs.
• Currently we are running PBSPro version 8.0.0.
PBSPro@SERC
• Available on all Linux based systems from
SUN, HP and SGI.
• Each PBSPro cluster typically manages a
homogeneous set of machines.
• Four clusters available at SERC, namely
– altix
– altix350-1
– altix350-2
– hplx
PBSPro@altix
• Consists of a single 32 CPU, SMP machine with hostname altix.
• Supports only 16 CPU parallel jobs.
• Jobs restricted by per processor CPU-time, number of jobs in
execution and number of jobs per user.
• Automatic job routing based on job script parameters.
• Queue parameters:
Queue            Memory  CPUTime Walltime Node   Run   Que   Lm State
---------------- ------ -------- -------- ---- ----- ----- ---- -----
qp100                --       --       --   --     2    12    2 E R
route_q              --       --       --   --     0     0   -- E R
                                               ----- -----
                                                   2    12
PBS queues on altix
Queue qp100
queue_type = Execution
Priority = 250
max_queuable = 50
total_jobs = 14
state_count = Transit:0 Queued:12 Held:0 Waiting:0 Running:2 Exiting:0 Begun:0
max_running = 2
from_route_only = True
resources_max.ncpus = 16
resources_max.pcput = 100:00:00
resources_min.ncpus = 16
resources_assigned.ncpus = 32
resources_assigned.nodect = 2
max_user_run = 1
enabled = True
started = True
Queue route_q
queue_type = Route
total_jobs = 0
state_count = Transit:0 Queued:0 Held:0 Waiting:0 Running:0 Exiting:0 Begun:0
route_destinations = qp100
enabled = True
started = True
Sample PBSPro job submission script file
#!/bin/sh
#PBS -l ncpus=8
#PBS -l pcput=60:00:00
./job1
Useful PBS commands
• Job submission – qsub <job_script>
(qsub prints the job identifier, e.g. 10374.altix)
• Submitted job status – qstat <job_id> or
qstat -a
• Kill a running job – qdel <job_id>
• Detailed job status – tracejob <job_id>
(this command works correctly only when executed on the
node where the PBS server is running.)
• Current queue status – qstat -q
• Complete details of your running job –
qstat -f <job_id>
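These commands are usually chained: submit, capture the job identifier that `qsub` prints, then query it. The sketch below is a hedged illustration rather than a site-approved template; the script name `job1.pbs`, the job name and the executable `./job1` are hypothetical, and the resource requests are copied from the sample script. The submission step runs only where `qsub` is available.

```shell
#!/bin/sh
# Write a PBS job script; file and executable names are illustrative.
cat > job1.pbs <<'EOF'
#!/bin/sh
#PBS -N job1
#PBS -l ncpus=8
#PBS -l pcput=60:00:00
#PBS -j oe
# PBS starts the job in the home directory; return to the submission directory.
cd "$PBS_O_WORKDIR" || exit 1
./job1
EOF

# Submit and inspect only where PBS is installed.
if command -v qsub >/dev/null 2>&1; then
    jobid=$(qsub job1.pbs)      # qsub prints the job identifier
    echo "submitted $jobid"
    qstat -f "$jobid"           # full details of the submitted job
    # qdel "$jobid"             # uncomment to remove the job
fi
```

`-j oe` merges standard error into standard output, which keeps each job's messages in a single file.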
Common errors with PBS job scripts
• ncpus outside the queue limits (it must satisfy
minncpus <= ncpus <= maxncpus)
• pcput > maxpcput
– In either case the job will not get submitted
• Not all PBS directives described in the user guide
work on every installation; this depends on the
configuration. If you want to use something
specific, please check with your system
administrator.
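The ncpus error can be caught before submission by comparing the request in the job script with the queue limits reported by `qmgr`. A minimal sketch, assuming the request is written as a `#PBS -l ncpus=N` line; the function names are hypothetical and the limits are passed in by hand rather than read from PBS.

```shell
#!/bin/sh
# Extract the ncpus request from a PBS job script (first match only).
pbs_ncpus() {
    sed -n 's/^#PBS[[:space:]]*-l[[:space:]]*ncpus=\([0-9][0-9]*\).*/\1/p' "$1" | head -n 1
}

# Succeed only if MIN <= N <= MAX.
ncpus_in_range() {
    [ "$1" -ge "$2" ] && [ "$1" -le "$3" ]
}

# check_job_script SCRIPT MIN MAX
check_job_script() {
    script=$1
    min=$2
    max=$3
    n=$(pbs_ncpus "$script")
    if [ -z "$n" ]; then
        echo "no ncpus request found in $script" >&2
        return 2
    fi
    if ncpus_in_range "$n" "$min" "$max"; then
        echo "ncpus=$n is within [$min, $max]"
    else
        echo "ncpus=$n is outside [$min, $max]; the job would not be accepted" >&2
        return 1
    fi
}
```

For the qp100 queue on altix, where both limits are 16, the call would be `check_job_script job1.pbs 16 16`.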
PBSPro@altix350-1
• Consists of a single 16 CPU, SMP machine with hostname altix350-1.
• Supports serial and 4/8 CPU parallel jobs.
• Jobs restricted by per processor CPU-time, total job CPU-time, number of jobs in
execution and number of jobs per user.
• Automatic job routing based on job script parameters.
• Queue parameters:
Queue            Memory  CPUTime Walltime Node   Run   Que   Lm State
---------------- ------ -------- -------- ---- ----- ----- ---- -----
route_q              --       --       --   --     0     0   -- E R
qp_4_32              -- 128:00:0       --   --     2    32    2 E R
qp_4_64              -- 256:00:0       --   --     1    23    2 E R
qp_8_32              -- 256:00:0       --   --     0     0    1 E R
qs_32                --  32:00:0       --   --     3     7    4 E R
                                               ----- -----
                                                   6    62
• Queue specific details can be found by executing the command
qmgr
qmgr> list queue qp_4_32
PBSPro@altix350-2
• Consists of a single 16 CPU, SMP machine with hostname altix350-2.
• Supports 8 CPU parallel jobs.
• Jobs restricted by per processor CPU-time, total job CPU-time, number of jobs in
execution and number of jobs per user.
• Automatic job routing based on job script parameters.
• Queue parameters:
Queue            Memory  CPUTime Walltime Node   Run   Que   Lm State
---------------- ------ -------- -------- ---- ----- ----- ---- -----
route_q              --       --       --   --     0     0   -- E R
qp_8_64              -- 512:00:0       --   --     0     1    2 E R
qp_8_100             -- 800:00:0       --   --     2    34    2 E R
                                               ----- -----
                                                   2    35
• Queue specific details can be found by executing the command
qmgr
qmgr> list queue qp_8_64
PBSPro@hplx&sunlx
• Consists of 18 nodes with 10 hplx and 8 sunlx machines. All machines loaded with 64-bit Linux
OS. Server and Scheduler for this cluster is hplx1_2. Currently undergoing reconfiguration.
For details contact Mr. Chandrappa <chandru@serc.iisc.ernet.in>
• Supports only serial jobs.
• Jobs restricted by per processor CPU-time, total job CPU-time, number of jobs in execution
and number of jobs per user.
• Automatic job routing based on job script parameters.
• Queue parameters:
Queue            Memory  CPUTime Walltime Node   Run   Que   Lm State
---------------- ------ -------- -------- ---- ----- ----- ---- -----
qh64                 -- 64:00:00       --   --     2     0   24 D R
qh16                 -- 16:00:00       --   --     0     0   24 D R
route                --       --       --   --     0     0   -- E R
qh8                  -- 08:00:00       --   --     0     0   24 E R
qh256                -- 256:00:0       --   --     2     0   24 D R
qh32                 -- 32:00:00       --   --     0     0   24 D R
                                               ----- -----
                                                   4     0
• Queue specific details can be found by executing the command
qmgr
qmgr> list queue qh64
LSF@SERC
• Load Sharing Facility (or simply LSF) is a commercial
computer software job scheduler sold by Platform
Computing. It can be used to execute batch jobs on
networked Unix and Windows systems on many
different architectures.
• LSF version 4.1 is currently installed on the Compaq
ES40 machines (commonly known as alpha servers).
• In LSF there is no concept of job script. You can create a
shell script that contains details of your executable and
its dependencies and submit this as a job to LSF.
• You can also use the various job submission options to
specify the executable dependencies.
LSF@alphas4
• The alpha server cluster consists of 5 ES40 servers, each with 4 CPUs.
• The cluster allows only serial jobs and has alphas4 as the submission host.
All other machines are execution nodes.
• Queue configuration for the cluster:
QNAME      PRIO  STATUS       MAX  JL/U  JL/P  JL/H  NJOBS  PEND  RUN  SUSP
8hr          10  Open:Active    4     1           -      0     0    0     0
64hr          6  Open:Active    4     1           -      0     0    0     0
32hr          6  Open:Active    4     1           -      0     0    0     0
16hr          6  Open:Active    4     1           -      0     0    0     0
128hr         4  Open:Active    4     1           -      0     0    0     0
256hr         2  Open:Active    4     1           -      0     0    0     0
g98_q         1  Open:Active    1     1           -      0     0    0     0
unlimited     1  Open:Active    4     1           -      0     0    0     0
• Queue specific details can be found by executing the command
bqueues -l <queue_name>
Commonly used LSF commands
• bhosts - gives status of the current nodes in the
cluster.
• bsub <job_name> - command to submit your
executable to LSF.
• bqueues – gives details on configured queues.
• bjobs - gives details on the status of your jobs
submitted to LSF.
• bkill <job_id> - kills a submitted job
• xbsub – is the X based GUI for LSF job submission.
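Since LSF has no job-script concept, the queue and output files are given as `bsub` options on the command line. A minimal sketch follows; the helper `lsf_submit_cmd` and the executable name `job1` are hypothetical, and the queue `8hr` is taken from the alpha server queue list. The command is only printed unless `bsub` is actually installed.

```shell
#!/bin/sh
# lsf_submit_cmd QUEUE EXE
# Builds a bsub command line. %J in the -o/-e file names is replaced
# by LSF with the job ID, so each job gets its own output files.
lsf_submit_cmd() {
    queue=$1
    exe=$2
    echo "bsub -q $queue -o $exe.%J.out -e $exe.%J.err ./$exe"
}

cmd=$(lsf_submit_cmd 8hr job1)
echo "$cmd"

# Submit and inspect only where LSF is installed.
if command -v bsub >/dev/null 2>&1; then
    $cmd            # submit the job
    bjobs           # list this user's jobs
    bqueues 8hr     # show the state of the chosen queue
fi
```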
Using Batch Schedulers – General Guidelines
• Choosing appropriate machine.
– The physical resources on the machine should meet the resource
requirement of your job. Take some time to profile your job and
understand the resources it requires.
• Choosing appropriate job-queues.
– Most queues are configured to restrict the CPU time or the job time.
Ensure that the queue time is greater than your required time.
– Write your jobs so that they store intermediate results that can be
used to restart the job in case of failure or termination.
• Managing job specific I/O files.
– Use common file systems like hpcscratch or utemp while reading or
writing files that are used by your batch job.
– Give complete file paths and use the file I/O programming APIs to do
I/O.
– Avoid using the shell I/O redirection, particularly on the distributed
clusters. Your job may fail since file staging is not implemented or
configured in any of the batch schedulers at SERC.
– Include temporary/input file cleanup and output file movement
commands as part of your job submission scripts.
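The I/O guidelines above can be sketched as a job body that works in a directory on a common file system, writes files by full path, moves the output to its final place and removes its temporary files. This is an illustration under assumptions: the function `run_job`, the file names and the two directory arguments are hypothetical, and the `echo` lines stand in for the real program. At SERC the scratch root would be a common file system such as hpcscratch or utemp.

```shell
#!/bin/sh
# run_job SCRATCH_ROOT RESULT_DIR
# Runs a stand-in job in a private scratch directory, keeps the output
# and cleans up everything else.
run_job() {
    scratch_root=$1
    result_dir=$2
    workdir=$(mktemp -d "$scratch_root/job.XXXXXX") || return 1

    # Stand-in for the real program: files are written by full path,
    # not through shell redirection set up by the scheduler.
    echo "result" > "$workdir/output.dat"
    echo "temp" > "$workdir/tmp.dat"

    # Move the output to its final location, then remove temporary files.
    mkdir -p "$result_dir" && mv "$workdir/output.dat" "$result_dir/"
    status=$?
    rm -rf "$workdir"
    return $status
}
```

A job script would call it as, for example, `run_job /path/to/scratch "$HOME/results"`, with the real scratch path substituted.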
When in doubt go to
http://www.serc.iisc.ernet.in/ComputingFacilities/software/software.htm
Thank you!
ANY QUESTIONS?