Install TORQUE 3.0.2 on an SGI Altix UV100 running SUSE 11 SP1. The UV100 is a shared-memory architecture built from a number of 3U chassis, each containing two blades. These chassis can be daisy-chained together using SGI's NUMAlink. Even though you can combine up to 24 chassis with hundreds of cores and untold gobs of RAM, it appears to the operating system as one machine. TORQUE's latest release includes NUMA support, which allows it to see virtual nodes and make better scheduling decisions.
Download the source to the server and extract the tarball.
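Something like this; the download URL below is a placeholder, so point wget at wherever you keep the 3.0.2 tarball:

# URL is hypothetical -- substitute your actual download location
wget http://example.com/torque-3.0.2.tar.gz
tar xzf torque-3.0.2.tar.gz
cd torque-3.0.2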
Install gcc and any other prerequisites.
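On SUSE that means zypper; the package set below is just my assumption of the minimum needed to build:

# Package list is an assumption for SUSE 11 SP1
zypper install gcc gcc-c++ make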
The default install prefix is /usr/local, so you have to make /usr/local/lib visible to the dynamic linker (the equivalent of adding it to LD_LIBRARY_PATH): add the path to /etc/ld.so.conf and run ldconfig.
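In concrete terms:

echo /usr/local/lib >> /etc/ld.so.conf
ldconfig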
./configure --enable-numa-support --enable-numa-mem-monitor
make
make install
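If you want to double-check that the NUMA flags made it into the build, the TORQUE daemons can print their build parameters; I believe --about does this in this release, though the exact output format may differ:

# Should echo back the configure arguments, including --enable-numa-support
pbs_mom --about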
To set up the server with numa_nodes you have to create a mom.layout file describing the virtual layout of the nodes. The documentation is decent, but it didn't do a great job of explaining the correlation between the CPU and memory layout. Looking at the contents of my /sys/devices/system/node shows four nodes:
Homestar:/sys/devices/system/node # ls
has_cpu  has_normal_memory  node0  node1  node2  node3  online  possible
And a closer look at a node itself:
Homestar:/sys/devices/system/node # ls node0
cpu0  cpu2  cpu4   cpu41  cpu43  cpu45  cpu47  cpu49  cpu6  cpu8  cpulist  distance  memory0   memory11  memory13  memory15  memory3  memory4   memory6  memory8  numastat
cpu1  cpu3  cpu40  cpu42  cpu44  cpu46  cpu48  cpu5   cpu7  cpu9  cpumap   meminfo   memory10  memory12  memory14  memory2  memory32  memory5   memory7  memory9  scan_unevictable_pages
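Each node directory also has a cpulist file (you can see it in the listing above) that spells out the mapping directly; for node0 it should read 0-9,40-49, matching the layout I ended up with below:

Homestar:/sys/devices/system/node # cat node0/cpulist
0-9,40-49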
As you can see, each node holds two distinct groups of CPUs. I wasn't quite sure how to map them in mom.layout, but the working config I eventually put together looks like this:
Homestar:~ # cat /var/spool/torque/mom_priv/mom.layout
cpus=0-9,40-49 mem=0
cpus=10-19,50-59 mem=1
cpus=20-29,60-69 mem=2
cpus=30-39,70-79 mem=3
Then add the following to /var/spool/torque/server_priv/nodes to tell TORQUE that we are using 4 NUMA nodes with 80 cores:
Homestar:~ # cat /var/spool/torque/server_priv/nodes
## This is the TORQUE server "nodes" file.
Homestar np=80 num_numa_nodes=4
From the install directory, copy the init scripts for the server, sched, and mom, then insert the services (the pbs_mom step is shown below; the other two follow the same pattern). The scripts had to be modified: the daemon path needs to be changed, and in my version there was a booger where they forgot to comment out a stray fi.
cp contrib/init.d/suse.pbs_mom /etc/init.d/pbs_mom
insserv -d pbs_mom
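I'm assuming the server and scheduler scripts follow the same contrib/init.d naming convention, so the equivalent steps would be:

# Assumed to mirror the pbs_mom step above
cp contrib/init.d/suse.pbs_server /etc/init.d/pbs_server
cp contrib/init.d/suse.pbs_sched /etc/init.d/pbs_sched
insserv -d pbs_server
insserv -d pbs_sched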
Populate the server config from the install dir:
./torque.setup root
It gave me an error about a bad acl_host, so I set the server's acl_hosts to localhost in qmgr:
Qmgr: set server acl_hosts = localhost
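The same change works non-interactively with qmgr -c (the same mechanism used to print the config further down):

qmgr -c 'set server acl_hosts = localhost'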
Restart the services and run a test job.
service pbs_server restart
service pbs_sched restart
service pbs_mom restart
Homestar:~ # qstat -q

server: Homestar

Queue            Memory CPU Time Walltime Node Run Que Lm State
---------------- ------ -------- -------- ---- --- --- -- -----
batch              --      --       --     --    0   0 --  E R
                                                ----- -----
Homestar:~ # qmgr -c 'p s'
#
# Create queues and set their attributes.
#
#
# Create and define queue batch
#
create queue batch
set queue batch queue_type = Execution
set queue batch resources_default.nodes = 1
set queue batch resources_default.walltime = 01:00:00
set queue batch enabled = True
set queue batch started = True
#
# Set server attributes.
#
set server scheduling = True
set server acl_hosts = Homestar
set server acl_hosts += localhost
set server default_queue = batch
set server log_events = 511
set server mail_from = adm
set server scheduler_iteration = 600
set server node_check_rate = 150
set server tcp_timeout = 6
set server mom_job_sync = True
set server keep_completed = 300
Homestar:~ # pbsnodes -a
Homestar-0
     state = free
     np = 20
     ntype = cluster
     status = rectime=1310429883,varattr=,jobs=,state=free,netload=? 0,gres=,loadave=? 0,ncpus=10,physmem=33534316kb,availmem=32477340kb,totmem=33534316kb,idletime=18715,nusers=0,nsessions=0,uname=Linux Homestar 2.6.32.29-0.3.1.2687.3.PTF-default #1 SMP 2011-02-25 13:36:59 +0100 x86_64,opsys=linux
     mom_service_port = 15002
     mom_manager_port = 15003
     gpus = 0

Homestar-1
     state = free
     np = 20
     ntype = cluster
     status = rectime=1310429914,varattr=,jobs=,state=free,netload=? 0,gres=,loadave=? 0,ncpus=10,physmem=33554432kb,availmem=32903268kb,totmem=33554432kb,idletime=18715,nusers=0,nsessions=0,uname=Linux Homestar 2.6.32.29-0.3.1.2687.3.PTF-default #1 SMP 2011-02-25 13:36:59 +0100 x86_64,opsys=linux
     mom_service_port = 15002
     mom_manager_port = 15003
     gpus = 0

Homestar-2
     state = free
     np = 20
     ntype = cluster
     status = rectime=1310429914,varattr=,jobs=,state=free,netload=? 0,gres=,loadave=? 0,ncpus=10,physmem=33554432kb,availmem=32890392kb,totmem=33554432kb,idletime=18715,nusers=0,nsessions=0,uname=Linux Homestar 2.6.32.29-0.3.1.2687.3.PTF-default #1 SMP 2011-02-25 13:36:59 +0100 x86_64,opsys=linux
     mom_service_port = 15002
     mom_manager_port = 15003
     gpus = 0

Homestar-3
     state = free
     np = 20
     ntype = cluster
     status = rectime=1310429914,varattr=,jobs=,state=free,netload=? 0,gres=,loadave=? 0,ncpus=10,physmem=33538048kb,availmem=32892028kb,totmem=33538048kb,idletime=18715,nusers=0,nsessions=0,uname=Linux Homestar 2.6.32.29-0.3.1.2687.3.PTF-default #1 SMP 2011-02-25 13:36:59 +0100 x86_64,opsys=linux
     mom_service_port = 15002
     mom_manager_port = 15003
     gpus = 0
powell@Homestar:~> echo "sleep 30" | qsub
1.homestar.area1.arete.com
powell@Homestar:~> qstat
Job id                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
1.homestar                STDIN            powell                 0 R batch
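With four virtual nodes defined, a natural follow-up is a job that spans more than one of them. This is just a sketch using standard TORQUE resource syntax; the node and ppn counts are my guesses for this layout:

# Hypothetical test: span two of the four NUMA nodes (10 physical cores each)
echo "sleep 30" | qsub -l nodes=2:ppn=10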
How did you get the server_priv/nodes file to work correctly? I get:

pbsnodes: Server has no node list MSG=node list is empty - check 'server_priv/nodes' file

My server_priv/nodes:

test np=32 num_node_board=4