Hardware
Below is a summary of the Planck cluster hardware. Interprocessor communication runs over a Mellanox 4X Infiniband connection. There are three 24-port IB switches, each with 6 connections to each of the other switches and 11 connections to compute nodes. The I/O subsystem consists of a separate IB network between the nodes. This IB network also has three 24-port IB switches, and is additionally connected to 4 gateway nodes which have Fibre Channel connections to the NERSC Global Filesystem (NGF).
| Planck Cluster Hardware Specifications | |
|---|---|
| Number of compute nodes | 32 |
| Processor cores per node | 8 |
| Number of compute processor cores | 256 |
| Processor Core type | Opteron 2350 2.0GHz Quad Core |
| Physical memory per compute node | 32 GB |
| Number of login nodes | 1 |
| Communication Interconnect | Mellanox 4x Infiniband |
| File System | Separate 4x Infiniband network connected to NGF through 4 gateway nodes |
| Batch system | Torque/Maui |
User Environment
Your default shell on planck.nersc.gov is set the same way as on other NERSC machines: log in to NIM and change the default shell there. Note that the machine is called "PDSF" in the NIM interface. See this page for more instructions. The files .bash_profile, .bashrc, .cshrc, .kshenv, .login, .profile, and .tcshrc are links to read-only files, and should not be deleted. All individual customizations (aliases, environment variables, etc.) should be made in the files named .bashrc.ext, .cshrc.ext, .kshenv.ext, .login.ext, .profile.ext, and .tcshrc.ext. These .ext files are sourced by the corresponding dot-files.
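As an illustration of the kind of customization that belongs in an .ext file, here is a minimal sketch of a ~/.bashrc.ext. The variable name MYPROJ and the project directory "myproject" are placeholders chosen for this example, not real NGF paths.

```shell
# Hypothetical contents of ~/.bashrc.ext, which is sourced by the
# read-only ~/.bashrc. "myproject" is a placeholder project name.
export MYPROJ=/project/projectdirs/myproject

# Put personal scripts ahead of the system directories on the search path.
export PATH="$HOME/bin:$PATH"
```

Users of csh or tcsh would put the equivalent setenv lines in .cshrc.ext or .tcshrc.ext instead.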
Each user has a relatively small home directory. The primary disk space for all software development and data processing is the NERSC Global Filesystem (NGF), which is mounted in the usual place at /project/projectdirs. The home directories and /project are visible from all the compute nodes.
Software
Using and running software on planck.nersc.gov is very similar to other NERSC machines. Here we try to summarize the differences. The Planck cluster at NERSC has several different "programming environments", which consist of different serial and MPI compiler toolchains. The Pathscale environment uses the Pathscale serial compilers, a version of MVAPICH2 built on top of these compilers, and a Pathscale-specific version of ACML for accelerated math, BLAS/LAPACK, and FFT functionality. The GNU environment uses the gcc (4.3.x) serial compilers and compatible versions of MVAPICH2 and ACML. Before loading the cmb module, you must first decide which programming environment you want to use. The default is the GNU environment:
%> module avail PrgEnv
PrgEnv/gnu-1.0(default) PrgEnv/pathscale-1.0
%> module load PrgEnv
- Python (2.5.2)
- Expat (2.0.1)
- Boost (1.36.0)
- GNU Science Library (1.11)
- CDF (3.2.1)
- NetCDF (4.0)
- NAG C Library (Mark8)
- Qt (3.3.8b and 4.4.1)
- KDE (3.5.10)
- Kst (1.8.x)
%> module load cmb
This will load the cmb module set that has been compiled using the previously loaded programming environment.
Using the PBS Scheduler
When working on the compute nodes (both interactively and with batch jobs) there are a number of options that control the environment in which your applications run. You should actively think about what software you are running, what you are trying to do with the software, how much memory you need, etc. Then you can tailor your environment to the requirements of the task at hand.
| Common PBS Options/Directives | ||
|---|---|---|
| Option | Default | Description |
| -l nodes=N:ppn=P,pvmem=Mgb | nodes=1:ppn=1,pvmem=4gb | Use P processors per node across N nodes, with M gigabytes of memory per processor. Your job will die if you request more than a total of 32 GB per node, that is, P × M must not exceed 32. Valid entries for this option include nodes=X:ppn=4,pvmem=8gb; nodes=X:ppn=2,pvmem=16gb; and nodes=X:ppn=1,pvmem=32gb. |
| -l walltime=HH:MM:SS | Maximum for Queue (see table) | Limit the job wall clock time to HH hours, MM minutes, and SS seconds. |
| -e filename | <script_name>.e<job_id> | Write STDERR to filename |
| -o filename | <script_name>.o<job_id> | Write STDOUT to filename |
| -j [eo\|oe] | Do not merge. | Merge STDOUT and STDERR. If eo, merge as standard error; if oe, merge as standard output. |
| -m [a\|b\|e\|n] | n | E-mail notification options: a = send mail when job aborted by system; b = send mail when job begins; e = send mail when job ends; n = do not send mail. Options a, b, e may be combined. |
| -N job_name | Job script name. | Job Name: up to 15 printable, non-whitespace characters. |
| -q queue | batch | See Batch queues below. |
| -S shell | Login shell | Specify shell as the scripting language to use. |
| -V | Do not import. | Export the current environment variables into the batch job environment. |
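The per-node memory limit is easy to get wrong when changing ppn and pvmem together. The following is a minimal sketch of a check that could be run before submitting. The function name pbs_resources is hypothetical (it is not a Torque or NERSC command), and the 32 GB figure is taken from the hardware table.

```shell
# Print a "-l" resource string, or fail if ppn * pvmem exceeds the 32 GB of
# physical memory on a compute node.
# pbs_resources is a hypothetical helper, not part of Torque.
pbs_resources() {
    nodes=$1
    ppn=$2
    mem_gb=$3
    node_mem_gb=32
    if [ $((ppn * mem_gb)) -gt "$node_mem_gb" ]; then
        echo "error: ${ppn} x ${mem_gb}gb exceeds ${node_mem_gb}gb per node" >&2
        return 1
    fi
    echo "nodes=${nodes}:ppn=${ppn},pvmem=${mem_gb}gb"
}

pbs_resources 8 8 4     # prints nodes=8:ppn=8,pvmem=4gb
pbs_resources 4 2 16    # prints nodes=4:ppn=2,pvmem=16gb
pbs_resources 1 4 16 || echo "request rejected"   # 4 x 16 GB is too much
```

The printed string can be passed directly to qsub, for example as qsub -l "$(pbs_resources 4 2 16)" followed by the script name.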
Interactive Work
You should ALWAYS use the compute nodes for any task that is CPU or memory intensive. The login node should only be used for light work such as editing files, compiling software, etc. To use the compute nodes interactively (i.e. launch a shell on those nodes), use the "-I" option to qsub.
Example
To run IDL (which is a serial program) on one of the compute nodes and use all 32GB of memory, one would do:
%> module load cmb
%> qsub -I -V -l nodes=1:ppn=1,pvmem=32gb
(note the "-V" option to propagate the cmb module environment to my new shell on the compute node)
%> cd $PBS_O_WORKDIR
(change back to the directory where I launched qsub)
%> idl
If the qsub command above is something that you use frequently and is annoying to type, I suggest creating a shell alias for that command.
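For bash users, such an alias might look like the sketch below. The alias name qint is an arbitrary choice made for this example. The line would go in ~/.bashrc.ext. Users of other shells would use that shell's alias syntax in the matching .ext file.

```shell
# Hypothetical alias name; the qsub options are the ones from the example above.
alias qint='qsub -I -V -l nodes=1:ppn=1,pvmem=32gb'
```

After the next login, typing qint requests the same single-processor, 32 GB interactive session.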
Example
To run kst (serial program) on one of the compute nodes and use all 32GB of memory, one would do:
%> module load cmb kst
%> qsub -I -V -l nodes=1:ppn=1,pvmem=32gb
(note the "-V" option to propagate the cmb module environment to my new shell on the compute node)
%> cd $PBS_O_WORKDIR
(change back to the directory where I launched qsub)
%> kstclean
(the kstclean alias pipes garbage errors to /dev/null)
Batch Jobs
Running jobs on planck.nersc.gov is very similar to running jobs on jacquard.nersc.gov. The only difference is that planck has 8 processor cores per node, and 32 available nodes. In both the GNU and Pathscale compiler environments, you should use the "mpiexec" command in your PBS script to launch jobs. You can also use the "-V" PBS keyword to propagate your shell environment (including which modules are loaded) to the compute nodes. You can submit batch jobs to one of the three available queues (interactive, debug, batch), and each job will run with the priority and limits in the table below.
| Submit Queue | Exec Queue | Nodes | Max Wallclock | Max Jobs per user | Relative Priority |
|---|---|---|---|---|---|
| interactive | interactive | 1-4 | 1 hour | Currently Unlimited | 1 |
| debug | debug | 1-8 | 1 hour | Currently Unlimited | 2 |
| batch | batch16 | 1-16 | 48 hours | Currently Unlimited | 4 |
| batch | batch32 | 17-32 | 24 hours | Currently Unlimited | 3 |
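The limits for jobs submitted to the batch queue depend on the node count. The sketch below mirrors the two batch rows of the table so that a script can look up the wallclock limit before submitting. It assumes that jobs are placed in batch16 or batch32 purely by node count, which the table suggests but does not state, and batch_exec_queue is a hypothetical helper name.

```shell
# Map a node count to the execution queue and maximum wallclock time listed
# in the queue table for the "batch" submit queue.
# batch_exec_queue is a hypothetical helper; the scheduler does the real
# routing, and this function only mirrors the published limits.
batch_exec_queue() {
    n=$1
    if [ "$n" -ge 1 ] && [ "$n" -le 16 ]; then
        echo "batch16 48:00:00"
    elif [ "$n" -ge 17 ] && [ "$n" -le 32 ]; then
        echo "batch32 24:00:00"
    else
        echo "error: node count must be between 1 and 32" >&2
        return 1
    fi
}

batch_exec_queue 8     # prints batch16 48:00:00
batch_exec_queue 24    # prints batch32 24:00:00
```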
Example
Here is a MADmap example. After loading the desired PrgEnv module and the cmb module, the MADmap executable should be in your $PATH (you can verify this by typing "which MADmap"). To run MADmap on 64 cores (8 nodes with 8 cores each), one could submit the following script to run a job in the standard batch queue:
#PBS -S /bin/bash
#PBS -l nodes=8:ppn=8,pvmem=4gb
#PBS -l walltime=1:00:00
#PBS -N madmap_job
#PBS -q batch
#PBS -o madmap.log
#PBS -j oe
#PBS -V
cd $PBS_O_WORKDIR
mpiexec MADmap -r runconfig.xml -l
Example
Here is an example using mpiBatch to run 8 instances of IDL in batch-mode on 4 nodes (2 processes per node) with each process accessing 16GB of memory. The first step is to create a text file for each process containing a list of IDL commands (not a program definition). Something like this:
%> cat task1.idl
print,'IDL running task1'
Now we make a text file containing the commands to run on each process. In this case, we are having IDL execute the batch file for each of the 8 tasks:
%> cat idl_tasks
idl -e @task1.idl
idl -e @task2.idl
idl -e @task3.idl
idl -e @task4.idl
idl -e @task5.idl
idl -e @task6.idl
idl -e @task7.idl
idl -e @task8.idl
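For larger task counts it may be easier to generate the task list than to type it. The loop below is a minimal sketch that writes the same eight lines. The variable name ntasks is chosen for this example, and the sketch assumes the per-task files are named task1.idl through task8.idl as above.

```shell
# Write one "idl -e @taskN.idl" line per task into idl_tasks.
ntasks=8
i=1
: > idl_tasks                        # create or truncate the task list
while [ "$i" -le "$ntasks" ]; do
    echo "idl -e @task${i}.idl" >> idl_tasks
    i=$((i + 1))
done

cat idl_tasks                        # show the generated list
```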
And finally we create the necessary PBS script which calls mpiBatch with this task list:
#PBS -S /bin/bash
#PBS -l nodes=4:ppn=2,pvmem=16gb
#PBS -l walltime=1:00:00
#PBS -N idl_job
#PBS -q batch
#PBS -o idl_job.log
#PBS -j oe
#PBS -V
cd $PBS_O_WORKDIR
mpiexec mpiBatch idl_tasks