Local documentation
EsiRS
EsiRS was (or rather the two initial computers, Rossini and Schubert, were) acquired to meet the needs of scientists at the University of Montreal working on the numerical simulation of large problems. The problems being attacked are diverse, ranging from the hydrodynamic modelling of stars to the quantum mechanical simulation of complex problems in biochemistry and surface catalysis. Funds for the purchase of these machines were provided half by Computer Services at the University of Montreal and half through a major equipment grant from the Natural Sciences and Engineering Research Council (NSERC) of Canada.
Rossini and Schubert were installed in 1996. Additional memory was added in June 1997 thanks to a second NSERC grant. At that time, each machine consisted of an 8-CPU SGI R10000 PowerChallenge with 2 GB of memory, and the two computers were connected by a HiPPI link.
In December of 1998, Rossini and Schubert were unified into a single machine, EsiRS.ESI, with 16 CPUs and 4 GB of memory.
Policy
EsiRS is intended as a batch machine for heavy-duty number crunching, ultimately using parallelized code. For this reason, users must submit their jobs through the installed Network Queueing System (NQS) software. Access to EsiRS is restricted to those who have obtained permission from the controlling committee.
Network Queueing System (NQS)
Interactive and background jobs are strongly discouraged on EsiRS. Instead, jobs should be submitted in batch mode via NQS. The construction of NQS scripts and the use of NQS commands are described on a separate page, as is a description of our local NQS queues.
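As a minimal sketch (the queue name, script name, and run directory are hypothetical; our local queue names and the full set of qsub options are given on the pages mentioned above), a job script contains the commands to be run in batch:

    # myjob.sh -- commands to be executed in batch on EsiRS
    cd $HOME/myrun          # hypothetical run directory
    ./myprog > myprog.out   # the program to run

and is submitted to a queue from the command line with, for example:

    qsub -q q_esirs myjob.sh   # q_esirs is a hypothetical queue name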
Configuration
EsiRS is part of the ESI network, which involves several different batch machines. File systems from the different batch machines are shared by remote mounting. A system of softlinks creates “universal” (i.e. machine-independent) names for these remotely mounted file systems; when these names are used correctly, NQS scripts written for one machine will run on the other ESI batch machines with little if any modification. These universal names are described on a separate page.
Since jobs run on the batch machines should maximize performance, and since performance is lost by transferring files across disk boundaries, it is important to follow a standard operating procedure.
Compiling for parallel execution
When compiling code modules for parallel execution on EsiRS, the following minimum options are necessary, in both C and FORTRAN: -mp -mips4. Top performance will usually result only after serious fine tuning, which may require the use of many extra options. In particular, users should make sure that their code is already performing well in scalar mode before attempting parallelization. Have a good look at the various -WK options for this purpose; for example, IEEE-compliant arithmetic can be very costly, and the option -WK,-r=3 can be used to turn it off (other equivalent forms for this option are -OPT:IEEE_arithmetic=3 or -OPT:roundoff=3).
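For example (the file names and the -O2 optimization level are illustrative additions; only -mp and -mips4 are the required minimum stated above):

    f77 -mp -mips4 -O2 -o myprog myprog.f   # parallel FORTRAN compile
    cc  -mp -mips4 -O2 -o myprog myprog.c   # parallel C compile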
The compile options -pfa of f77 and -pca of cc invoke a preprocessor that can do some fine-grained parallelization automatically. This is not essential in the initial stages of parallelizing a serial program but can be useful. Since these options can be costly, they are only recommended as part of an initial analysis. The code produced by the preprocessor can be saved and thus form the basis for further work in parallelizing and optimizing the code.
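A sketch of such an analysis run (the file name is hypothetical, and the keep argument, which asks the preprocessor to retain its transformed source for inspection, should be checked against the f77 and cc man pages):

    f77 -pfa keep -mp -mips4 -c mysub.f   # preprocess, parallelize, and keep the transformed source
    cc  -pca keep -mp -mips4 -c mysub.c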
Documentation
An excellent textbook describes the language, tools, and libraries needed for parallelizing on SGI computers: Practical Parallel Programming, by Barr E. Bauer, Academic Press, 1992 (ISBN 0-12-082810-3).
The official documentation on current SGI parallel FORTRAN or parallel C languages and libraries is available online with the help of SGI’s documentation browser insight.
To use insight, set your DISPLAY environment variable (on Schubert or Rossini) to the address of your display (e.g., the internet address of your workstation followed by :0 or :0.0), then run insight. For this to work, your workstation must allow the machine running insight to write to your display; under the X11 windowing system, you may have to execute the command xhost + > /dev/null on your workstation to allow this.
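For example, with the C shell and a hypothetical workstation name myws.umontreal.ca:

    # on your workstation: allow the remote machine to open windows on your display
    xhost + > /dev/null
    # on Schubert or Rossini: point DISPLAY at your workstation, then start the browser
    setenv DISPLAY myws.umontreal.ca:0.0
    insight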
FORTRAN programmers should check out (with insight) Chapter 5 of The FORTRAN Programmer’s Guide. Documentation about pfa may be found in the “book” POWER Fortran Accelerator User’s Guide. C programmers should check out the “book” IRIS Power C User’s Guide.
Known Bugs
A bug was found in an earlier version of the FORTRAN compiler in the functions for handling complex numbers. It has since been fixed.
Message Passing Interface (MPI)
We have developed some experience with mpi, SGI’s proprietary implementation of MPI. The MPIview program allows the user to follow the execution of mpi programs interactively. For additional information, see:
- The MPI Forum, an open group with representatives from many organizations, which defines and maintains the MPI standard.
- MPI: The Complete Reference, an on-line book about MPI.
- Mathematics and Computer Science Division of the Argonne National Lab page on MPI
- LAM/MPI, an environment for parallel computing across a heterogeneous network.
Note also that MPI users are now required to use the “-miser” option in conjunction with mpirun whenever submitting to NQS. This forces the use of standard UNIX forking mechanisms (as opposed to SGI’s array services), which is necessary for proper NQS accounting.
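For example (the process count and program name are hypothetical), the launch line inside an NQS script would be:

    mpirun -miser -np 8 ./myprog   # -miser forces UNIX forking; -np sets the number of MPI processes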