TRLAN is a program designed to find a small number of extreme eigenvalues and their corresponding eigenvectors of a real symmetric matrix. Denote the matrix as the eigenvalue as and the corresponding eigenvector as they are defined by the following equation,
There are a number of different implementations of the Lanczos algorithm available(1). Why another one? Our main motivation is to develop a specialized version that only target the case where one wants both eigenvalues and eigenvectors of a large real symmetric eigenvalue problems that can not use the shift-and-invert scheme. In this case the standard non-restarted Lanczos algorithm requires one to store a large number of Lanczos vectors which can cause storage problem and make each iteration of the method very expensive. The underlying algorithm of TRLAN is a dynamic thick-restart Lanczos algorithm. Like all restarted methods, the user can choose how many vectors can be generated at once. Typically, the user choose a moderate size so that all Lanczos vectors can be stored in core. This allows the restarted methods to execute efficiently. This implementation of the thick-restart Lanczos method also uses the latest restarting technique, it is very effective in reducing the time required to compute a desired solutions compared to similar restarted Lanczos schemes, e.g., ARPACK(2).
When solving most problems, the three most time-consuming procedures in
the Lanczos method are the matrix-vector multiplication,
re-orthogonalization and computation of the Ritz vectors. To make this
package as small as possible, we have delegated the task of performing
the matrix-vector multiplication to the user. This is a reasonable
approach because there is simply too many possible ways of performing
the operation and the user usually can construct a specialize version
that is better than a generic matrix-vector multiplication routine. In
addition, there are high quality matrix-vector multiplication routines
available as parts of larger packages, for example, P_SPARSLIB, AZTEC,
BLOCKSOLVE and PETSc, see section Operator interface, for details. To
reduce the amount of the time spent in re-orthogonalization, we only
perform re-orthogonalization if it is necessary. The Ritz vectors are
computed as during the restarting process, in TRLAN, we only compute
those that are determined to be needed. This reduces the number of Ritz
vectors computed. To compute them efficiently, we call the
library to perform the actual computation.
The program is implemented in
Fortran 90. The main advantages of
Fortran 90 compared to
Fortran 77 is that
Fortran 90 offers dynamic memory management which make it more
flexible in terms of allocating temporary work arrays. If there is an
array not used for other task, it can be passed into TRLAN, else the
user can simple let TRLAN allocate its own work arrays. TRLAN internal
allocate all the space it requires up front to avoid repeated call to
allocated small pieces of workspace.
Similar to other languages, such as C/C++,
Fortran 90 offers data
encapsulation which makes it convenient to pass a significant amount of
information cleanly. TRLAN packages a large amount of information in a
single object to reduce the size of the external user interface.
See section TRL_INFO module, for details. A significant advantage of using
Fortran compared to C/C++ is the ease of using computational
libraries such as
LAPACK. In fact, most
numerical computations of TRLAN are performed using those library
functions. Because most machines have vendor optimized
LAPACK, being able to effectively use them is crucial to the
effectiveness of TRLAN package.
Fortran 90 also provides some utility functions such as query
function for the machine precision, random number generator and timing
functions. They make the program more portable across different
Parts of this document contains details about the software package which
may not be of interest to every user. Here are some advice on how to
use this document. If you just want to get a feel of how TRLAN looks
like in a program, take a look at section A small example or the examples come
with the software package. Chapter 3 contains a short example that uses
mostly default parameters. To assert more control over TRLAN, see
section TRL_INFO module, and section Main function interfaces. If you are somewhat
puzzled about how to choose the parameters, see section Recommended parameter choices, for
our recommendations. If you don't use
Fortran 90, see section Calling from other languages, for what to do. For most users, there is no need to read
everything in section The elements of
TRL_INFO_T, and section TRLAN error code. If you read the
entire document and are still puzzled, contact the authors, see
section Contacting the authors.
The source code of the package is available at
This document is distributed with the package and is also separately available at
The package may be unpacked by
tar -xzf trlan.tar.gz
tar program does not recognize flag
you can unpack it in two steps
gunzip trlan.tar.gz tar -xf trlan.tar
After this, the files in the package will be unpacked into a directory called `TRLan'.
To install the package, you will need a
Fortran 90 compiler, the
LAPACK libraries. On parallel
machines, MPI is also needed. The compiler name and the options used
are specified in the file called `Make.inc'. A number of
examples are provided in the file for different machines. If your
compiler name and library locations are same as one of the examples, you
can uncomment the section, comment out the default values, and use the
settings. If your compiler has a different name or the libraries are
located at a different place, you will need to modify the file to refer
to their correct values. The package may be compiled into one of the
two library files
libtrlan_mpi.a where the former is the sequential
version of the package and the latter is the parallel version. To
generate them go to `TRLan' and type
make libtrlan.a or make libtrlan_mpi.a
To compile the examples, go to the appropriate subdirectory in
`examples'. If you are on a sequential or a shared memory
computer and there is no subdirectory that matches your computer, the
source code in the
SUN directory can be used. The examples
in `T3E' and
`psp' can be run on parallel machines that support MPI.
`Makefile' in the `T3E' and `psp' directories
are only tested on a Cray T3E. They will need modification in order to
be used elsewhere. The examples in directory `psp' also need a
special supporting library called P_SPARSLIB, read the
file `README' before try to use it.
There are three examples in most example directories (except
simplec. These are three programs should be doing the
same thing using three different languages. The executables can be
make simple simplec simple77
They should output the same eigenvalues, however the actual printout may differ slightly due floating-point round-off errors.
For further questions, consult the `README' files in the
directories. To report errors in the installation procedure or suggest
improvements, the authors can be reached
kwulbl.gov (Kesheng Wu)
hdsimonlbl.gov (Horst Simon).
This is a simple example in
Fortran 90. It is short because we
have used a very simple matrix and used default parameters wherever
possible. It uses MPI to handle data communication required by TRLAN.
This example comes with the distribution of the source code in directory
`examples/T3E'. The name of the file is `s1.f90' and on T3E
is can be compiled by
make s1 which generates the executable
!!! a really simple example of how to use TRLAN Program simple Use trl_info Use trl_interface Implicit None Include 'mpif.h' ! local variable declaration Integer, Parameter :: nrow=100, lohi=-1, ned=5, maxlan=40, mev=10 Double Precision :: eval(mev), evec(nrow, mev) Type(trl_info_t) :: info Integer :: i External diag_op ! name of the matrix-vector multiplication routine Call MPI_INIT(i) ! initialize MPI ! initialize info -- tell TRLAN to compute NED smallest eigenvalues Call trl_init_info(info, nrow, maxlan, lohi, ned) ! call TRLAN to compute the eigenvalues Call trlan(diag_op, info, nrow, mev, eval, evec, nrow) Call trl_print_info(info, nrow+nrow) If (info%my_pe .Eq. 0) Then write (6, FMT=100) (i, eval(i), i=1,info%nec) End If 100 Format('E(', I1, ') = ', 1PG25.17) Call MPI_finalize(i) End Program simple !!! ! a simple matrix-vector multiplications routine ! defines a diagonal matrix with value (1, 4, 9, 16, 25, 36, ...) !!! Subroutine diag_op(nrow, ncol, xin, ldx, yout, ldy) Implicit None Integer, Intent(in) :: nrow, ncol, ldx, ldy Double Precision, Dimension(ldx*ncol), Intent(in) :: xin Double Precision, Dimension(ldy*ncol), Intent(out) :: yout Include 'mpif.h' ! local variables Integer :: i, j, ioff, joff, doff Call MPI_COMM_RANK(MPI_COMM_WORLD, i, j) doff = nrow*i Do j = 1, ncol ioff = (j-1)*ldx joff = (j-1)*ldy Do i = 1, nrow yout(joff+i) = (doff+i)*(doff+i)*xin(ioff+i) End Do End Do End Subroutine diag_op
There are two parts in this example, the main program and the
matrix-vector multiplication subroutine. The main program sets up the
info variable to carry information to and from TRLAN, calls
TRLAN, and prints the information carried out in
info and the
eigenvalues computed. Here is a short explanation of the arguments to
call trl_init_info(info, ! the variable to be set nrow=100, ! there are 100 rows on each processor maxlan=40, ! maximum Lanczos basis size is 40 lohi=-1, ! compute the smallest eigenvalues ned=5) ! compute 5 eigenvalues
The calling sequence of TRLAN is fairly simple because all
the gory details are hidden inside
info. The following listing
describes the information required by TRLAN to solve an eigenvalue
call trlan(diag_op, ! matrix-vector multiplication routine info, ! what eigenvalues to compute, etc. nrow, ! 100 rows on this processor mev, ! number of eigenpairs can be stored in ! eval and evec eval, ! real(8) :: eval(mev) ! array to store eigenvalue evec, ! real(8) :: evec(lde,mev) ! array to store the eigenvectors lde) ! the leading dimension of evec
The content of
info and the eigenvalues are printed
separately. The content of
info is printed by calling
trl_print_info which accepts two arguments,
info to be
printed and the number of floating-point operations used for one
matrix-vector multiplication on this processor. The second parameter is
needed because the matrix-vector multiplication is user-supplied. The
information is used to compute the speed of the matrix-vector
multiplication and the speed of the whole program. It can be ignored,
in which case
trl_print_info will leave the related fields blank.
The short matrix-vector multiplication routine,
multiplication with a very simple matrix,
diag(1, 4, 9, ...).
The example tries to find 5 smallest eigenvalues of this matrix, 1, 4,
9, 16, 25. The following is the output from a run on a T3E/900 located
at the National Energy Research Supercomputing Center(3).
1998/09/24 18:37:16.834 (-07:00) TRLAN execution summary (exit status = 0 ) on PE 0 Number of SMALLEST eigenpairs: 6 (computed), 5 (wanted) Times the operator is applied: 847 (MAX: 2000 ) Problem size: 100 (PE: 0), 400 (Global) Convergence tolerance: 1.490E-08 (rel), 2.384E-03 (abs) Maximum basis size: 40 Restarting scheme: 0 Number of re-orthogonalizations 847 Number of (re)start loops: 36 Number of MPI processes: 4 Number of eigenpairs locked: 0 OP(MATVEC): 9.35440E-03 sec, 1.81091E+01 MFLOPS Re-Orthogonalization: 2.12016E-01 sec, 4.68742E+01 MFLOPS Restarting: 2.36795E-01 sec, 2.02369E+01 MFLOPS TRLAN on this PE: 5.29851E-01 sec, 3.00020E+01 MFLOPS -- Global summary -- Overall MATVEC Re-orth Restart Time(ave) 5.2985E-01 9.4129E-03 2.1143E-01 2.3685E-01 Rate(tot) 1.2001E+02 7.1990E+01 1.8801E+02 8.0930E+01 E(1) = 0.99999999997742750 E(2) = 3.9999999999816311 E(3) = 8.9999999999916049 E(4) = 16.000000000026944 E(5) = 25.000000000089663 E(6) = 36.000000000367905
In short, to use TRLAN to find some extreme eigenvalues, the user
defines a matrix-vector multiplication routine with the same interface
trl_init_info to specify what
eigenvalues to compute and calls
trlan to perform the bulk of the
computation. The remainder of this manual will explain the user
interface and how to control TRLAN in more detail. How to Write a
particular matrix-vector multiplication for an operator is beyond the
scope of this manual. Some packages containing distributed
matrix-vector multiplications routines are listed in section Operator interface.
The example in previous chapter uses two modules, TRL_INFO and
TRL_INTERFACE. As the name suggested, TRL_INTERFACE contains the user
interface for accessing TRLAN. The module TRL_INFO only contains the
definition of the
Fortran 90 derived
type TRL_INFO_T. To make it easy to access, we have
provided six access functions,
trl_terse_info. The first four are for manipulate input
parameters to TRLAN and the last two are for printing the content of
TRL_INFO_T. We will discuss these access functions in
this chapter. The remaining interface functions are described in the
next Chapter. The last two sections of this chapter may be skipped if
the reader is only seeking information on how to use the package.
This initialization routine is equivalent to a generator function in
C++. It is intended to be called before any other TRLAN
functions. Any parameter not explicitly set by the caller is set to its
default value and all internal counters are set to zero. Its
interface block is as follows,
Subroutine trl_init_info(info, nrow, mxlan, lohi, ned, tol,& & trestart, maxmv, mpicom) Use trl_info Integer, Intent(in) :: lohi, mxlan, ned, nrow Integer, Intent(in), Optional :: maxmv, mpicom, trestart Real(8), Intent(in), Optional :: tol Type(TRL_INFO_T), Intent(out) :: info End Subroutine trl_init_info
We have seen the mandatory arguments in the example, however, there are four optional arguments that were not used before. For completeness, we will give a short description of all arguments here.
Fortran 90derived type that will carry the information to the
trlansubroutine. It is set by this subroutine. Any prior content will be cleared.
nrowrefers to the number of rows located on the current processor. It may vary from processor to processor.
trlan. The restarted Lanczos algorithm will store up to
maxlanLanczos vectors and one (1) residual vector. An additional memory of size
maxlan*(maxlan+10)is required to perform the Rayleigh-Ritz projection to compute the approximate solutions. Generally, the larger
maxlanis, the fewer matrix-vector multiplications are needed. See section Selecting the maximum basis size (
maxlan), for further discussion on this parameter.
lohi< 0), or the largest ones (
lohi> 0), or whatever converges first (
ned, are mandatory when calling
trl_init_info. The following parameters are optional because a reasonable value can be determined by
ntotthe global problem size.
trlan. This parameter is only meaningful if MPI is used. If the sequential version is used, this variable is simply ignored internally. If MPI is used and this variable is not set,
MPI_COMM_WORLDand use the resulting communicator for its internal communication operations.
There are cases we would like to monitor the progress of the restarted
Lanczos algorithm. We can do this by setting a few logging parameters.
Since the debug information may be voluminous,
them to files. Each MPI process will write its own debug information to
a separate file. The name of the file and how much debug information to
write is controlled by calling
Subroutine trl_set_debug(info, msglvl, iou, file) Use trl_info implicit none Type(TRL_INFO_T), intent(inout) :: info integer, intent(in) :: msglvl, iou Character*(*), Optional :: file End Subroutine trl_set_debug
TRL_INFO_Ttype variable to be modified. The function
trl_init_infoshould have been called before calling
trl_init_infosets it to zero as the default value.
trlanis being used.
Trl_init_infosets it to 99 as the default value.
trl_set_debug. When it is not set, the corresponding element of
TRL_INFO_Tis not changed.
Trl_init_infosets this elements to `TRL_LOG_' by default.
TRLAN program can either use a user-supplied initial guess, generate
an arbitrary initial guess or read a set of checkpoint file to get
starting vectors. The thick-restart Lanczos may start with arbitrary
number of vectors, however, the starting vectors have to satisfy a
strict relation. The simplest way to start the algorithm is to simply
provide one starting vector. If the function
not called, the starting vector is set to [1, ..., 1] by default. The
checkpoint option is implemented to enable a user to continue improve
the accuracy of the solutions progressively.
Subroutine trl_set_iguess(info, nec, iguess, oldcpf) Use trl_info Implicit None Type(TRL_INFO_T) :: info Integer, Intent(in) :: iguess, nec Character(*), Intent(in), Optional :: oldcpf End Subroutine trl_set_iguess
TRL_INFO_Ttype variable to be modified. The function
trl_init_trlshould have been called before calling
necis greater than zero (0), the first
neccolumns of array
evecshould contain eigenvectors of the operator and the first
evalshould contain the corresponding eigenvalues. This is designed to allow the user to return to
trlanto compute more eigenvalues and eigenvectors.
Trl_init_infosets this value to zero to indicate no converged eigenvalues.
iguessis less than zero, a random perturbations will be added to this vector before it is taken as the starting vector.
trl_init_infofor this is `TRL_CHECKPOINT_'.
NOTE: Reading the checkpoint files are done through I/O unit
Trl_init_info sets this value to 98 by default. If
I/O unit is used for another task already, use
cpio to an unused I/O unit number.
TRLAN has implemented a scheme of checkpointing to allow the user to
stop and restart. The checkpoint files of the thick-restart Lanczos
algorithm contains all the information necessary for it to continue the
Lanczos iterations. To minimize the size of the files, the checkpoint
files are written at the end of the restart process because the basis is
the smallest in size at this point. For efficiency reasons, each MPI
processor writes its own checkpoint file in
form. These checkpoint files can only be read on the same type of
machines and using the same number of MPI processors. The function
trl_set_iguess controls whether the checkpoint files are read.
The following function controls when to write the checkpoint files.
Subroutine trl_set_checkpoint(info, cpflag, cpio, file) Use trl_info Implicit None Type(TRL_INFO_T) :: info Integer, Intent(in) :: cpflag, cpio Character(*), Optional :: file End Subroutine trl_set_checkpoint
TRL_INFO_Ttype variable to be modified. The function
trl_init_trlshould have been called before calling
cpflagset of checkpointing files in
maxmviterations. If it is less or equal to zero, no checkpoint file is written. Checkpointing files are only written if TRLAN runs correctly. If
cpflagis greater than zero, at least one set of checkpoint files is written when TRLAN completes successfully. To debug the program, turn on verbose printing by using
Trl_init_infosets this value to zero.
FORTRANI/O unit number to be used for writing checkpoint files. The value of
cpiois set to 98 by default (
TRL_INFO_Tis not modified.
Trl_init_infosets this variable to `TRL_CHECKPOINT_' by default.
Upon returning from
trlan, the user may wish to exam the progress
trlan. One simple way to do this is to printout the content
info. There are two printing functions
trl_print_info is the one
that printed the results in section A small example. The following is the same
information printed using
NOTE: The eigenvalues are not stored in
info. The following
printout is from a different run of the same example, there is slight
difference in time.
MAXLAN: 40, Restart: 0, NED: - 5, NEC: 6 MATVEC: 847, Reorth: 847, Nloop: 36, Nlocked: 0 Ttotal:0.535910, T_op:0.008962, Torth:0.209496, Tstart:0.238200
interface of the two subroutines are as
Subroutine trl_print_info(info, mvop) Use trl_info Implicit None Type(TRL_INFO_T), Intent(in) :: info Integer, Intent(in) :: mvop End Subroutine trl_print_info
Subroutine trl_terse_info(info, iou) Use trl_info Implicit None Type(TRL_INFO_T), Intent(in) :: info Integer, Intent(in) :: iou End Subroutine trl_terse_info
TRL_INFO_Tvariable to be printed. In addition of keeping track of how many eigenvalues have converged and how many are wanted. There are significant amount of information about how many matrix-vector multiplications have been used, how much time is used in various parts of the program, and so forth. The verbose version of the printing function will printout most of the recorded information that are deemed to be useful. The terse version only printout the 12 most important fields.
trl_print_infoto determine the speed of the matrix-vector multiplication and the speed of the overall eigenvalue program. If this information is not present, the relevant fields are left blank in the printout. The variable
infocontains timing information and floating-point operations performed inside the
trlan. Since the matrix-vector multiplication is a user-supplied function, the user has to provide information about its complexity.
trl_terse_infois allowed to print to any valid
FORTRANI/O unit. This is different from the verbose printing function where the printout is always sent to the I/O unit that is used for logging debugging information.
In addition to the differences mentioned already, the function
trl_print_info requires every processor to participate but
trl_terse_info can be called by each processor individually.
Because of this reason,
trl_print_info can provide global
performance information but not
The verbose version of the printout is designed to be self-explanatory.
The rate fields for floating-point operations are in MFLOPS. The rate
Write refers to the speed of reading and
writing checkpoint files, they are in MegaBytes per second. Since the
simple example shown does not use checkpointing, no information
Write is presented in the printout.
The heading for the terse version of the printout is bit cryptic. They are
0to indicate which end of the spectrum is being computed.
|| A ||). Because the residual norms are so small, we lock the Ritz pairs to reduce the among of arithmetic operations needed in Rayleigh-Ritz projection.
In some instances, it might be necessary to directly access the status
info rather than print out the information.
trlan terminated because of some kind of error, the
two elements of
TRL_INFO_T that are most like to be useful
after returning from
nec, where the first one is the error flag
trlan and the second one indicates how many eigenvalues
and eigenvectors have converged.
The input parameters to TRLAN have appeared in the calling sequence
trl_set_checkpoint are described earlier in this chapter. The
following are elements of
TRL_INFO_T are counters set by
trlan. To the user, they are part of the output from TRLAN.
trlanhas reached the maximum size and restarted.
trlanhas generated random vectors in attempt to produce an vector that is orthogonal to the current basis vectors.
|| A ||). The accuracies of these eigenpairs can not be further improve by more Rayleigh-Ritz projection, therefore they are locked to reduce arithmetic operations, they are only used to perform re-orthogonalization but nothing else. TRLAN locks not only the wanted eigenpairs with small residual norm, it also locks unwanted ones depending the restarting strategy. The rational is that if they converge quickly, it is not good idea to throw them away because they will reappear in the next basis built.
Fortran 90intrinsic function
clk_tot tick_t clk_op tick_o clk_orth tick_h clk_res tick_r
tick_o), re-orthogonalization (
tick_h), restarting (
tick_r), and the whole
tick_t). The four integer counters,
clk_tot, are output from
Fortran 90intrinsic function
system_clock. Since it is likely the integer counters will overflow in some cases, each of them has a floating-point counterpart. If they become larger than a quarter of the maximum counter value
clk_max, they are added to the floating-point counters and reset to zero.
flop rflp flop_h rflp_h flop_r rflp_r
rflpare for counting the total floating pointer operations excluding those used by matrix-vector multiplications.
rflp_hare for counting the re-orthogonalization procedure.
rflp_rcount operations used in restarting. The integer counters are used initially until they become larger than
clk_max/4. Once they become too large, their values are added to the corresponding floating-point counter and they are reset to zero. Since the matrix-vector multiplication routine is supplied by the user, TRLAN can not account for the floating-point operations used in that procedure.
crat tmv tres trgt
cratis the convergence factor. It is measured after
tmvnumber of matrix-vector multiplications are used. The value of the target at
tres. The convergence factor is updated as follows. Among the Ritz values, search for the one that is closest to
trgt. This Ritz value is regarded as the updated version of the previous target Ritz value. Let
resbe the residual norm of the Ritz value, the convergence factor is computed as
res / tres) / (
matvec - tmv)). After
cratis updated, the current target value is identified as the first Ritz value that is not converged yet. The value of
tmvand the corresponding residual norm
tol, all the Ritz pairs with residual norm less than
tol * anrmare considered converged.
This section lists all error numbers defined in TRLAN and discusses possible remedies to the errors. Currently (TRLAN version 1.0), defines the follow error numbers,
trlanhas not computed all wanted eigenpairs, check the value of
necto see exactly how many wanted eigenpairs have converged. In case you have not computed all wanted eigenpairs, the possible solutions are:
trl_set_checkpointto generate checkpoint files for future use.
maxlan, See section Selecting the maximum basis size (
maxlan) for more details.
maxmv, See section Selecting the maximum iterations for more details.
nloc) does not match the value of
nrowused when calling
trlan. Most likely the user has used the
infovariable defined for a different problem. SOLUTION: Make sure
trl_init_infois called before
trlanand the arguments to both functions are correct for the intended eigenvalue problem.
evecis smaller than local problem size,
lde < nrow. There isn't enough space in
evecto store the eigenvectors correctly. SOLUTION: Allocate the array
evecwith leading dimension larger or equal to
evalis too small to store the eigenvalues,
mev < info%ned. There isn't enough columns in
eveceither. SOLUTION: Increase the size of array
evaland increase the number of columns in
maxlan * (maxlan + 10). TRLAN tries to allocate its own workspace if the user has not provided enough workspace to store the Lanczos basis vectors and the projection matrix, et al. SOLUTION:
maxlan. This will decrease the among of workspace required.
base) size required here is
(maxlan + 1 - mev) * nrow. SOLUTION: See solutions for error code
trlanczoswith insufficient workspace. SOLUTION: Increase workspace provided.
trlanczoswith insufficient workspace. SOLUTION: Increase workspace provided.
TRLANdoes not have enough workspace. This should not happen unless
trl_orthis directly called outside of TRLAN with insufficient workspace. SOLUTION: Increase workspace provided.
1E160, this error code should not be generated. If it is, normally it is an indication of other errors. For example, the workspace given by the user is not as large as the user indicated to
trlan, the array
evecis actually smaller than
(lde, mev), the first
evecare not orthonormal vectors on input, or you have encounter a bug in TRLAN or one of the libraries used by TRLAN. SOLUTION:
trlanis as large as claimed, i.e., the actual size of
wrkis at least as large as
wrkis present but not
lwrk, the actual size of
wrkshould be no less than
trlanis not going to compute all the eigenvectors because
trlanuses the space in
evecto store the Lanczos vectors during its computations.
necis not zero, make sure that the known eigenvectors are stored in the first
evecand the eigenvalues in the first
ssytrdhas failed. This is extremely unlikely to happen. SOLUTION: Check to make sure you have a correct version of the
LAPACK. If your
LAPACKis installed correctly, see suggestions for error
sorgtrhas failed. This is extremely unlikely to happen. SOLUTION: See solutions to error
trlanis no less than
lwrk. SOLUTION: See solutions to error
ssteinhas failed. Normally, if this happens, TRLAN will switch to a different method of computing the eigenvectors. It is very unlikely the user will see this error flag. If it does show up, it might be an indication of error in the program. SOLUTION: See solutions to error
ssyevhas failed to compute the eigenvalues and eigenvectors of the projection matrix. SOLUTION: Check to make sure
LAPACKis installed correctly. See solution to error
ssyev. SOLUTION: See solutions to error
trl_cgsdirectly, make sure workspace size
lwrkmatches the actual size of
FORTRAN 90random number generator to set each processor with a different seed value. For example, the following code segment will cause
random_numberto generate different random numbers on each processor the next time it is used and it also produces an random vector as initial guess for the Lanczos iterations.
call random_number(evec(1:nrow,1)) do i = 1, info%my_pe call random_number(evec(1:nrow,1)) end doIf reseting the seed of the random number generator does not fix the problem, then there might be a more serious problem. See also solutions to error
-2unless the checkpoint files are not for the same problem or not produced with the same number of processors. SOLUTION: Make sure the checkpoint files are generated on the same type of machines and for the same problem using the same number of processors.
cpiois not used for something else already.
maxlan. SOLUTION: Increase the size of
kLanczos vectors to be written out, the number of bytes is
8*(2+2*k+nrow*(k+1))for each processor. The maximum value for
This list represents all error code defined in TRLAN version
1.0. The function
trlan should never return an
error code not listed here. If you encountered one and you are sure
that all the arguments to TRLAN functions are correct, you have
uncovered a flaw in TRLAN, contact the authors with a description of
your problem. A short example is always welcome.
The majority of the arithmetic computations to solve an eigenvalue problem are executed in the main function of TRLAN package and the user's own matrix-vector multiplication function. This section gives a more detail description of the interfaces of these two functions.
The main computation kernel of TRLAN package is named
have see a short description of its arguments in section A small example.
To provide a different view of the interface, we show its
Subroutine trlan(op, info, nrow, mev, eval, evec, lde, wrk, lwrk) Use trl_info Implicit None Type(TRL_INFO_T) :: info Integer, Intent(in) :: lde, mev, nrow Integer, Intent(in), Optional :: lwrk Double Precision, Intent(inout) :: eval(mev), evec(lde,mev) Double Precision, Target, Dimension(:), Optional :: wrk Interface Subroutine OP(nrow, ncol, xin, ldx, yout, ldy) Integer, Intent(in) :: nrow, ncol, ldx, ldy Double Precision, Dimension(ldx*ncol), Intent(in) :: xin Double Precision, Dimension(ldy*ncol), Intent(out) :: yout End Subroutine OP End Interface End Subroutine trlan
Most of the arguments of this subroutine are explained before in section A small example. However for completeness, we will list all of them here.
xinand stores the resulting vectors in
yout. In linear algebra terms, this is a matrix-vector multiplication routine. The vectors to be multiplied are stored in
xinand the resulting vectors are stored in
Fortran 90derived type
TRL_INFO_T. It carries the information to and from TRLAN. See section TRL_INFO module, for more details.
evaland the number of columns in array
evec. It denotes the maximum number of eigenpairs that can be stored in
evec. NOTE: Since the array
evecwill be used internal to store
mevLanczos vectors, even if you do not think TRLAN is able to compute
meveigenvectors at the end, you still declare
evecas large as (
eval) and the eigenvectors (
evec). On entry to
info%necis greater than zero (0), the first
evalshall contain the eigenvalues already known and the first
evecshall contain the corresponding eigenvectors. These eigenpairs are assumed to have zero residual norms and will not be modified by TRLAN. On exit from
trlan, the converged solutions are stored in the front of
evec, i.e. the first
evalcontains the converged eigenvalues and the first
evecare the corresponding eigenvectors.
evec. It is expected to be no less than
nrow, otherwise the eigenvectors can not be stored properly and
trlanwill abort with error code
info%stat = -2.
wrkwill be used as workspace inside and
lwrkshall be the number of elements in array
wrk. TRLAN will try to use this workspace if it is large enough for either
maxlan * (maxlan + 10)) or
(maxlan + 1 - mev) * nrow). If
wrkis present but not
lwrk, the workspace size is assumed to be
mev. It does not make sense to have only
wrk. If it is the case,
lwrkis ignored. If argument
wrkis present and there is enough space to store the residual norms of the solutions, the first
wrkwill contain the residual norms corresponding to the
TRLAN program require the user to provide his/her own matrix-vector multiplication routine. The matrix-vector multiplication routine needs to have the following interface.
Subroutine OP(nrow, ncol, xin, ldx, yout, ldy) Integer, Intent(in) :: nrow, ncol, ldx, ldy Double Precision, Dimension(ldx*ncol), Intent(in) :: xin Double Precision, Dimension(ldy*ncol), Intent(out) :: yout End Subroutine OP
yout) to be multiplied.
Double Precision, Dimension(1:ldx,1:ncol), Intent(in)::xin Real*8 xin(1:ldx, 1:ncol)The
ith column of
xin((i-1)*ldx+1 : (i-1)*ldx+nrow)if
xinis declared as one-dimensional array. If the user routine actually declare it as a two-dimensional array, the
ith column should be
xin(1:nrow, i). TRLAN calls OP using Fortran 77 style argument matching, only starting address of
xinwill be passed. For those who are familiar with
xinis actually passed as
double *that points to the first element of array. Elements in a column are ordered consecutively and the
ith column starts at
xinwhen it is declared as two-dimensional array.
Double Precision,Dimension(1:ldy,1:ncol),Intent(out)::yout Real*8 yout(1:ldy, 1:ncol)The usage notes on
xinalso apply to
youtwhen it is declared as two-dimensional array.
This simple interface only has enough information to describe the input
and output vectors. Here are some possible ways of passing the matrix
information to this subroutine. In
Fortran 90, we recommend
using a module to encapsulate information related to the matrix. If
Fortran 77 is used, a
common block may be used for the
same purpose. Normally, if another language like
is used, the matrix can be packaged in a
struct or a
class, and accessed through a global variable.
In case the user does not want to write his/her own matrix-vector multiplication routine. There are a number of packages out there that can be used. Useful software depots and information archives include
National HPCC Software Exchange
Scientific Application on Linux
Potentially useful packages include
NOTE: All of the packages mentioned above have matrix-vector multiplications routines. However some of them are designed for solving linear systems or even larger granularity tasks, some effort may be required to directly using their matrix-vector multiplication routines.
trlan, the user needs to decide a few parameters.
The most important parameters are arguments to function
trl_init_info. The parameters like
ned are determined by the problem to be solved, other parameters
to control the execution of TRLAN might not be familiar to casual users.
This part of the manual will give some recommendations on how to
determine those parameters.
A few factors come into play when picking the maximum basis size
maxlan, for example, the available computer memory size, the
number of eigenvalues wanted, and the separation of wanted eigenvalues
from the others. The first rule of thumb is that
at least as large as
ned + min(6, ned).
Generally, the larger it is, the better TRLAN will perform. The limitation on using a very large basis is that there might not be enough computer memory to store the basis in memory. Another concern regarding using a large basis is that the Gram-Schmidt orthogonalization process will be expensive. In addition, if the basis size is larger than 1000, then the time spent in finding the eigenvectors of the projection matrix may be a substantial portion of the overall execution time.
If the wanted eigenvalues are easier to compute compared to others, then
it does not matter how large the basis size is, the restarted Lanczos
method will find the solutions fairly quickly. If the wanted
eigenvalues converge slower than the unwanted ones, such as the example
in section A small example, then the above recommended minimum size is too small
to be effective. In this case, the user should look at how many
eigenvalues were locked and compare it with the number of eigenvalues
converged. In difficult case, it is not unusually to see a large number
of unwanted eigenpairs converge before the wanted one are finally
computed. In the previous example, the minimum recommended basis size
is 11. Since it is relatively small and we know the eigenvalue problem
is relatively hard, we first tried
maxlan = 20. After 2,000
matrix-vector multiplications, there are two wanted eigenvalues
converged, and six eigenvalues were locked. After this first test, we
use the following guidelines to choose the next basis size.
maxlanfor each locked eigenvalue.
ned / nec.
The first rule suggests the new basis size of about 30 and the second suggest the next choice could be 50. The basis size used in the example is 40. We are able to find the five smallest eigenvalues with this choice. Further tests show that using basis size of 30 can compute the same 5 eigenvalues in 1056 matrix-vector multiplications, and using a basis size of 50 TRLAN only need 777 matrix-vector multiplications. However, in both cases, more time was used. This demonstrates the complexity of the choice. In this particular case, either 30, 40 or 50 is a reasonable choice.
The convergence test used in this program is
r < tol * || A ||.
Normally, if the matrix is stored, the accuracy of the matrix-vector
multiplication routine is on the order of
epsilon * || A ||. The
unit round-off error (
epsilon) of a 64-bit IEEE floating-point
number is approximately
2.2E-16. The default value of
1.49E-8. Typically, if 5 digits of accuracy is desired
for the eigenvectors,
tol should be set to
This is another parameter that can change the execution time dramatically. However, effective restarting schemes are still subject of active academic researches. On the example given before, schemes 1 and 2 uses about the same amount of matrix-vector multiplications which are more than the number of matrix-vector multiplications used with schemes 3 and 4. However, because schemes 3 and 4 perform more restarts and they save more basis vectors during restarting, their restarting procedures are more expensive. The actual execution time with schemes 3 and 4 are longer than those with schemes 1 and 2. Based on these observations, schemes 3 and 4 are better if the matrix-vector multiplication is very time-consuming, say, one matrix-vector multiplication takes more time than an average restart. If the matrix-vector multiplication is relatively inexpensive, then schemes 1 and 2 are preferred. Scheme 5 attempts to mimic the restarting strategy in ARPACK, in many cases, it has comparable performance as the scheme 1.
TRLAN is stopped usually after it has found all the wanted eigenvalues
and the corresponding eigenvectors. The other normal stopping condition
is to stop after
maxmv number of matrix-vector multiplications.
Typically, we would allow a fixed number of matrix-vector
multiplications for each eigenvalue, for example, 100 per eigenvalue.
When we are trying to find the correct value to use for
restart we may limit the number of matrix-vector
multiplications used to reduce the time consumed. The default value in
trl_init_info is very large especially for large problems. An
more acceptable limit might be 1000 matrix-vector multiplications per
eigenvalue. This should be sufficient for most problems. If more than
1000 matrix-vector multiplications are used to compute one eigenvalue,
other means of computing eigenvalues should be tried. For example, the
shift-and-invert Lanczos method is often able to compute the desired
eigenvector in a few steps. The shift-and-invert scheme computes the
extreme eigenvalues of
first, then derive the actual eigenvalues of
To use this scheme, one need to either invert the matrix or at least
being able to solve linear systems,
If neither is feasible, then the Davidson method might be an alternative
If TRLAN does not return with status 0, consult section Setting debug parameters, to setup a debugging session, and refer to section TRLAN error code, for error encountered and possible solutions.
Some of the issues related to workspace requirements have been mentioned through out this manual. This section provide a central location to collect all the information for ease of reference.
trlan, there are three large chucks of workspace,
misc. The user always provides the
evec, since it is necessary to carry input and output
information for TRLAN. Its size is clearly defined in the calling
mev. The array
base is used to
store the basis vectors if the array
evec can not store
maxlan+1 vectors. Given the maximum basis size
the size of
(maxlan + 1 - mev) * nrow). The array
misc is used to store the projection matrix, the eigenvalues and
eigenvectors of the projection matrix, workspace required by all lower
level routines of TRLAN, library routines from
BLAS. Its size should be no less than
maxlan * (maxlan +
10). If it is larger in size, some library routines might run faster.
Thus if there is large amount of computer memory, the user can let TRLAN
use it by pass in a large array
If the user provides a workspace
trlan, then its
size is checked to see either one of
base or both
of them can fit inside the workspace. If at lease one of them can fit
wrk, it would be used. If
wrk is large enough for
misc, the array
base will use
(maxlan + 1 - mev) * nrow) elements and the rest is given to
trlan can not use
wrk, it will allocate
workspace of appropriate size internally.
wrk is provided, its content is not used on input. However
trlan will copy the residual norms of the
converged Ritz pairs in the first
nec elements of
We have isolated the communication needs of TRLAN in four subroutines,
trl_g_dot. The four subroutines are located in a file called
`trl_comm_mpi.f90' for the MPI version and `trl_comm_none.f90'
for sequential version. If the the matrix-vector multiplication routine
and the main program are written for sequential machine, the user can
simply compile with `trl_comm_none.f90' to get the sequential
version of the program. This setup makes it easy to adopt TRLAN for
different types of eigenvalue problems.
trl_init_info is used to initialize the
TRL_INFO_T type variable to be used by
trl_sync_flag is used to synchronize the status flags
used inside TRLAN. In the current implementation, it computes the
minimum value of
info%stat on each processor and reset all
info%stat to the minimum value. Given that the error flags are
all less than zero (0), if any processor has detected an error, all of
them will be set to indicate an error.
Trl_g_sum computes the
global sum of an input array and it returns the global sum in the same
array. The subroutine
trl_g_dot computes the dot-products among
the Lanczos vectors.
If desired, one can change these four routines to suit different
situations. For example, if the physical domain of the eigenvalue
problem has certain symmetry, usually the discretization does not
contain the whole domain but only a portion of it. Since not every
element of a vector is stored, the dot-product routine needs to be
modified. In this case, only
to be modified in order for TRLAN for function properly.
TRLAN program is implemented in
Fortran 90. Since
90 is backward compatible with previous versions of
There should be no problem to use it in any other
program. However, at the moment, the authors are not aware of a scheme
to reliably access
Fortran 90 subroutine with optional arguments,
a subroutine with fixed arguments is created to get around this problem.
The fixed arguments subroutine has the following interface,
subroutine trlan77(op, ipar, nrow, mev, eval, evec, lde, & wrk, lwrk) integer ipar(32), nrow, mev, lde, lwrk double precision eval(mev), evec(lde, mev), wrk(lwrk) external op
Fortran 90 derived type
is removed from this user interface since its primary access function
trl_init_info contains optional arguments as well. Here is a
list showing how the integer array
ipar is mapped to the elements
ipar(1) = stat,
ipar(2) = lohi,
ipar(3) = ned,
ipar(4) = nec,
ipar(5) = maxlan,
ipar(6) = restart,
ipar(7) = maxmv,
ipar(8) = mpicom,
ipar(9) = verbose,
ipar(10) = log_io,
ipar(11) = iguess,
ipar(12) = cpflag,
ipar(13) = cpio,
ipar(14) = mvop,
ipar(24) = locked,
ipar(25) = matvec,
ipar(26) = nloop,
ipar(27) = north,
ipar(28) = nrand,
ipar(29) = total time in milliseconds,
ipar(30) = MATVEC time in milliseconds.
ipar(31) = re-orthogonalization time in milliseconds.
ipar(32) = restarting time in milliseconds.
Among the parameters,
ipar(2 : 14) are input parameters,
ipar(24 : 32) are output
There are two floating-point number elements in
crat. Before calling
trlan77, the first
wrk should be set to the residual tolerance
wrk(1) is transfered to
tol. On return from
trlan77, the first
wrk store the residual norms corresponding to the
converged eigenvalues and eigenvectors. Element
wrk will store the last known value of
Caution: Fail to set
wrk(1) to a valid floating-point
number will cause TRLAN to produce floating-point exceptions!
trlan77 is relative simple to call from
C/C++. The file `simplec.c' in the distribution of TRLAN
version 1.0 has equivalent functionalities as
`simple.f90' and `simple.f77'.
Here are a few suggestions on what to watch out for when using TRLAN. Some of suggestions are simply good programming practices.
Fortranprograms. This is very effective in catching typos. It is also a good practice to check the programs with automated tools such as
lwrkis actually the size of array
trlanmatches arguments to
trl_set_debugprior to calling
trlan. Set the ninth and tenth element of array
iparto appropriate values when
trlan77is used. Consult section Setting debug parameters to setup a debugging session. Refer to section TRLAN error code for error encountered and possible solutions.
The authors of TRLAN and this document can be contacted at the following
kwulbl.gov (Kesheng Wu),
hdsimonlbl.gov (Horst Simon). Kesheng Wu can also be reached
kwucomputer.org. The authors also
maintain their own research pages on the web at
http://www.nersc.gov/~simon. The updated software package
can also be found at both web addresses.
This is a list books and research papers on the Lanczos algorithm and restarting. See section Operator interface, for a list of software archives.