Global matrix storage, and equation solvers

The size of the band is defined by the **bandwidth** NBW. This defined as the smallest
number for which: K_{i,j}=0 for all i,j satisfying |i-j| > {NBW} .
If K is symmetric, we will store the lower half of the band, i.e. all entries K_{i,j}
for which i gt;= j . The **halfbandwidth** NHBW is defined by NBW = 2*NHBW + 1.
Since we only get a nonzero entry in K_{i,j} if nodes with global numbering i and j
are contained in the same element, the halfbandwidth NHBW can be calculated as the maximum
over all elements l , of the difference in node numbers for the nodes in l .
In POISS.FOR, this calculation is done at the start of subroutine `assmb`. We will
therefore use NPOIN rows of GSTIF (one row for each node in the mesh) and NHBW+1 columns.
We denote the latter value by NBAND, and dimension GSTIF at the start of POISS.FOR as

DIMENSION GSTIF(MPOIN,MBAND)

where the parameters MPOIN and MBAND are set in the PARAMETER statement.

Then, on row i we store only k_{i,i- {NHBW}},...,k_{i,i} , for i=1,... ,NTOTV. We store these entries in an array GSTIF with NTOTV rows (where NTOTV is the total number of degrees of freedom) and (NHBW+1) columns. The mapping from K to GSTIF is: n K_{i,j} --> {GSTIF(I,J-I+NHBW+1)} for j=i- {NHBW},...,i .

Then the diagonal of the matrix (entries where i=j ) will be stored in the final column of
the array. You can see the entries of the element stiffness matrices being placed in GSTIF,
in subroutine `assmb` of `POISS.FOR`.

The mapping is slightly more complicated in **ELAST.FOR**, where there are two degrees of
freedom per node. Here, the u and v d.o.f.s of node n (i.e. displacements in the x
and y directions at node n ) are global degrees of freedom numbers i_n+1 and i_n+2 ,
where i_n=2(n-1) . You can see this mapping being used at the start of subroutine `assmb`.
After making this adjustment, the local d.o.f.s are mapped to GSTIF by the formula in the
previous section. At the start of `assmb`, the halfbandwidth is found by finding the
position of the first non-zero entry in row I: this is stored in the array NMINVC. Thus,
if NMINVC(I)=MI, it means that k_{i,mi} \neq 0 but k_{i,j}=0 {for} j=1,...,mi-1 .
Then the halfbandwidth NHBW is given by \max_{i=1, {NTOTV}}(i-mi) .

It will be seen that the size of the bandwidth, and hence the amount of storage required,
will depend on the order in which the nodes of the mesh have been numbered. For this reason
it is very important to use the element renumbering facility in PREFEL when producing
the `.dat` file for input to the `main engine', to obtain a sensible numbering of the
elements --- and then to re-number the nodes accordingly. Note that the bandwidth does not
depend on the element numbering directly.

The mapping from K to GSTIF is again n K_{i,j} --> {GSTIF(I,J-I+NHBW+1)}. Now, however, there are NBW columns, and the diagonal entries of K lie down the (NHBW+1)'th column.

This storage technique is used in the soil consolidation program **CONSL.FOR**. As with band
storage, the amount of storage used depends on the numbering of the nodes in the mesh. We
again use the array NMINVC(I) to tell us the column where the first non-zero entry on
row I of K lies. Then (as K will by symmetric in CONSL) we store only the entries from
this point up to the diagonal entry, in GSTIF. GSTIF now becomes a one-dimensional array,
and we use a pointer array LPOINT(I) to tell us where the I'th row of K begins. That is,
the diagonal entry K_{i,i} will be stored as GSTIF(LPI), where LPI=LPOINT(I),
for I=1 to NTOTV. The dimension of GSTIF is then LPOINT(NTOTV), which we denote by NSTIF.
The parameter MSTIF is used to dimension GSTIF at the start of the program.
The arrays NMINVC and LPOINT are set up by a subroutine `mask` in CONSL.FOR.
LPOINT(I) can be found from LPOINT(I-1) by

LPOINT(I) = LPOINT(I-1) + I - NMINVC(I) + 1

since there are I-NMINVC(I)+1 entries to store on row I. We start with LPOINT(1)=1.

The mapping from the lower diagonal of K to GSTIF is now
n
K_{i,j} --> {GSTIF(LPOINT(I) - I + J)}
if NMINVC(I) lt;= J lt;= I. This mapping is performed by a function `gpos`; if
K_{i,j} lies outside the envelope of K which is being stored, `gpos` returns a value
of zero. Otherwise,
n
K_{i,j} --> {GSTIF(GPOS(LPOINT,NTOTV,I,J))}.

This algorithm is represented in the following fragments of pseudo-code. first we show the reduction phase, in which the lower triangular matrix L is formed:

C...Reduction: do 20 i=1,n-1 pivot = G(i,i) do 10 k=i+1,n fact = G(k,i)/pivot do 5 j=i+1,n G(k,j) = G(k,j) - fact*G(i,j) 5 continue f(k) = f(k) - fact*f(i) 10 continue 20 continueHere, G(i,j) indicates the (i,j)'th entry of G; no modification has been made to represent the compact storage of G. If this pseudo-code is compared with the code in subroutine

The solution is completed by the back-substitution phase, performed in subroutine `backsu`
of THERM.FOR:

c...back-substitution phase u(n) = f(n)/G(n,n) do 10 i=n-1,1, step -1 pivot = G(i,i) do 5 j=i+1,n f(i) = f(i) - G(i,j)*u(j) 5 continue u(i) = f(i)/pivot 10 continue

The entries in L must be found systematically. The first one which can be found is l_{1,1} , since it will be seen by forming the matrix LL^T and comparing it with K , that k_{1,1} = {l_{1,1}}^2 . Once l_{1,1} is known, we could calculate all the entries below this in the first column. However, we will work in a row-based manner, and find l_{2,1} , use this to find l_{2,2} , before moving on to the 3rd row to find l_{3,1},l_{3,2},l_{3,3} --- then move on to the 4th row, and so on. This row-based approach matches the row-based skyline storage algorithm described earlier.

This Choleski decomposition algorithm is used in the Basic-level programs POISS and ELAST,
and can be found in subroutine `chodec`. The pseudo-code for this
decomposition, applied to the matrix G, is as follows:

c T c...decompose G into L.L i = 1 aii = G(1,1) clii = sqrt(aii) G(1,1) = clii c...decompose each row in turn do 20 i=2,n c...work along the row do 10 j=1,i-1 cljj = G(j,j) aij = G(i,j) sum = 0.0 do 5 k=1,j-1 gik = G(i,k) gjk = G(j,k) sum = sum + gik*gjk 5 continue clij = (aij - sum)/cljj G(i,j) = clij 10 continue c...finally find the diagonal entry on the row aii = G(i,i) sum = 0.0 do 15 k=1,i-1 clik = G(i,k) sum = sum + clik*clik 15 continue clii2 = aii - sum clii = sqrt(clii2) G(i,i) = clii 20 continueNote that as each entry in L is found, it is written to G, so that the corresponding entry of the stiffness matrix gets overwritten. At the end of the algorithm, G contains the entries of L in its lower triangle.

In implementing this algorithm in a `main engine', we must use a mapping to convert G(i,j) to a storage space in GSTIF, as described in the earlier section on storage techniques.

The system Gu=f can now be rewritten LL^Tu=f , and is solved in two stages:

- solve Ly=f using forward substitution;
- solve L^Tu=y using back-substitution.

c...forward substitution: solve Ly = f y(1) = f(1)/G(1,1) do 10 i=2,n bi = f(i) sum = 0.0 do 5 k=1,i-1 gik = G(i,k) ak = y(k) sum = sum + gik*ak 5 continue clii = G(i,i) y(i) = (bi - sum)/clii 10 continue c T c...back-substitution phase: solve L x = y u(n) = y(n)/G(n,n) do 20 i=n-1,1, step -1 yi = y(i) sum = 0.0 do 15 k=i+1,n gki = G(k,i) ak = u(k) sum = sum + gki*ak 15 continue clii = G(i,i) u(i) = (yi - sum)/clii 20 continueIn program CONSL.FOR, we use the same solution algorithm, but modify its Fortran coding to take account of the skyline storage technique, described above. When row-based skyline storage is used with a Choleski decomposition, the algorithm given above for the final back-substitution phase (solve L^Tu=y ) is rather inefficient. This is because it works with rows of L^T , which are columns of L , the entries of which are scattered throughout the row-based storage array GSTIF. It can easily be modified, however, so that once an entry u_i is found, this is eliminated from all equations from i-1 up to row 1, which involves working up the column of L^T . The modified algorithm, replacing the last phase of the previous pseudo-code, is:

c...back-substitution phase: solve L x = y c...We can do this in a column-based manner! do 20 i=n,1, step -1 yi = y(i) clii = G(i,i) xi = yi/clii u(i) = xi c...once we have found x_i, transfer the term involving this, to c...the r.h.s. in all earlier equations if (i.eq.1) goto 20 do 15 j=i-1,1, step -1 clij = G(i,j) y(j) = y(j) - clij*xi 15 continue 20 continueThis form of the routine is also used in PLADV, where skyline storage is also employed.

The coding for this algorithm, for the case of a symmetric matrix, can be seen in subroutine `front` of the advanced elasticity program `ELADV.FOR`. I am indebted to Dr David
Naylor for permission to use this subroutine. A number of global one-dimensional arrays have
to be used, but the algorithm is still more space-efficient than skyline storage, and so `ELADV` should be used for elasticity analyses with complex meshes. There are comment
lines in the subroutine, giving some guidance as to how the algorithm works.

The algorithm can be adapted to the case of a non-symmetric matrix G , in which case
approximately twice as much storage will be needed. My own adaptation of `front` for the
nonsymmetric case, is the subroutine `afront` in the elasto-viscoplasticity program `VPLAS.FOR`. This program handles non-associated plastic flow, which results in
non-symmetric stiffness matrices.

The amount of storage needed depends on the numbering of the elements, not the nodes, as the
algorithm is dependent on the order in which the elements are assembled into the front. The
subroutine makes a preliminary run-through of the elements to check that sufficient storage
is provided; this is defined by the parameter MFRON at the start of the program. The entries
of G are allocated spaces in the one-dimensional array GSTIF, which is of dimension
MFRON. As the assembly process is handled within `front`, there is no `assmb`
subroutine within `ELADV` and `VPLAS`.

an
{Initialization}

v^{(0)} = 0

r^{(0)} = f

k = 0

z^{(0)} = C^{-1}r^{(0)}

p^{(0)} = z^{(0)}

{Begin iteration}

\alpha_k = \frac{(z^{(k)},r^{(k)})}{(Gp^{(k)},p^{(k)}}

v^{(k+1)} = v^{(k)} + \alpha_k p^{(k)}

r^{(k+1)} = r^{(k)} - \alpha_k Gp^{(k)}

z^{(k+1)} = C^{-1} r^{(k+1)}

ta_k = \frac{(z^{(k+1)},r^{(k+1)})}{(z^{(k)},r^{(k)}}

p^{(k+1)} = z^{(k+1)} + ta_k p^{(k)}

k = k + 1

{Next iteration}
a

This form of the PCG algorithm is taken from Bartelt(1989) -- see Chapter 10 for details pf recommended texts. Here, (u,v) denotes the inner product of vectors u and v . The matrix C is the preconditioning matrix. The algorithm is terminated when the norm of the residual vector r^{(k+1)} is less than TOLER times the norm of f ; then the solution u=v^{(k+1)} .

The simplest form of preconditioning is **diagonal preconditioning**, in which
C = {diag}(G) . This algorithm is implemented in the Basic-level program FRAME.FOR.
The solution is performed in subroutine `pcgsol`, the pseudo-code for which is:

c...solve Kv=g using diagk as preconditioner c...multiplications K.v are performed element-by-element, c...reading the element matrices of K from the scratch file c c...initial guess; do 2 i=1,n v(i) = 0.0 r(i) = g(i) z(i) = 0.0 2 continue c c...z = {Kdiag}^-1 * r do 15 i=1,n z(i) = r(i)/diagk(i) 15 continue c do 16 i=1,n p(i) = z(i) 16 continue ztr = 0.0 do 17 i=1,n zi = z(i) ri = r(i) ztr = ztr + zi*ri 17 continue call norm2(mtotv,ntotv,r,rnorm0) itscg = 0 write(6,*)'PGSOL: iteration',itscg,',norm = ',rnorm0 c 25 itscg = itscg + 1 c c...create y which is G*p call ematgv( p , y ) pty = 0.0 do 28 i=1,n pty = pty + p(i)*y(i) 28 continue alpha = ztr/pty do 30 i=1,n v(i) = v(i) + alpha*p(i) 30 continue c do 40 i=1,n r(i) = r(i) - alpha*y(i) 40 continue c call norm2(mtotv,ntotv,r,rnorm) c write(6,*)' iteration',itscg,',norm = ',rnorm c if(rnorm.lt.cgtol) goto 200 c c...z = {Kdiag}^-1 * r do 45 i=1,n z(i) = r(i)/diagk(i) 45 continue c ztr0 = ztr ztr = 0.0 do 60 i=1,ntotv ztr = ztr + z(i)*r(i) 60 continue beta = ztr/ztr0 c do 130 i=1,ntotv p(i) = z(i) + beta*p(i) 130 continue c goto 25 c 200 continue write(6,*)'Solution converged after ',itscg,' iterations.'With the diagonal preconditioner (stored in the array

Diagonal preconditioning works surprisingly well, considering its simplicity. If a more
sophisticated preconditioner is required, the most popular currently is to perform an **Incomplete Choleski** (IC) factorization of G . This is essentially a symmetric Choleski
decomposition of G into LL^T , as described earlier, but avoiding some or all of the
`fill-in' which occurs within the envelope of G . That is, we may set L(i,j)=0
instead of L(i,j)=l_{i,j} if G(i,j)=0 . To compensate this, the calculated l_{i,j}
is instead added to the diagonal term L(i,i) .

Both diagonal and IC preconditioning are implemented in the advanced plasticity program
PLADV.FOR. The form of IC preconditioning there is adaptive: the fill-in is avoided if
the magnitude of l_{i,j} is less than a user-specified multiple ( `filtol`) of the
diagonal term.
PLADV.FOR uses the skyline storage technique for G .
The subroutine `inchol` in PLADV.FOR performs this IC decomposition. The pseudo-code
below should be compared with that for subroutine `chodec`:

c...Incomplete Choleski decomposition small = 1.e-8 c...count the amount of fill-in which occurs kfill = 0 khole = 0 i = 1 a11 = G(1,1) cl11 = sqrt(a11) G(1,1) = cl11 c...decompose each row in turn do 20 i=2,n aii = G(i,i) c...we will not fill in if the term is smaller than filtol filtol = factor*abs(aii) c...work along the row do 10 j=1,i-1 cljj = G(j,j) aij = G(i,j) sum = 0.0 do 5 k=1,j-1 sum = sum + G(i,k)*G(j,k) 5 continue clij = (aij - sum)/cljj c...here is the 'Incomplete' part gij = G(i,j) if (gij.eq.0.0) then khole = khole + 1 if (abs(clij).lt.filtol) then c...avoid fill-in: add i,j term to diagonal instead G(i,i) = G(i,i) + clij goto 10 endif c...fill-in will occur kfill = kfill + 1 endif G(i,j) = clij 10 continue c...finally find the diagonal entry on the row aii = G(i,i) sum = 0.0 do 15 k=1,i-1 clik = gstif(i,k) if (clik.eq.0.0) goto 15 sum = sum + clik*clik 15 continue clii2 = aii - sum clii = sqrt(clii2) G(i,i) = clii 20 continue write(6,*) 'No. of holes found in G matrix = ',khole write(6,*) 'No. of holes filled-in = ',kfillAs might be expected, there is a trade-off between the amount of fill-in which is prevented, and the effectiveness of the resulting preconditioner. The example datafile

Note that this is a very simple form of IC algorithm. There have been many improved algorithms proposed in recent years, and users wishing to apply this technique in practice are advised to search specialist journals for the lastest developments.

The reader will have noticed that additional tolerance parameters must be specified when an iterative solver is used. In PLADV, the iterative solvers are chosen by setting the parameter ISOLA to 2 or 3:

- ISOLA = 0 for direct solution using Choleski ( L.L^T ) decomposition and skyline storage;
- ISOLA = 1 for direct solution using L.D.L^T decomposition and skyline storage;
- ISOLA = 2 for iterative solution using conjugate gradients with diagonal preconditioning;
- ISOLA = 3 for Incomplete Choleski preconditioning with conjugate gradient solution.

The alternative method for imposing nodal fixities, used in ELAST.FOR, is sometimes called
a `big spring' approach. To impose the condition that u_k \approx 0.0 for some k , we
add a large number ( 10^{20} ) to the corresponding diagonal term in GSTIF; this is done
in subroutine `fixdof`, called by `solve` before calling `chodec`. This will
force the value of u_k arising from the solution process to be very small, and we
later set such small values to zero.

Fixed values of u_k which are non-zero (e.g. specified displacements, or the specified
values of the potential for nodes lying on Dirichlet boundaries) are dealt with similarly.
If we want to ensure that u_k = g , we multiply K_{k,k} by 10^{12} , and set the
corresponding entry in the r.h.s. vector as F_k = g*K_{k,k} . This process can be seen
in `fixdof` in POISS.FOR.

When a displacement degree of freedom is fixed, there will be a reaction force in this
direction at the node, to maintain equilibrium. This can be found after the main
solution, by calculating the r.h.s. for the corresponding row. This is done in
subroutine `reactn` in ELAST.FOR and FRAME.FOR; these reactions are written into
the load vector ASLOD, which is written in the result file.