The size of the band is defined by the bandwidth NBW. This defined as the smallest
number for which: K_{i,j}=0 for all i,j satisfying |i-j| > {NBW} .
If K is symmetric, we will store the lower half of the band, i.e. all entries K_{i,j}
for which i gt;= j . The halfbandwidth NHBW is defined by NBW = 2*NHBW + 1.
Since we only get a nonzero entry in K_{i,j} if nodes with global numbering i and j
are contained in the same element, the halfbandwidth NHBW can be calculated as the maximum
over all elements l , of the difference in node numbers for the nodes in l .
In POISS.FOR, this calculation is done at the start of subroutine assmb. We will
therefore use NPOIN rows of GSTIF (one row for each node in the mesh) and NHBW+1 columns.
We denote the latter value by NBAND, and dimension GSTIF at the start of POISS.FOR as
DIMENSION GSTIF(MPOIN,MBAND)
where the parameters MPOIN and MBAND are set in the PARAMETER statement.
Then, on row i we store only k_{i,i- {NHBW}},...,k_{i,i} , for i=1,... ,NTOTV. We store these entries in an array GSTIF with NTOTV rows (where NTOTV is the total number of degrees of freedom) and (NHBW+1) columns. The mapping from K to GSTIF is: n K_{i,j} --> {GSTIF(I,J-I+NHBW+1)} for j=i- {NHBW},...,i .
Then the diagonal of the matrix (entries where i=j ) will be stored in the final column of the array. You can see the entries of the element stiffness matrices being placed in GSTIF, in subroutine assmb of POISS.FOR.
The mapping is slightly more complicated in ELAST.FOR, where there are two degrees of freedom per node. Here, the u and v d.o.f.s of node n (i.e. displacements in the x and y directions at node n ) are global degrees of freedom numbers i_n+1 and i_n+2 , where i_n=2(n-1) . You can see this mapping being used at the start of subroutine assmb. After making this adjustment, the local d.o.f.s are mapped to GSTIF by the formula in the previous section. At the start of assmb, the halfbandwidth is found by finding the position of the first non-zero entry in row I: this is stored in the array NMINVC. Thus, if NMINVC(I)=MI, it means that k_{i,mi} \neq 0 but k_{i,j}=0 {for} j=1,...,mi-1 . Then the halfbandwidth NHBW is given by \max_{i=1, {NTOTV}}(i-mi) .
It will be seen that the size of the bandwidth, and hence the amount of storage required, will depend on the order in which the nodes of the mesh have been numbered. For this reason it is very important to use the element renumbering facility in PREFEL when producing the .dat file for input to the `main engine', to obtain a sensible numbering of the elements --- and then to re-number the nodes accordingly. Note that the bandwidth does not depend on the element numbering directly.
The mapping from K to GSTIF is again n K_{i,j} --> {GSTIF(I,J-I+NHBW+1)}. Now, however, there are NBW columns, and the diagonal entries of K lie down the (NHBW+1)'th column.
This storage technique is used in the soil consolidation program CONSL.FOR. As with band
storage, the amount of storage used depends on the numbering of the nodes in the mesh. We
again use the array NMINVC(I) to tell us the column where the first non-zero entry on
row I of K lies. Then (as K will by symmetric in CONSL) we store only the entries from
this point up to the diagonal entry, in GSTIF. GSTIF now becomes a one-dimensional array,
and we use a pointer array LPOINT(I) to tell us where the I'th row of K begins. That is,
the diagonal entry K_{i,i} will be stored as GSTIF(LPI), where LPI=LPOINT(I),
for I=1 to NTOTV. The dimension of GSTIF is then LPOINT(NTOTV), which we denote by NSTIF.
The parameter MSTIF is used to dimension GSTIF at the start of the program.
The arrays NMINVC and LPOINT are set up by a subroutine mask in CONSL.FOR.
LPOINT(I) can be found from LPOINT(I-1) by
LPOINT(I) = LPOINT(I-1) + I - NMINVC(I) + 1
since there are I-NMINVC(I)+1 entries to store on row I. We start with LPOINT(1)=1.
The mapping from the lower diagonal of K to GSTIF is now n K_{i,j} --> {GSTIF(LPOINT(I) - I + J)} if NMINVC(I) lt;= J lt;= I. This mapping is performed by a function gpos; if K_{i,j} lies outside the envelope of K which is being stored, gpos returns a value of zero. Otherwise, n K_{i,j} --> {GSTIF(GPOS(LPOINT,NTOTV,I,J))}.
This algorithm is represented in the following fragments of pseudo-code. first we show the reduction phase, in which the lower triangular matrix L is formed:
C...Reduction: do 20 i=1,n-1 pivot = G(i,i) do 10 k=i+1,n fact = G(k,i)/pivot do 5 j=i+1,n G(k,j) = G(k,j) - fact*G(i,j) 5 continue f(k) = f(k) - fact*f(i) 10 continue 20 continueHere, G(i,j) indicates the (i,j)'th entry of G; no modification has been made to represent the compact storage of G. If this pseudo-code is compared with the code in subroutine reduce of THERM.FOR, it will be seen how the mappings to the band storage of G in GSTIF (as described in the previous section for non-symmetric band storage) are implemented. Also, the start and end values in the do loops are modified, because we only need to perform these loops for entries within the band. These factors in translation from the pseudo-code given in the Manual, to the actual code in the `main engines', should be borne in mind for when studying the following algorithms.
The solution is completed by the back-substitution phase, performed in subroutine backsu of THERM.FOR:
c...back-substitution phase u(n) = f(n)/G(n,n) do 10 i=n-1,1, step -1 pivot = G(i,i) do 5 j=i+1,n f(i) = f(i) - G(i,j)*u(j) 5 continue u(i) = f(i)/pivot 10 continue
The entries in L must be found systematically. The first one which can be found is l_{1,1} , since it will be seen by forming the matrix LL^T and comparing it with K , that k_{1,1} = {l_{1,1}}^2 . Once l_{1,1} is known, we could calculate all the entries below this in the first column. However, we will work in a row-based manner, and find l_{2,1} , use this to find l_{2,2} , before moving on to the 3rd row to find l_{3,1},l_{3,2},l_{3,3} --- then move on to the 4th row, and so on. This row-based approach matches the row-based skyline storage algorithm described earlier.
This Choleski decomposition algorithm is used in the Basic-level programs POISS and ELAST, and can be found in subroutine chodec. The pseudo-code for this decomposition, applied to the matrix G, is as follows:
c T c...decompose G into L.L i = 1 aii = G(1,1) clii = sqrt(aii) G(1,1) = clii c...decompose each row in turn do 20 i=2,n c...work along the row do 10 j=1,i-1 cljj = G(j,j) aij = G(i,j) sum = 0.0 do 5 k=1,j-1 gik = G(i,k) gjk = G(j,k) sum = sum + gik*gjk 5 continue clij = (aij - sum)/cljj G(i,j) = clij 10 continue c...finally find the diagonal entry on the row aii = G(i,i) sum = 0.0 do 15 k=1,i-1 clik = G(i,k) sum = sum + clik*clik 15 continue clii2 = aii - sum clii = sqrt(clii2) G(i,i) = clii 20 continueNote that as each entry in L is found, it is written to G, so that the corresponding entry of the stiffness matrix gets overwritten. At the end of the algorithm, G contains the entries of L in its lower triangle.
In implementing this algorithm in a `main engine', we must use a mapping to convert G(i,j) to a storage space in GSTIF, as described in the earlier section on storage techniques.
The system Gu=f can now be rewritten LL^Tu=f , and is solved in two stages:
c...forward substitution: solve Ly = f y(1) = f(1)/G(1,1) do 10 i=2,n bi = f(i) sum = 0.0 do 5 k=1,i-1 gik = G(i,k) ak = y(k) sum = sum + gik*ak 5 continue clii = G(i,i) y(i) = (bi - sum)/clii 10 continue c T c...back-substitution phase: solve L x = y u(n) = y(n)/G(n,n) do 20 i=n-1,1, step -1 yi = y(i) sum = 0.0 do 15 k=i+1,n gki = G(k,i) ak = u(k) sum = sum + gki*ak 15 continue clii = G(i,i) u(i) = (yi - sum)/clii 20 continueIn program CONSL.FOR, we use the same solution algorithm, but modify its Fortran coding to take account of the skyline storage technique, described above. When row-based skyline storage is used with a Choleski decomposition, the algorithm given above for the final back-substitution phase (solve L^Tu=y ) is rather inefficient. This is because it works with rows of L^T , which are columns of L , the entries of which are scattered throughout the row-based storage array GSTIF. It can easily be modified, however, so that once an entry u_i is found, this is eliminated from all equations from i-1 up to row 1, which involves working up the column of L^T . The modified algorithm, replacing the last phase of the previous pseudo-code, is:
c...back-substitution phase: solve L x = y c...We can do this in a column-based manner! do 20 i=n,1, step -1 yi = y(i) clii = G(i,i) xi = yi/clii u(i) = xi c...once we have found x_i, transfer the term involving this, to c...the r.h.s. in all earlier equations if (i.eq.1) goto 20 do 15 j=i-1,1, step -1 clij = G(i,j) y(j) = y(j) - clij*xi 15 continue 20 continueThis form of the routine is also used in PLADV, where skyline storage is also employed.
The coding for this algorithm, for the case of a symmetric matrix, can be seen in subroutine front of the advanced elasticity program ELADV.FOR. I am indebted to Dr David Naylor for permission to use this subroutine. A number of global one-dimensional arrays have to be used, but the algorithm is still more space-efficient than skyline storage, and so ELADV should be used for elasticity analyses with complex meshes. There are comment lines in the subroutine, giving some guidance as to how the algorithm works.
The algorithm can be adapted to the case of a non-symmetric matrix G , in which case approximately twice as much storage will be needed. My own adaptation of front for the nonsymmetric case, is the subroutine afront in the elasto-viscoplasticity program VPLAS.FOR. This program handles non-associated plastic flow, which results in non-symmetric stiffness matrices.
The amount of storage needed depends on the numbering of the elements, not the nodes, as the algorithm is dependent on the order in which the elements are assembled into the front. The subroutine makes a preliminary run-through of the elements to check that sufficient storage is provided; this is defined by the parameter MFRON at the start of the program. The entries of G are allocated spaces in the one-dimensional array GSTIF, which is of dimension MFRON. As the assembly process is handled within front, there is no assmb subroutine within ELADV and VPLAS.
an
{Initialization}
v^{(0)} = 0
r^{(0)} = f
k = 0
z^{(0)} = C^{-1}r^{(0)}
p^{(0)} = z^{(0)}
{Begin iteration}
\alpha_k = \frac{(z^{(k)},r^{(k)})}{(Gp^{(k)},p^{(k)}}
v^{(k+1)} = v^{(k)} + \alpha_k p^{(k)}
r^{(k+1)} = r^{(k)} - \alpha_k Gp^{(k)}
z^{(k+1)} = C^{-1} r^{(k+1)}
ta_k = \frac{(z^{(k+1)},r^{(k+1)})}{(z^{(k)},r^{(k)}}
p^{(k+1)} = z^{(k+1)} + ta_k p^{(k)}
k = k + 1
{Next iteration}
a
This form of the PCG algorithm is taken from Bartelt(1989) -- see Chapter 10 for details pf recommended texts. Here, (u,v) denotes the inner product of vectors u and v . The matrix C is the preconditioning matrix. The algorithm is terminated when the norm of the residual vector r^{(k+1)} is less than TOLER times the norm of f ; then the solution u=v^{(k+1)} .
The simplest form of preconditioning is diagonal preconditioning, in which C = {diag}(G) . This algorithm is implemented in the Basic-level program FRAME.FOR. The solution is performed in subroutine pcgsol, the pseudo-code for which is:
c...solve Kv=g using diagk as preconditioner c...multiplications K.v are performed element-by-element, c...reading the element matrices of K from the scratch file c c...initial guess; do 2 i=1,n v(i) = 0.0 r(i) = g(i) z(i) = 0.0 2 continue c c...z = {Kdiag}^-1 * r do 15 i=1,n z(i) = r(i)/diagk(i) 15 continue c do 16 i=1,n p(i) = z(i) 16 continue ztr = 0.0 do 17 i=1,n zi = z(i) ri = r(i) ztr = ztr + zi*ri 17 continue call norm2(mtotv,ntotv,r,rnorm0) itscg = 0 write(6,*)'PGSOL: iteration',itscg,',norm = ',rnorm0 c 25 itscg = itscg + 1 c c...create y which is G*p call ematgv( p , y ) pty = 0.0 do 28 i=1,n pty = pty + p(i)*y(i) 28 continue alpha = ztr/pty do 30 i=1,n v(i) = v(i) + alpha*p(i) 30 continue c do 40 i=1,n r(i) = r(i) - alpha*y(i) 40 continue c call norm2(mtotv,ntotv,r,rnorm) c write(6,*)' iteration',itscg,',norm = ',rnorm c if(rnorm.lt.cgtol) goto 200 c c...z = {Kdiag}^-1 * r do 45 i=1,n z(i) = r(i)/diagk(i) 45 continue c ztr0 = ztr ztr = 0.0 do 60 i=1,ntotv ztr = ztr + z(i)*r(i) 60 continue beta = ztr/ztr0 c do 130 i=1,ntotv p(i) = z(i) + beta*p(i) 130 continue c goto 25 c 200 continue write(6,*)'Solution converged after ',itscg,' iterations.'With the diagonal preconditioner (stored in the array diagk), the formation of C^{-1}v is achieved by simply dividing the elements of v by the corresponding elements of diagk. The array diagk is assembled from the element stiffness matrices in the subroutine assdk. The global matrix-vector multiplication Gv is performed element-by-element, reading the element stiffness matrices from the scratch file, in subroutine ematgv; thus, there is no assembly or storage of the global matrix G .
Diagonal preconditioning works surprisingly well, considering its simplicity. If a more sophisticated preconditioner is required, the most popular currently is to perform an Incomplete Choleski (IC) factorization of G . This is essentially a symmetric Choleski decomposition of G into LL^T , as described earlier, but avoiding some or all of the `fill-in' which occurs within the envelope of G . That is, we may set L(i,j)=0 instead of L(i,j)=l_{i,j} if G(i,j)=0 . To compensate this, the calculated l_{i,j} is instead added to the diagonal term L(i,i) .
Both diagonal and IC preconditioning are implemented in the advanced plasticity program PLADV.FOR. The form of IC preconditioning there is adaptive: the fill-in is avoided if the magnitude of l_{i,j} is less than a user-specified multiple ( filtol) of the diagonal term. PLADV.FOR uses the skyline storage technique for G . The subroutine inchol in PLADV.FOR performs this IC decomposition. The pseudo-code below should be compared with that for subroutine chodec:
c...Incomplete Choleski decomposition small = 1.e-8 c...count the amount of fill-in which occurs kfill = 0 khole = 0 i = 1 a11 = G(1,1) cl11 = sqrt(a11) G(1,1) = cl11 c...decompose each row in turn do 20 i=2,n aii = G(i,i) c...we will not fill in if the term is smaller than filtol filtol = factor*abs(aii) c...work along the row do 10 j=1,i-1 cljj = G(j,j) aij = G(i,j) sum = 0.0 do 5 k=1,j-1 sum = sum + G(i,k)*G(j,k) 5 continue clij = (aij - sum)/cljj c...here is the 'Incomplete' part gij = G(i,j) if (gij.eq.0.0) then khole = khole + 1 if (abs(clij).lt.filtol) then c...avoid fill-in: add i,j term to diagonal instead G(i,i) = G(i,i) + clij goto 10 endif c...fill-in will occur kfill = kfill + 1 endif G(i,j) = clij 10 continue c...finally find the diagonal entry on the row aii = G(i,i) sum = 0.0 do 15 k=1,i-1 clik = gstif(i,k) if (clik.eq.0.0) goto 15 sum = sum + clik*clik 15 continue clii2 = aii - sum clii = sqrt(clii2) G(i,i) = clii 20 continue write(6,*) 'No. of holes found in G matrix = ',khole write(6,*) 'No. of holes filled-in = ',kfillAs might be expected, there is a trade-off between the amount of fill-in which is prevented, and the effectiveness of the resulting preconditioner. The example datafile pltun4.dat, used with the solution options in PLADV, is discussed in Chapter 9.
Note that this is a very simple form of IC algorithm. There have been many improved algorithms proposed in recent years, and users wishing to apply this technique in practice are advised to search specialist journals for the lastest developments.
The reader will have noticed that additional tolerance parameters must be specified when an iterative solver is used. In PLADV, the iterative solvers are chosen by setting the parameter ISOLA to 2 or 3:
The alternative method for imposing nodal fixities, used in ELAST.FOR, is sometimes called a `big spring' approach. To impose the condition that u_k \approx 0.0 for some k , we add a large number ( 10^{20} ) to the corresponding diagonal term in GSTIF; this is done in subroutine fixdof, called by solve before calling chodec. This will force the value of u_k arising from the solution process to be very small, and we later set such small values to zero.
Fixed values of u_k which are non-zero (e.g. specified displacements, or the specified values of the potential for nodes lying on Dirichlet boundaries) are dealt with similarly. If we want to ensure that u_k = g , we multiply K_{k,k} by 10^{12} , and set the corresponding entry in the r.h.s. vector as F_k = g*K_{k,k} . This process can be seen in fixdof in POISS.FOR.
When a displacement degree of freedom is fixed, there will be a reaction force in this direction at the node, to maintain equilibrium. This can be found after the main solution, by calculating the r.h.s. for the corresponding row. This is done in subroutine reactn in ELAST.FOR and FRAME.FOR; these reactions are written into the load vector ASLOD, which is written in the result file.