<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 TRANSITIONAL//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; CHARSET=UTF-8">
<META NAME="GENERATOR" CONTENT="GtkHTML/4.6.6">
</HEAD>
<BODY>
<TT> Hello from Toulouse, France, so please excuse my occasionally poor or hesitant English. I want to parallelize a Fortran tool that post-processes finite-element stress results. The serial version is fine, but a run lasts five days, and the machine should be able to reduce this to half a day with OpenMP.</TT><BR>
<TT> The problem is that the program enters an endless loop when run on several processors, even with as little as a twentieth of the full data set it should eventually process. It behaves perfectly well with an even smaller data set.</TT><BR>
<BR>
<TT>[rather long story, but interesting and challenging]</TT><BR>
<TT> The program reads its input data, integer and real numbers, from six separate files.</TT><BR>
<TT>The first one simply contains a list of entities to process repeatedly (up to eleven thousand of them).</TT><BR>
<TT> Using three of the other input files, each entity is converted by a big subroutine into a series of around fifty integer-real pairs.</TT><BR>
<TT> Using these pairs, the two remaining input files let the most resource-demanding subprogram create 251 column matrices of up to 27752 rows each; this is basically the part I want to parallelize. (Everything is OK up to here.)</TT><BR>
<TT> Each column matrix is finally transformed into a single number, which is a function of the selected entity (and of the other input files, whose contents are fixed).</TT><BR>
<BR>
<TT> The program ends by performing a large number of similar operations, which I would like to share between the processors of the machine provided for this work (a rather powerful Linux RHEL machine with sixteen processors).</TT><BR>
<TT> Everything is done by several subprograms, the last of which loops endlessly when run on several processors.</TT><BR>
<BR>
<TT> This last offending subroutine is just a numerical process: its input is one of the column matrices already processed several times, and its output is a single real number representing one characteristic of the processed entity.</TT><BR>
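<TT> One cheap diagnostic I am considering: since the hang is inside an iterative numerical process, cap its iteration count so that a run fed corrupted (racy) data aborts with a message instead of hanging forever. A toy sketch only; the cos() fixed-point iteration merely stands in for the real rainflow-style step, which is not shown here:</TT><BR>
<PRE>
! Toy sketch: the cos() fixed-point iteration stands in for the real
! reduction inside rnflow; the point is the bounded loop + error report.
program cap_demo
  implicit none
  integer, parameter :: itmax = 1000000
  integer :: iter
  real :: x, xnew
  x = 1.0
  do iter = 1, itmax
     xnew = cos(x)                      ! one iteration step
     if (abs(xnew - x) < 1.0e-6) exit   ! converged: leave the loop
     x = xnew
  end do
  if (iter > itmax) then
     print *, 'no convergence: suspect corrupted (racy) input'
  else
     print *, 'converged after', iter, 'iterations'
  end if
end program cap_demo
</PRE>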
<BR>
<TT> It is called like this:</TT><BR>
<TT>"call rnflow(Arr, Dimc, FAT, ntf, p, q)"</TT><BR>
<TT> Almost all of its arguments have previously been scoped PRIVATE or SHARED by the two following clauses of the main PARALLEL DO directive (in fixed-format source):</TT><BR>
<BR>
<TT>c$OMP+SHARED(p, q)</TT><BR>
<TT>c$OMP+PRIVATE(Arr, Dimc, ntf)</TT><BR>
<TT> Then the last argument, the output value, has to be scoped PRIVATE in a final clause:</TT><BR>
<TT>c$OMP+PRIVATE( FAT )</TT><BR>
<BR>
<TT> So here is, once again, a careful scoping of all the variables used, and I need some help to get out of this mess, because the program behaves as if there were a race condition somewhere, but I cannot find it.</TT><BR>
<TT> Can anyone give me a simple idea, a single keyword, to point me in the right direction?</TT><BR>
<TT> Here are the tricks I have thought of and will try soon:</TT><BR>
<BR>
<TT>Debug with -O0 -openmp</TT><BR>
<BR>
<TT>Z must be made PRIVATE even before fabstf is called</TT><BR>
<BR>
<TT>!$OMP BARRIER ???</TT><BR>
<TT>intent(in)</TT><BR>
<TT>intent(out)</TT><BR>
<TT>chunks</TT><BR>
<TT>It is advisable to declare as RECURSIVE the subroutines used within a parallel construct, so that their local variables are allocated on the stack:</TT><BR>
<TT> recursive function ack(m, n) result(a)</TT><BR>
<TT>PURE</TT><BR>
<TT>SCHEDULE ??? STATIC,1/ GUIDED</TT><BR>
<TT>FLUSH</TT><BR>
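<TT> About the RECURSIVE and PURE tricks: a Fortran local variable initialized in its declaration carries an implicit SAVE attribute, so all threads share a single copy of it, a hidden race in a subroutine that otherwise looks thread-safe. A sketch of the hazard, with invented names:</TT><BR>
<PRE>
! Sketch with invented names: a local initialized in its declaration
! has an implicit SAVE, so every thread shares ONE copy of it.
! RECURSIVE forces stack allocation: one copy per call, per thread.
recursive subroutine square_it(x, y)
  implicit none
  real, intent(in)  :: x
  real, intent(out) :: y
  real :: scratch            ! stack local: safe, one copy per call
! real :: scratch = 0.0      ! implicit SAVE: shared across threads!
  scratch = x * x
  y = scratch
end subroutine square_it
</PRE>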
<BR>
<TT>Routines with large argument lists will contain 5 variables per line</TT><BR>
<BR>
<TT>Prologue formats for functions and subroutines, modules, and header files are shown. In addition to the keywords in these templates, ProTeX also recognizes the following: </TT><BR>
<TT> !BUGS:</TT><BR>
<TT> !SEE ALSO:</TT><BR>
<TT> !SYSTEM ROUTINES:</TT><BR>
<TT> !FILES USED:</TT><BR>
<TT> !REMARKS:</TT><BR>
<TT> !TO DO:</TT><BR>
<TT> !CALLING SEQUENCE:</TT><BR>
<TT> !CALLED FROM:</TT><BR>
<TT> !LOCAL VARIABLES:</TT><BR>
<BR>
<TT>Intent All dummy arguments must include the INTENT clause in their declaration. This is extremely valuable to someone reading the code, and can be checked by compilers. An example is: </TT><BR>
<TT> subroutine sub1 (x, y, z) </TT><BR>
<TT> implicit none </TT><BR>
<TT> real(r8), intent(in) :: x </TT><BR>
<TT> real(r8), intent(inout) :: y </TT><BR>
<TT> real(r8), intent(out) :: z </TT><BR>
<TT> z = x + y </TT><BR>
<TT> end subroutine sub1 </TT><BR>
<BR>
<TT> The complete PARALLEL DO directive is as follows, with comments showing the subroutines called and the scoping of the variables, all previously dispatched between SHARED and PRIVATE:</TT><BR>
<BR>
<TT>c$OMP PARALLEL DO</TT><BR>
<TT>c$OMP+DEFAULT(NONE)</TT><BR>
<TT>c$OMP+NUM_THREADS(NTLU) ! read as input argument</TT><BR>
<TT>c$OMP+SHARED(q) ! used in the last offending subroutine</TT><BR>
<TT>c$OMP+PRIVATE(DIM, DIR, IDelt, line, som, status, stot, stotfinal, TID)</TT><BR>
<TT>c----------------------------------------------------------------------|</TT><BR>
<TT>ccall lecelt(AngTmp, cod, EIDelt, FICINP, IDeltN, maxID, p)</TT><BR>
<TT>c$OMP+SHARED(AngTmp, cod, EIDelt, FICINP, IDeltN, maxID)</TT><BR>
<TT>c$OMP+SHARED( p)</TT><BR>
<TT>c</TT><BR>
<TT>c cod, EIDelt ! SHARED from lecelt above</TT><BR>
<TT>ccall fabett(cmd, cod, EIDelt, FICRES, FICRES0)</TT><BR>
<TT>c$OMP+SHARED(cmd, FICRES, FICRES0)</TT><BR>
<TT>c</TT><BR>
<TT>ccall lecana(lblocana, lina, LTF, NLana, nocc)</TT><BR>
<TT>c$OMP+SHARED(lblocana, lina, LTF, NLana, nocc)</TT><BR>
<TT>c</TT><BR>
<TT>ccall leccvt(Ccvt1, Ccvt2, FScvt2)</TT><BR>
<TT>c$OMP+SHARED(Ccvt1, Ccvt2, FScvt2)</TT><BR>
<TT>c</TT><BR>
<TT>c cod, EIDelt ! SHARED from lecelt above</TT><BR>
<TT>ccall lecpch(cod, EIDelt, IDpchN, LTIT, Spch)</TT><BR>
<TT>c$OMP+SHARED( IDpchN, LTIT, Spch)</TT><BR>
<TT>c</TT><BR>
<TT>c EIDelt ! SHARED from lecelt above</TT><BR>
<TT>ccall lecT1G(EIDelt, FS, NC, T1G) ! pour les NC cas 1G</TT><BR>
<TT>c$OMP+SHARED( FS, NC, T1G)</TT><BR>
<TT>c</TT><BR>
<TT>c EIDelt ! SHARED from lecelt above</TT><BR>
<TT>ccall lecTPO(ASI, EIDelt, FS2, NPO, TPO) ! pour les NPO pertus ordinaires</TT><BR>
<TT>c$OMP+SHARED(ASI, FS2, NPO, TPO)</TT><BR>
<TT>c</TT><BR>
<TT>c----------------------------------------------------------------------|</TT><BR>
<TT>c Beginning of the DO PARALLEL loop</TT><BR>
<TT>c----------------------------------------------------------------------|</TT><BR>
<TT>c</TT><BR>
<TT>c call fabstf(ASI, AngTmp, cod, EIDelt, FS, !</TT><BR>
<TT>c+ FS2, IDelt, IDpchN, NC, NPO, !</TT><BR>
<TT>c+ Spch, Sstf, T1G, TPO, Z) !</TT><BR>
<TT>c</TT><BR>
<TT>c AngTmp, cod, EIDelt - ! SHARED from lecelt above</TT><BR>
<TT>c</TT><BR>
<TT>c FS ! SHARED from lecT1G above</TT><BR>
<TT>c NC - ! SHARED from lecT1G above</TT><BR>
<TT>c T1G - - ! SHARED from lecT1G above</TT><BR>
<TT>c</TT><BR>
<TT>c ASI - - - - ! SHARED from lecTPO above</TT><BR>
<TT>c FS2, - - - NPO ! SHARED from lecTPO above</TT><BR>
<TT>c TPO - ! SHARED from lecTPO above</TT><BR>
<TT>c</TT><BR>
<TT>c IDelt - - - ! PRIVATE as DO-variable</TT><BR>
<TT>c</TT><BR>
<TT>c IDpchN - - ! SHARED from lecpch above</TT><BR>
<TT>c Spch - - - - ! SHARED from lecpch above</TT><BR>
<TT>c</TT><BR>
<TT>c$OMP+PRIVATE( Z) ! PRIVATE from the main program</TT><BR>
<TT>c</TT><BR>
<TT>c$OMP+PRIVATE( Sstf) ! PRIVATE from here</TT><BR>
<TT>c----------------------------------------------------------------------|</TT><BR>
<TT>c call fabcol(ASI, AngTmp, Ccvt1, Ccvt2, cod, !</TT><BR>
<TT>c+ col, Dimc, EIDelt, FS, FS2, !</TT><BR>
<TT>c+ FScvt2, IDelt, lblocana, lina, LTF, !</TT><BR>
<TT>c+ NC, NPO, ntf, Spch, Sstf, !</TT><BR>
<TT>c+ T1G, TPO, Z)</TT><BR>
<TT>c</TT><BR>
<TT>c AngTmp, - cod - ! SHARED from lecelt above</TT><BR>
<TT>c EIDelt - - - ! SHARED from lecelt above</TT><BR>
<TT>c</TT><BR>
<TT>c Ccvt1, Ccvt2 - - ! SHARED from leccvt above</TT><BR>
<TT>c</TT><BR>
<TT>c ASI, - - - FS2 - ! SHARED from lecTPO above</TT><BR>
<TT>c NPO - - - - ! SHARED from lecTPO above</TT><BR>
<TT>c</TT><BR>
<TT>c$OMP+PRIVATE( ntf) ! PRIVATE from the main program</TT><BR>
<TT>c</TT><BR>
<TT>c$OMP+PRIVATE(col, Dimc) ! PRIVATE from here</TT><BR>
<TT>c----------------------------------------------------------------------|</TT><BR>
<TT>c call dblons(col, Dimc, ntf, valinter)</TT><BR>
<TT>c ntf - - - ! PRIVATE from the main program</TT><BR>
<TT>c col, Dimc - - - - ! PRIVATE from fabcol above</TT><BR>
<TT>c$OMP+PRIVATE( valinter) ! PRIVATE from here</TT><BR>
<TT>c----------------------------------------------------------------------|</TT><BR>
<TT>c call ValInt(Arr, Dimc, ntf, valinter)</TT><BR>
<TT>c ntf - - - ! PRIVATE from the main program</TT><BR>
<TT>c Dimc - - - - ! PRIVATE from fabcol above</TT><BR>
<TT>c valinter - - ! PRIVATE from dblons above</TT><BR>
<TT>c$OMP+PRIVATE(Arr) ! PRIVATE from here</TT><BR>
<TT>c----------------------------------------------------------------------|</TT><BR>
<TT>c call rnflow(Arr, Dimc, FAT, ntf, p, q)</TT><BR>
<TT>c q ! SHARED from the main program</TT><BR>
<TT>c p - ! SHARED from lecelt above </TT><BR>
<TT>c Dimc - - ! PRIVATE from fabcol above</TT><BR>
<TT>c Arr - - - - - ! PRIVATE from ValInt above</TT><BR>
<TT>c ntf ! PRIVATE from the main program</TT><BR>
<TT>c$OMP+PRIVATE(FAT) ! PRIVATE from here</TT><BR>
<TT>c----------------------------------------------------------------------|</TT><BR>
<TT>c$OMP+SCHEDULE(DYNAMIC)</TT><BR>
<BR>
<BR>
<TT> What can I do to try to understand what is going on?</TT><BR>
<TT>Thank you,</TT><BR>
<TT>David</TT><BR>
<BR>
</BODY>
</HTML>