[Openmp-dev] fortran OpenMP hangs in the last subroutine

David Van Aelst via Openmp-dev openmp-dev at lists.llvm.org
Sat May 14 08:16:15 PDT 2016


 Hello from Toulouse, France, so please excuse my sometimes poor or
hesitant English. I want to parallelize a Fortran tool that
post-processes finite-element stress results. The serial version is OK,
but a run lasts five days, and the machine should be able to reduce this
to half a day with OpenMP.
 The problem is that the program enters an endless loop when run on
several processors, even with as little as a twentieth of the full data
set it will eventually have to process. It behaves perfectly well with
an even smaller data set.

[rather long story, but interesting and challenging]
 The program reads its input data, integer and real numbers, from six
separate files.
 The first one simply contains a list of entities to process repeatedly
(up to eleven thousand of them).
 Using three of the other input files, each entity is converted by a big
subroutine into a series of about half a hundred integer-real pairs.
 Using these pairs, the two remaining input files allow the most
resource-demanding subprogram to create 251 column matrices of up to
27752 rows each; this is the part I basically want to parallelize.
(Everything is OK up to here.)
 Each column matrix is finally transformed into a single number, which is
a function of the selected entity (and of the other input files, whose
contents are fixed).

 The program finishes with a great deal of similar operations, which I
would like to share between the processors of the machine given for that
work (a rather powerful Linux RHEL machine with sixteen processors).
 Everything is done by several subprograms, the last of which loops
endlessly when run on several processors.

 This last offending subroutine is just a numerical process: its input
is one of the column matrices already processed several times, and its
output is a single real number giving one characteristic of the
processed entity.

 It is called like this:
"call rnflow(Arr, Dimc, FAT, ntf, p, q)"
 Almost all of its arguments are previously scoped PRIVATE or SHARED by
the two following clauses of the main PARALLEL DO directive (in
fixed-format instructions):

c$OMP+SHARED(p, q)
c$OMP+PRIVATE(Arr, Dimc, ntf)

 Then the last argument, the output value, has to be scoped PRIVATE in a
last clause:
c$OMP+PRIVATE( FAT )
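
 To fix ideas, here is a minimal sketch of the overall structure (nent
and the commented-out lines are only placeholders, not my real code):

c$OMP PARALLEL DO
c$OMP+DEFAULT(NONE)
c$OMP+SHARED(p, q)
c$OMP+SHARED(nent)                   ! placeholder: number of entities
c$OMP+PRIVATE(IDelt, Arr, Dimc, ntf)
c$OMP+PRIVATE(FAT)
      do IDelt = 1, nent
c        ... build Arr, Dimc and ntf for entity IDelt ...
         call rnflow(Arr, Dimc, FAT, ntf, p, q)
c        ... store FAT for entity IDelt ...
      end do
c$OMP END PARALLEL DO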



 Below, once again, is a careful scoping of all the variables used, and
I need some help to get out of this mess, because it behaves as if there
were a race condition somewhere, but I can't find it.
 Can anyone give me a simple idea, a simple word, to help me get out of
it?
 Here are the tricks I thought of, and that I will try soon:

Debug with -O0 -openmp

Z must already be private before fabstf

!$OMP BARRIER ???
intent(in)
intent(out)
chunks
It is advisable to make the subroutines used within a parallel
construct 'recursive' (see the sketch just after this list):
 recursive function ack(m, n) result(a)
PURE
SCHEDULE ??? STATIC,1 / GUIDED
FLUSH
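
For the 'recursive' idea above, here is what I mean; the Ackermann
function is only the usual textbook illustration, not one of my own
routines:

      recursive function ack(m, n) result(a)
      implicit none
      integer, intent(in) :: m, n
      integer             :: a
      if (m .eq. 0) then
         a = n + 1
      else if (n .eq. 0) then
         a = ack(m - 1, 1)
      else
         a = ack(m - 1, ack(m, n - 1))
      end if
      end function ack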

Routines with large argument lists will contain 5 variables per line

prologue formats for functions and subroutines, modules, and header
files are shown. In addition to the keywords in these templates, ProTeX
also recognizes the following: 
	!BUGS:
	!SEE ALSO:
	!SYSTEM ROUTINES:
	!FILES USED:
	!REMARKS:
	!TO DO:
	!CALLING SEQUENCE:
	!CALLED FROM:
	!LOCAL VARIABLES:

Intent: all dummy arguments must include the INTENT clause in their
declaration. This is extremely valuable to someone reading the code, and
can be checked by compilers. An example is:

        subroutine sub1 (x, y, z)
        implicit none
        real(r8), intent(in)    :: x
        real(r8), intent(out)   :: y   ! and so on: every dummy argument gets an intent
        real(r8), intent(inout) :: z
        end subroutine sub1





 The complete PARALLEL DO directive is as follows, with comments showing
the subroutines called and the scoping of the variables, all previously
dispatched between SHARED and PRIVATE:

c$OMP PARALLEL DO
c$OMP+DEFAULT(NONE)
c$OMP+NUM_THREADS(NTLU) ! read as input argument
c$OMP+SHARED(q) ! used in the last offending subroutine
c$OMP+PRIVATE(DIM, DIR, IDelt, line, som, status, stot, stotfinal, TID)
c----------------------------------------------------------------------|
ccall lecelt(AngTmp,   cod,      EIDelt,   FICINP,   IDeltN,   maxID,   p)
c$OMP+SHARED(AngTmp,   cod,      EIDelt,   FICINP,   IDeltN,   maxID)
c$OMP+SHARED(                                                       p)
c
c                      cod,      EIDelt                                ! SHARED from lecelt above
ccall fabett(cmd,      cod,      EIDelt,   FICRES,   FICRES0)
c$OMP+SHARED(cmd,                          FICRES,   FICRES0)
c
ccall lecana(lblocana, lina,     LTF,      NLana,    nocc)
c$OMP+SHARED(lblocana, lina,     LTF,      NLana,    nocc)
c
ccall leccvt(Ccvt1,    Ccvt2,    FScvt2)
c$OMP+SHARED(Ccvt1,    Ccvt2,    FScvt2)
c
c            cod,      EIDelt                                          ! SHARED from lecelt above
ccall lecpch(cod,      EIDelt,   IDpchN,   LTIT,     Spch)
c$OMP+SHARED(                    IDpchN,   LTIT,     Spch)
c
c            EIDelt                                                    ! SHARED from lecelt above
ccall lecT1G(EIDelt,   FS,       NC,       T1G)                        ! for the NC 1G cases
c$OMP+SHARED(          FS,       NC,       T1G)
c
c                      EIDelt                                          ! SHARED from lecelt above
ccall lecTPO(ASI,      EIDelt,   FS2,      NPO,      TPO)              ! for the NPO ordinary openings (pertus)
c$OMP+SHARED(ASI,                FS2,      NPO,      TPO)
c
c----------------------------------------------------------------------|
c Beginning of the PARALLEL DO loop
c----------------------------------------------------------------------|
c
c call fabstf(ASI,      AngTmp,   cod,      EIDelt,   FS,              !
c+            FS2,      IDelt,    IDpchN,   NC,       NPO,             !
c+            Spch,     Sstf,     T1G,      TPO,      Z)               !
c
c                       AngTmp,   cod,      EIDelt    -                ! SHARED from lecelt above
c
c                                                     FS               ! SHARED from lecT1G above
c                                           NC        -                ! SHARED from lecT1G above
c                                  T1G      -         -                ! SHARED from lecT1G above
c
c             ASI       -          -        -         -                ! SHARED from lecTPO above
c             FS2,      -          -        -         NPO              ! SHARED from lecTPO above
c                                           TPO       -                ! SHARED from lecTPO above
c
c                       IDelt     -         -         -                ! PRIVATE as DO-variable
c
c                                 IDpchN    -         -                ! SHARED from lecpch above
c              Spch     -         -         -         -                ! SHARED from lecpch above
c
c$OMP+PRIVATE(                                        Z)               ! PRIVATE from the main program
c
c$OMP+PRIVATE(          Sstf)                                          ! PRIVATE from here
c----------------------------------------------------------------------|
c call fabcol(ASI,      AngTmp,   Ccvt1,    Ccvt2,    cod,             !
c+            col,      Dimc,     EIDelt,   FS,       FS2,             !
c+            FScvt2,   IDelt,    lblocana, lina,     LTF,             !
c+            NC,       NPO,      ntf,      Spch,     Sstf,            !
c+            T1G,      TPO,      Z)
c
c                       AngTmp,             -         cod       -      ! SHARED from lecelt above
c                                 EIDelt    -         -         -      ! SHARED from lecelt above
c
c                                 Ccvt1,    Ccvt2     -         -      ! SHARED from leccvt above
c
c             ASI,      -         -         -         FS2       -      ! SHARED from lecTPO above
c                       NPO       -         -         -         -      ! SHARED from lecTPO above
c
c$OMP+PRIVATE(                    ntf)                                 ! PRIVATE from the main program
c
c$OMP+PRIVATE(col,      Dimc)                                          ! PRIVATE from here
c----------------------------------------------------------------------|
c call dblons(col,      Dimc,     ntf,      valinter)
c                                 ntf       -         -         -      ! PRIVATE from the main program
c             col,      Dimc      -         -         -         -      ! PRIVATE from fabcol above
c$OMP+PRIVATE(                              valinter)                  ! PRIVATE from here
c----------------------------------------------------------------------|
c call ValInt(Arr,      Dimc,     ntf,      valinter)
c                                 ntf       -         -         -      ! PRIVATE from the main program
c                       Dimc      -         -         -         -      ! PRIVATE from fabcol above
c                                           valinter  -         -      ! PRIVATE from dblons above
c$OMP+PRIVATE(Arr)                                                     ! PRIVATE from here
c----------------------------------------------------------------------|
c call rnflow(Arr,      Dimc,     FAT,      ntf,      p,        q      ! PRIVATE from the main program
c                                                               q      ! SHARED from the main program
c                                                     p         -      ! SHARED from lecelt above
c                       Dimc      -         -                          ! PRIVATE from fabcol above
c             Arr       -         -         -         -         -      ! PRIVATE from ValInt above
c                                           ntf                        ! PRIVATE from the main program
c$OMP+PRIVATE(FAT)                                                     ! PRIVATE from here
c----------------------------------------------------------------------|
c$OMP+SCHEDULE(DYNAMIC)
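
 Following my list of ideas above, I also intend to replace this last
SCHEDULE clause with the alternatives I noted, one at a time, for
example:

c$OMP+SCHEDULE(STATIC,1)
c$OMP+SCHEDULE(GUIDED)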


 What can I do to try to understand anything?
Thank you,
David
