<font size=2 face="sans-serif">Hi Johannes,</font><br><br><font size=2 face="sans-serif">Your clarifications helped a lot; having

all details gathered in one place helped me understand better what you

are proposing.</font><br><br><font size=2 face="sans-serif">Thanks a lot for taking the time to

explain.</font><br><br><font size=2 face="sans-serif">Thanks,</font><br><br><font size=2 face="sans-serif">--Doru<br></font><br><br><br><br><font size=1 color=#5f5f5f face="sans-serif">From:      

 </font><font size=1 face="sans-serif">"Doerfert, Johannes"

<jdoerfert@anl.gov></font><br><font size=1 color=#5f5f5f face="sans-serif">To:      

 </font><font size=1 face="sans-serif">Gheorghe-Teod Bercea

<Gheorghe-Teod.Bercea@ibm.com></font><br><font size=1 color=#5f5f5f face="sans-serif">Cc:      

 </font><font size=1 face="sans-serif">Alexey Bataev <a.bataev@outlook.com>,

"cfe-dev@lists.llvm.org" <cfe-dev@lists.llvm.org>, Guray

Ozen <gozen@nvidia.com>, "Gregory.Rodgers@amd.com" <Gregory.Rodgers@amd.com>,

"Finkel, Hal J." <hfinkel@anl.gov>, "kli@ca.ibm.com"

<kli@ca.ibm.com>, LLVM-Dev <llvm-dev@lists.llvm.org>, "openmp-dev@lists.llvm.org"

<openmp-dev@lists.llvm.org></font><br><font size=1 color=#5f5f5f face="sans-serif">Date:      

 </font><font size=1 face="sans-serif">01/31/2019 12:34 PM</font><br><font size=1 color=#5f5f5f face="sans-serif">Subject:    

   </font><font size=1 face="sans-serif">Re: [RFC] Late

(OpenMP) GPU code "SPMD-zation"</font><br><hr noshade><br><br><br><tt><font size=2>Hi Doru,<br><br>maybe I should clarify something I mentioned in an earlier email already<br>but it seems there are things getting lost in this thread:<br><br>  While the prototype replaces code generation parts in Clang, the<br>  actual patches will add alternative code generation paths, guarded<br>  under a cmd flag. Once, and obviously only if, everything is in

place<br>  and has been shown to improve the current situation, the default

path<br>  would be switched.<br><br><br>On 01/31, Gheorghe-Teod Bercea wrote:<br>> Hi Johannes,<br>> <br>> Thank you for the explanation.<br>> <br>> I think we need to clarify some details about code generation in Clang

today:<br><br>I'm not really sure why you feel the need to do that but OK.<br><br><br>> 1. non-SPMD mode, or generic mode, uses the master-worker code gen

scheme where<br>> the master thread and the worker threads are disjoint sets of threads

(when one<br>> set runs the other set is blocked and doesn't participate in the execution):<br>> <br>> workers  |  master<br>> ====================<br>> BLOCKED  | RUNNING<br>> ------- sync -------<br>> RUNNING  | BLOCKED<br>> ------- sync -------<br>> BLOCKED  | RUNNING<br><br>I agree, and for the record, this is not changed by my prototype, see<br>[1, line 295].<br><br>[1] </font></tt><a href="https://reviews.llvm.org/D57460#change-e9Ljd9RgdWYz"><tt><font size=2>https://reviews.llvm.org/D57460#change-e9Ljd9RgdWYz</font></tt></a><tt><font size=2><br><br><br>> 2. the worker threads, in their RUNNING state above, contain a state

machine<br>> which chooses the parallel region to be executed. Today this choice

happens in<br>> one of two ways: explicit targets (where you know what outlined region

you are<br>> calling and you just call it) and indirect targets (via function pointer

set by<br>> master thread in one of its RUNNING regions):<br>> <br>> workers  |  master<br>> ====================<br>> BLOCKED  | RUNNING<br>> ------- sync -------<br>> RUNNING  |<br>>  state   | BLOCKED<br>> machine  |<br>> ------- sync -------<br>> BLOCKED  | RUNNING<br><br>Partially agreed. Afaik, it will always be decided through a function<br>pointer set by the master thread and communicated to the workers through<br>the runtime. The workers use a switch, or in fact an if-cascade, to<br>check if the function pointer points to a known parallel region. If so<br>it will be called directly, otherwise there is the fallback indirect<br>call of the function pointer.<br><br>> Your intended changes (only target the RUNNING state machine of the

WORKERS):<br>> - remove explicit targets from current code gen. (by itself this is

a major<br>> step back!!)<br>> - introduce a pass in LLVM which will add back the explicit targets.<br><br>Simplified but correct. From my perspective this is not a problem<br>because in production I will always run the LLVM passes after Clang.<br>Even if you do not run the LLVM passes, the below reasoning might be<br>enough to convince people to run a similar pass in their respective<br>pipeline. If that is not enough, we can also keep the Clang state<br>machine generation around (see the top comment).<br><br><br>> Can you point out any major improvements this will bring compared

to the<br>> current state?<br><br>Sure, I'll give you three for now:<br><br>[FIRST]<br>Here is the original motivation from the first RFC mail (in case you<br>have missed it):<br><br> 2) Implement a middle-end LLVM-IR pass that detects the guarded mode,<br>    e.g., through the runtime library calls used, and that tries

to<br>    convert it into the SPMD mode potentially by introducing

lightweight<br>    guards in the process.<br><br>    Why:<br>    - After the inliner, the canonicalizations, dead code elimination,<br>      code movement [2, Section 7 on page 8], we have a

clearer picture<br>      of the code that is actually executed in the target

region and all<br>      the side effects it contains. Thus, we can make an

educated<br>      decision on the required amount of guards that prevent

unwanted<br>      side effects from happening after a move to SPMD mode.<br>    - At this point we can more easily introduce different schemes

to<br>      avoid side effects by threads that were not supposed

to run. We<br>      can decide if a state machine is needed, conditionals

should be<br>      employed, masked instructions are appropriate, or

"dummy" local<br>      storage can be used to hide the side effect from the

outside<br>      world.<br><br>[2] </font></tt><a href="http://compilers.cs.uni-saarland.de/people/doerfert/par_opt18.pdf"><tt><font size=2>http://compilers.cs.uni-saarland.de/people/doerfert/par_opt18.pdf</font></tt></a><tt><font size=2><br><br><br>Let me give you the canonical example that shows the need for this:<br><br>  #pragma omp target teams<br>  {<br>    foo(i + 0)<br>    foo(i + 1)<br>    foo(i + 2)<br>  }<br><br>  void foo(int i) {<br>  #pragma omp parallel<br>  ...<br>  }<br><br>The target region can be executed in SPMD mode but we cannot decide that<br>syntactically when the region is encountered. Agreed?<br><br><br>[SECOND]<br>Now there are other benefits with regard to the above-mentioned state<br>machine. In the LLVM pass we can analyze the kernel code<br>interprocedurally and detect all potentially executed parallel regions,<br>together with a relation between them, and the need for the fallback<br>case. That means we can build a state machine that __takes control<br>dependences into account__, __after inlining and dead code elimination__<br>canonicalized the kernel.<br><br>If inlining and code canonicalization resulted in the following<br>structure, the state machine we can build late can know that after<br>section0 the workers will execute section1, potentially multiple times,<br>before they move on to section3. In today's scheme, this is something we<br>cannot simply do, causing us to traverse the if-cascade from top to<br>bottom all the time (which grows linearly with the number of parallel<br>regions).<br><br>  if (...) {<br>    #pragma omp parallel<br>    section0(...)<br>    do {<br>      #pragma omp parallel<br>      section1(...)<br>    } while (...)<br>  }<br>  #pragma omp parallel<br>  section3(...)<br><br><br>[THIRD]<br>Depending on the hardware, we need to make sure, or at least try really<br>hard, that there is no fallback case in the state machine, which is an<br>indirect function call. This can be done best at link time which<br>requires us to analyze the kernel late and modify the state machine at<br>that point anyway.<br><br><br>> From your answer below you mention a lower number of function calls.

Since<br>> today we inline everything anyway how does that help?<br><br>If we inline, it doesn't for performance purposes. If we do not inline,<br>it does. In either case, it helps to simplify middle-end analyses and<br>transformations that work on kernels. Finally, it prevents us from<br>wasting compile time looking at the (unoptimizable) state machine of<br>every target region.<br><br>Maybe it is worth asking the opposite question:<br>  What are the reasons against these general runtime calls that hide

the<br>  complexity we currently emit into the user code module?<br>[Note that I discuss the only drawback I came up with, a non-customized<br>state machine, already above.]<br><br><br>> If you haven't considered performance so far how come you're proposing

all<br>> these changes? What led you to propose all these changes?<br><br>See above.<br><br><br>> In SPMD mode all threads execute the same code. Using the notation

in the<br>> schemes above you can depict this as:<br>> <br>>     all threads<br>> ====================<br>>       RUNNING<br>> <br>> No state machine being used, no disjoint sets of threads. This is

as<br>> if you're executing CUDA code.<br><br>Agreed.<br><br><br>> Could you explain what your proposed changes are in this context?<br><br>None, at least after inlining the runtime library calls there is<br>literally the same code executed before and after the changes.<br><br><br>> Could you also explain what you mean by "assuming SPMD wasn't

achieved"?<br><br>That is one of the two motivations for the whole change. I explained<br>that in the initial RFC and again above. The next comment points you to<br>the code that tries to achieve SPMD mode for inputs that were generated<br>in the non-SPMD mode (master-worker + state machine) by Clang.<br><br><br>> Do you expect to write another LLVM pass which will transform the<br>> master-worker scheme + state machine into an SPMD scheme?<br><br>I did already, as that was the main motivation for the whole thing.<br>It is part of the prototype, see [3, line 321].<br><br>[3] </font></tt><a href="https://reviews.llvm.org/D57460#change-8gnnGNfJVR4B"><tt><font size=2>https://reviews.llvm.org/D57460#change-8gnnGNfJVR4B</font></tt></a><tt><font size=2><br><br><br>Cheers,<br>  Johannes<br><br><br>> From:        "Doerfert, Johannes" <jdoerfert@anl.gov><br>> To:        Gheorghe-Teod Bercea <Gheorghe-Teod.Bercea@ibm.com><br>> Cc:        Alexey Bataev <a.bataev@outlook.com>,

Guray Ozen <gozen@nvidia.com>,<br>> "Gregory.Rodgers@amd.com" <Gregory.Rodgers@amd.com>,

"Finkel, Hal J."<br>> <hfinkel@anl.gov>, "kli@ca.ibm.com" <kli@ca.ibm.com>,<br>> "openmp-dev@lists.llvm.org" <openmp-dev@lists.llvm.org>,

LLVM-Dev<br>> <llvm-dev@lists.llvm.org>, "cfe-dev@lists.llvm.org"

<cfe-dev@lists.llvm.org><br>> Date:        01/30/2019 07:56 PM<br>> Subject:        Re: [RFC] Late (OpenMP) GPU code

"SPMD-zation"<br>> ----------------------------------------<br>> <br>> <br>> <br>> Hi Doru,<br>> <br>> [+ llvm-dev and cfe-dev]<br>> <br>> On 01/30, Gheorghe-Teod Bercea wrote:<br>> > Hi Johannes,<br>> ><br>> > First of all thanks for looking into the matter of improving

non-SPMD mode!<br>> ><br>> > I have a question regarding the state machine that you said you'd

like to<br>> > replace/improve. There are cases (such as target regions that

span multiple<br>> > compilation units) where the switch statement is required. Is

this something<br>> > that your changes will touch in any way?<br>> <br>> There will not be a difference. Let me explain in some detail as

there<br>> seems to be a lot of confusion on this state machine topic:<br>> <br>> Now:<br>> <br>> Build a state machine in the user code (module) with all the parallel<br>> regions as explicit targets of the switch statement and a fallback<br>> default that does an indirect call to the requested parallel region.<br>> <br>> <br>> Proposed, after Clang:<br>> <br>> Use the runtime state machine implementation [0] which reduces the<br>> switch to the default case, thus an indirect call to the requested<br>> parallel region. This will always work, regardless of the translation<br>> unit that contained the parallel region (pointer).<br>> <br>> Proposed, after OpenMP-Opt pass in LLVM (assuming SPMD wasn't achieved):<br>> <br>> All reachable parallel regions in a kernel are collected and used

to<br>> create the switch statement in the user code (module) [1, line 111]

with<br>> a fallback if there are potentially [1, line 212] hidden parallel<br>> regions.<br>> <br>> <br>> Does that make sense?<br>> <br>> <br>> [0] </font></tt><a href="https://reviews.llvm.org/D57460#change-e9Ljd9RgdWYz"><tt><font size=2>https://reviews.llvm.org/D57460#change-e9Ljd9RgdWYz</font></tt></a><tt><font size=2><br>> [1] </font></tt><a href="https://reviews.llvm.org/D57460#change-8gnnGNfJVR4B"><tt><font size=2>https://reviews.llvm.org/D57460#change-8gnnGNfJVR4B</font></tt></a><tt><font size=2><br>> <br>> <br>> > My next question is, for the workloads which are in the same

compilation unit<br>> > there is a trick that code gen performs (or could perform I'm

not sure if<br>> this<br>> > has been upstreamed) where it can check for the specific name

of an outlined<br>> > function and then just call it directly thus making that function

inline-able<br>> > (thus erasing most if not all the overhead of having the state

machine in the<br>> > first place). In other words the "worst" part of the

switch statement will<br>> only<br>> > apply to outlined functions from other compilation units. With

this in mind<br>> > what would the impact of your changes be in the end? If this

part isn't clear<br>> I<br>> > can do some digging to find out how this actually works in more

detail; it's<br>> > been too long since I've had to look at this part.<br>> <br>> See the answer above.<br>> <br>> <br>> > Can you share some performance numbers given an example you have

been looking<br>> > at? I see you have one that uses "#pragma omp atomic".

I would avoid using<br>> > something like that since it may have other overheads not related

to your<br>> > changes. I would put together an example with this directive

structure:<br>> ><br>> > #pragma omp target teams distribute<br>> > for(...){<br>> >   <code1><br>> >   #pragma omp parallel for<br>> >   for(...) {<br>> >     <code2><br>> >   }<br>> >   <code3><br>> > }<br>> ><br>> > which forces the use of the master-worker scheme (non-SPMD mode)

without any<br>> > other distractions.<br>> <br>> The atomic stuff I used to determine correctness. I haven't yet looked<br>> at performance. I will do so now and inform you of my results.<br>> <br>> <br>> > It would then be interesting to understand how you plan to change

the LLVM<br>> code<br>> > generated for this,<br>> <br>> The examples show what the LLVM-IR is supposed to look like, right?<br>> <br>> > what the overheads that you're targeting are (register usage,<br>> > synchronization cost etc), and then what the performance gain

is<br>> > compared to the current scheme.<br>> <br>> I can also compare register usage in addition to performance but there<br>> is no difference in synchronization. The number and (relative) order

of<br>> original runtime library calls stays the same. The number of user

code<br>> -> runtime library calls is even decreased.<br>> <br>> <br>> Please let me know if this helps and what questions remain.<br>> <br>> Thanks,<br>>  Johannes<br>> <br>> <br>> <br>> > From:        "Doerfert, Johannes"

<jdoerfert@anl.gov><br>> > To:        Alexey Bataev <a.bataev@outlook.com><br>> > Cc:        Guray Ozen <gozen@nvidia.com>,

Gheorghe-Teod Bercea<br>> > <gheorghe-teod.bercea@ibm.com>, "openmp-dev@lists.llvm.org"<br>> > <openmp-dev@lists.llvm.org>, "Finkel, Hal J."

<hfinkel@anl.gov>,<br>> > "Gregory.Rodgers@amd.com" <Gregory.Rodgers@amd.com>,

"kli@ca.ibm.com"<br>> > <kli@ca.ibm.com><br>> > Date:        01/30/2019 04:14 PM<br>> > Subject:        Re: [RFC] Late (OpenMP) GPU

code "SPMD-zation"<br>> > ----------------------------------------<br>> ><br>> ><br>> ><br>> > I don't really see "many ifs and maybes", actually

none.<br>> ><br>> > Anyway, I will now work on a patch set that adds the new functionality

under<br>> a<br>> > cmd flag<br>> > in order to showcase correctness and performance on real code.<br>> ><br>> > If you, or somebody else, have interesting examples, please feel

free to point<br>> > me at them.<br>> ><br>> > Thanks,<br>> >   Johannes<br>> ><br>> ><br>> > ----------------------------------------<br>> ><br>> > From: Alexey Bataev <a.bataev@outlook.com><br>> > Sent: Wednesday, January 30, 2019 2:18:19 PM<br>> > To: Doerfert, Johannes<br>> > Cc: Guray Ozen; Gheorghe-Teod Bercea; openmp-dev@lists.llvm.org;

Finkel, Hal<br>> > J.; Gregory.Rodgers@amd.com; kli@ca.ibm.com<br>> > Subject: Re: [RFC] Late (OpenMP) GPU code "SPMD-zation"<br>> >  <br>> > Currently, there are too many "if"s and "maybe"s.

If you can provide a solution<br>> > that does not break anything and does not affect the performance,

does not<br>> > require changes in the backend - then go ahead with the patches.<br>> ><br>> > -------------<br>> > Best regards,<br>> > Alexey Bataev<br>> > 30.01.2019 14:49, Doerfert, Johannes      :<br>> > No, SPMD mode will not be affected at all.<br>> ><br>> > The "worse" part is the following:<br>> >   If we inline runtime library calls before the openmp-opt

pass had a chance<br>> to<br>> > look at the code,<br>> >   we will not have a customized state machine for the __non-SPMD__

case. That<br>> > is, the if-cascade<br>> >   checking the work function pointer is not there.<br>> ><br>> > Avoiding this potential performance decline is actually very

easy. While we<br>> do<br>> > not have the "inline_late" capability,<br>> > run the openmp-opt pass __before__ the inliner and we will not

get "worse"<br>> > code. We might however miss out on<br>> > _new_ non-SPMD -> SPMD transformations.<br>> ><br>> ><br>> > Does that make sense?<br>> ><br>> > ----------------------------------------<br>> ><br>> > From: Alexey Bataev <a.bataev@outlook.com><br>> > Sent: Wednesday, January 30, 2019 1:44:10 PM<br>> > To: Doerfert, Johannes<br>> > Cc: Guray Ozen; Gheorghe-Teod Bercea; openmp-dev@lists.llvm.org;

Finkel, Hal<br>> > J.; Gregory.Rodgers@amd.com; kli@ca.ibm.com<br>> > Subject: Re: [RFC] Late (OpenMP) GPU code "SPMD-zation"<br>> >  <br>> > Any "worse" is not a good idea. We need to avoid it.

It would be good that<br>> the<br>> > new code did not affect the performance, especially for SPMD

mode (I think,<br>> > this "worse" will affect exactly SPMD mode, no?)<br>> ><br>> > -------------<br>> > Best regards,<br>> > Alexey Bataev<br>> > 30.01.2019 14:38, Doerfert, Johannes      :<br>> > The LLVM optimization (openmp-opt), which does non-SPMD ->

SPMD and custom<br>> > state machine generation, will not fire if<br>> > the __kernel_general_... calls are "missing". Thus

if we inline "too early",<br>> we<br>> > are "stuck" with the non-SPMD choice (not worse than<br>> > what we have now!) and the default library state machine ("worse"

than what<br>> we<br>> > have right now). Does that make sense?<br>> ><br>> > The second option described what I want to see us do "later"

in order to<br>> avoid<br>> > the above scenario and always get both,<br>> > openmp-opt and inlining of the runtime and work functions.<br>> ><br>> ><br>> >  $B(,(,(,(,(,(,(,(,(,(,(,(,(,(,(,(,(,(,(,(,(,(,(,(,(,(,(,(,(,(,(,(,(,(,(,(,(,<br>> (,(,(,(,(,(,(,(,(,(,(,(,(,(,(,(,(,(,(,(,(,(,(,(,(,(,(,(,(,(,(,(,(,(,(,(,(,(,(,<br>> (,(,(, (B<br>> ><br>> > From: Alexey Bataev <a.bataev@outlook.com><br>> > Sent: Wednesday, January 30, 2019 1:25:42 PM<br>> > To: Doerfert, Johannes<br>> > Cc: Guray Ozen; Gheorghe-Teod Bercea; openmp-dev@lists.llvm.org;

Finkel, Hal<br>> > J.; Gregory.Rodgers@amd.com; kli@ca.ibm.com<br>> > Subject: Re: [RFC] Late (OpenMP) GPU code "SPMD-zation"<br>> >  <br>> > Sorry, did not understand your answer correctly. But you wrote:<br>> > for now, not doing the optimization is just fine.<br>> > What do you mean?<br>> ><br>> > -------------<br>> > Best regards,<br>> > Alexey Bataev<br>> > 30.01.2019 14:23, Doerfert, Johannes      :<br>> > Alexey,<br>> ><br>> > I'm not sure how to interpret "Bad idea!", but I think

there is again a<br>> > misunderstanding.<br>> > To help me understand, could you try to elaborate a bit?<br>> ><br>> > To make my last email clear:<br>> > I __do__ want inlining. Both answers to your earlier inlining

questions do<br>> > actually assume the runtime library calls __are eventually inlined__,<br>> > that is why I mentioned LTO and the runtime as bitcode.<br>> ><br>> > Cheers,<br>> >   Johannes<br>> ><br>> ><br>> ><br>> > ----------------------------------------<br>> ><br>> > From: Alexey Bataev <a.bataev@outlook.com><br>> > Sent: Wednesday, January 30, 2019 1:14:56 PM<br>> > To: Doerfert, Johannes<br>> > Cc: Guray Ozen; Gheorghe-Teod Bercea; openmp-dev@lists.llvm.org;

Finkel, Hal<br>> > J.; Gregory.Rodgers@amd.com; kli@ca.ibm.com<br>> > Subject: Re: [RFC] Late (OpenMP) GPU code "SPMD-zation"<br>> >  <br>> > Bad idea!<br>> ><br>> > -------------<br>> > Best regards,<br>> > Alexey Bataev<br>> > 30.01.2019 14:11, Doerfert, Johannes      :<br>> > Sure I do. Why do you think I don't?<br>> ><br>> > ----------------------------------------<br>> ><br>> > From: Alexey Bataev <a.bataev@outlook.com><br>> > Sent: Wednesday, January 30, 2019 1:00:59 PM<br>> > To: Doerfert, Johannes<br>> > Cc: Guray Ozen; Gheorghe-Teod Bercea; openmp-dev@lists.llvm.org;

Finkel, Hal<br>> > J.; Gregory.Rodgers@amd.com; kli@ca.ibm.com<br>> > Subject: Re: [RFC] Late (OpenMP) GPU code "SPMD-zation"<br>> >  <br>> > You don't want to do the inlining?<br>> ><br>> > -------------<br>> > Best regards,<br>> > Alexey Bataev<br>> > 30.01.2019 13:59, Doerfert, Johannes      :<br>> > - for now, not doing the optimization is just fine. The whole

idea is that<br>> code<br>> > is always valid.<br>> ><br>> ><br>> <br>> --<br>> <br>> Johannes Doerfert<br>> Researcher<br>> <br>> Argonne National Laboratory<br>> Lemont, IL 60439, USA<br>> <br>> jdoerfert@anl.gov<br>> [attachment "signature.asc" deleted by Gheorghe-Teod Bercea/US/IBM]<br>> <br>> <br><br>-- <br><br>Johannes Doerfert<br>Researcher<br><br>Argonne National Laboratory<br>Lemont, IL 60439, USA<br><br>jdoerfert@anl.gov<br>[attachment "signature.asc" deleted by Gheorghe-Teod Bercea/US/IBM]

</font></tt><br><br><BR>
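<br><font size=2 face="sans-serif">As an illustration of the worker-side dispatch discussed in this thread (an if-cascade over the known parallel regions, with an indirect call as the fallback), here is a minimal C++ sketch. It is not the code Clang or the prototype emits: the names WorkFn, section0, section1, hidden_region, Dispatch and run_worker_step are hypothetical, and the runtime synchronization between master and workers is left out entirely.</font><br>

```cpp
#include <cassert>

// Hypothetical signature of an outlined parallel region (illustration only).
using WorkFn = void (*)(int *counter);

// Hypothetical parallel regions that are visible in the kernel's module.
void section0(int *counter) { *counter += 1; }
void section1(int *counter) { *counter += 10; }

// Hypothetical region that is not known statically, e.g. one defined in
// another translation unit.
void hidden_region(int *counter) { *counter += 100; }

enum class Dispatch { Direct, Indirect };

// One step of the worker state machine. `requested` stands for the work
// function pointer the master thread published through the runtime.
// Known regions are matched by the if-cascade and called directly, which
// makes them candidates for inlining; anything else takes the fallback
// indirect call.
Dispatch run_worker_step(WorkFn requested, int *counter) {
  assert(requested != nullptr && "master must publish a work function");
  if (requested == section0) {
    section0(counter);
    return Dispatch::Direct;
  }
  if (requested == section1) {
    section1(counter);
    return Dispatch::Direct;
  }
  requested(counter); // fallback: indirect call
  return Dispatch::Indirect;
}
```

<font size=2 face="sans-serif">In this sketch the cascade is walked from top to bottom on every step, so its cost grows with the number of known regions, and a region from another translation unit is only reachable through the indirect call.</font><br>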