[LLVMdev] TSVC/Equivalencing-dbl

Fri Oct 5 13:26:33 PDT 2012

----- Original Message -----
> From: "Duncan Sands" <duncan.sands at gmail.com>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: llvmdev at cs.uiuc.edu
> Sent: Friday, October 5, 2012 2:50:06 PM
> Subject: Re: TSVC/Equivalencing-dbl
> 
> Hi Hal,
> 
> On 05/10/12 20:32, Hal Finkel wrote:
> > ----- Original Message -----
> >> From: "Duncan Sands" <duncan.sands at gmail.com>
> >> To: "Hal Finkel" <hfinkel at anl.gov>
> >> Cc: llvmdev at cs.uiuc.edu
> >> Sent: Friday, October 5, 2012 12:10:03 PM
> >> Subject: Re: TSVC/Equivalencing-dbl
> >>
> >> Oops, I ran the testsuite wrong: read clang output for dragonegg
> >> output.
> >
> > Okay, can you resummarize? Do you mean this:
> >
> > gcc -O0:
> > S1421         0.00                 16000
> >
> > gcc -O0 under valgrind:
> > S1421         0.00                 17208.404325315
> >
> > clang:
> > S1421         0.00                 17208.404325315
> 
> exactly.  For "clang" this is only when building like the testsuite
> does (i.e. with link-time optimization + llc): if you directly do:
>    clang tsc.c dummy.c -std=gnu99 -O3
> then you get 16000.
> 
> >
> > This is all on Darwin, right?
> 
> No, this is on x86-64 (ubuntu) linux.

OIC, interesting!

> 
> >
> > I would certainly tend to suspect an 80-bit-intermediate issue,
> > but, both gcc and clang give 16000 on PowerPC (which has no
> > 80-bit).
> 
> Not sure what you are saying here.  The issue is the x86 internally
> uses 80 bits for the 64 bit (double) type, so as long as everything
> is in registers you get lots more precision, but the moment you store
> to memory only 64 bits are stored.  The fact that gcc and clang give
> the same on powerpc confirms that it is coming from x86 using an
> extra 16 bits of precision beyond what you would expect.
> 
> > It could be a rounding issue, but would Darwin really have a
> > different default rounding mode?
> 
> As I'm seeing this on linux, I guess not :)
> 
> >
> > The computation being performed here is [in s1421() in tsc.inc]:
> >                  for (int i = 0; i < LEN/2; i++) {
> >                          b[i] = xx[i] + a[i];
> >                  }
> 
> 
> > So *if* we're adding up the same numbers in the same order, the
> > answer should be the same everywhere ;)
> 
> No, why would it be the same everywhere?  If the whole thing is done
> in double registers, an x86 processor will maintain 80 bits of
> precision even though these are 64 bit (double) types, while if things
> are loaded and stored to memory at every step instead then only 64
> bits will be used.  This can lead to very different results.

Right.
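
To make the point above concrete, here is a minimal sketch (not taken from tsc.c) that adds the same numbers in the same order with two different accumulators. The function names and demo_input are hypothetical, and the sketch assumes that long double is wider than double, as it is with the 80-bit x87 format on x86 Linux; that does not hold on every target.

```c
#include <float.h>
#include <stddef.h>

/* Hypothetical input: 1.0 followed by two terms of 2^-53 each. */
static const double demo_input[3] = {1.0, DBL_EPSILON / 2, DBL_EPSILON / 2};

/* Sum with a 64-bit accumulator. The volatile qualifier forces every
   partial sum to be rounded to double, as if it were stored to memory
   at each step. */
double sum_rounded_each_step(const double *x, size_t n)
{
    volatile double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc = acc + x[i];
    return acc;
}

/* Sum with a long double accumulator and round to double only once,
   at the end. Assumption: long double carries more precision than
   double on the target. */
double sum_extended(const double *x, size_t n)
{
    long double acc = 0.0L;
    for (size_t i = 0; i < n; i++)
        acc += x[i];
    return (double)acc;
}
```

With demo_input and the default round-to-nearest mode, sum_rounded_each_step returns exactly 1.0, because each 2^-53 term is lost when the partial sum is rounded to double. sum_extended can return 1.0 + DBL_EPSILON where long double is the 80-bit type; where long double is no wider than double, both functions return 1.0.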

> 
> > Can you put in some print statements and confirm?
> 
> Not sure what you want me to confirm, but anyway I now have 1/2 an
> hour to look into this some more :)

For test s1421, we have:
                for (int i = 0; i < LEN/2; i++) {
                        b[i] = xx[i] + a[i];
                }

In this case, xx is set to the second half of the b array, and a is initialized to 1/(i+1)^2. The b array, however, does not seem to be explicitly initialized for this test. When all of the tests are run in order, it is initialized for the last test in the previous group, s353... so maybe I screwed this up in breaking apart the tests.
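
A minimal sketch of why the starting contents of b matter here, assuming a simplified stand-in for the test rather than the real tsc.c code. N, s1421_sketch, the b_init parameter, and the plain sum over the first half of b are all hypothetical; the actual TSVC checksum may be computed differently.

```c
#include <stddef.h>

#define N 8  /* hypothetical small stand-in for LEN */

/* Fill b with b_init, run an s1421-style loop in which xx aliases the
   second half of b, and return the sum of the first half of b. */
double s1421_sketch(double b_init)
{
    double a[N], b[N];
    double *xx = &b[N / 2];  /* xx points into the second half of b */

    for (size_t i = 0; i < N; i++) {
        a[i] = 1.0 / (double)((i + 1) * (i + 1));  /* a[i] = 1/(i+1)^2 */
        b[i] = b_init;  /* stands in for whatever an earlier test left in b */
    }

    /* Same shape as the loop in s1421(). */
    for (size_t i = 0; i < N / 2; i++)
        b[i] = xx[i] + a[i];

    double sum = 0.0;
    for (size_t i = 0; i < N / 2; i++)
        sum += b[i];
    return sum;
}
```

In this sketch the result moves by b_init for each of the N/2 elements summed, so the reported value depends entirely on what was in the second half of b before the loop ran.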

Thanks again,
Hal

> 
> Ciao, Duncan.
> 
> >
> > Thanks again,
> > Hal

-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory