[LLVMdev] TSVC/Equivalencing-dbl

Duncan Sands duncan.sands at gmail.com
Fri Oct 5 12:50:06 PDT 2012


Hi Hal,

On 05/10/12 20:32, Hal Finkel wrote:
> ----- Original Message -----
>> From: "Duncan Sands" <duncan.sands at gmail.com>
>> To: "Hal Finkel" <hfinkel at anl.gov>
>> Cc: llvmdev at cs.uiuc.edu
>> Sent: Friday, October 5, 2012 12:10:03 PM
>> Subject: Re: TSVC/Equivalencing-dbl
>>
>> Oops, I ran the testsuite wrong: where I wrote "dragonegg" output, read
>> "clang" output.
>
> Okay, can you resummarize? Do you mean this?
>
> gcc -O0:
> S1421         0.00                 16000
>
> gcc -O0 under valgrind:
> S1421         0.00                 17208.404325315
>
> clang:
> S1421    0.00           17208.404325315

Exactly.  For "clang" this only happens when building the way the testsuite does
(i.e. with link-time optimization + llc): if you directly do
   clang tsc.c dummy.c -std=gnu99 -O3
then you get 16000.
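(For reference, the testsuite-style build is roughly the following - this is from
memory, so the exact flags certainly differ:)

   clang -std=gnu99 -O3 -emit-llvm -c tsc.c -o tsc.bc
   clang -std=gnu99 -O3 -emit-llvm -c dummy.c -o dummy.bc
   llvm-link tsc.bc dummy.bc -o tsc.linked.bc
   llc tsc.linked.bc -o tsc.s
   clang tsc.s -o tsc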

>
> This is all on Darwin, right?

No, this is on x86-64 (ubuntu) linux.

>
> I would certainly tend to suspect an 80-bit-intermediate issue, but both gcc and clang give 16000 on PowerPC (which has no 80-bit type).

Not sure what you are saying here.  The issue is that x86 internally uses 80 bits
for the 64-bit (double) type, so as long as everything stays in registers you get
a lot more precision, but the moment you store to memory only 64 bits are stored.
The fact that gcc and clang give the same result on powerpc confirms that it is
coming from x86 using an extra 16 bits of precision beyond what you would expect.
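Here's a standalone illustration of the effect (my own sketch, nothing to do with
the testsuite).  Whether you actually see the difference depends on the compiler
and flags: I would expect it with something like "gcc -m32 -mfpmath=387 -O2",
which keeps doubles in x87 registers, but not with SSE maths (the x86-64
default), where everything really is 64 bits:

   #include <stdio.h>

   int main(void) {
     volatile double a = 1.0, b = 0x1p-60; /* volatile blocks constant folding */
     double x = a + b;      /* with x87, x may live in an 80 bit register */
     volatile double y = x; /* the store to memory rounds it to 64 bits */
     printf("x %s y\n", x == y ? "==" : "!=");
     return 0;
   }

If x stays in a register it holds 1 + 2^-60, which does not survive the store to
memory, and the program prints "x != y".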

> It could be a rounding issue, but would Darwin really have a different default
> rounding mode?

As I'm seeing this on linux, I guess not :)

>
> The computation being performed here is [in s1421() in tsc.inc]:
>                  for (int i = 0; i < LEN/2; i++) {
>                          b[i] = xx[i] + a[i];
>                  }


> So *if* we're adding up the same numbers in the same order, the answer should be the same everywhere ;)

No, why would it be the same everywhere?  If the whole thing is done in
floating point registers, an x86 processor will maintain 80 bits of precision
even though these are 64-bit (double) types, while if values are loaded from
and stored to memory at every step then only 64 bits are kept.
This can lead to very different results.
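For example (again just a sketch; this assumes the compiler keeps the first
accumulator in an x87 register across the loop, e.g. "gcc -m32 -mfpmath=387 -O2" -
with SSE maths both lines print 1):

   #include <stdio.h>

   int main(void) {
     double half_ulp = 0x1p-53;    /* half an ulp of 1.0 as a double */
     double in_reg = 1.0;          /* may stay in an 80 bit x87 register */
     volatile double in_mem = 1.0; /* volatile: stored back after every add */
     for (int i = 0; i < 2; i++) {
       in_reg += half_ulp;  /* exact in 80 bit precision */
       in_mem += half_ulp;  /* rounded to 64 bits each step: stays 1.0 */
     }
     printf("%.17g\n%.17g\n", in_reg, (double)in_mem);
     return 0;
   }

Here in_reg ends up as 1 + 2^-52 (printed as 1.0000000000000002) while in_mem
stays exactly 1.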

> Can you put in some print statements and confirm?

Not sure what you want me to confirm, but anyway I now have half an hour to
look into this some more :)

Ciao, Duncan.

>
> Thanks again,
> Hal
>



