<div dir="ltr">I thought about this more, and I think it is incorrect in general (depending on the rounding mode) to replace the following two instructions with addsd.<div><div class="gmail_extra"><span style="font-family:arial,sans-serif;font-size:13px"><br></span></div><div class="gmail_extra"><div class="gmail_extra"><span style="font-family:arial,sans-serif;font-size:13px">movsd (%rsp), %xmm0</span><br></div><div class="gmail_extra"><span style="font-family:arial,sans-serif;font-size:13px">addpd  %xmm0, %xmm1</span><br style="font-family:arial,sans-serif;font-size:13px"></div><div class="gmail_extra"><span style="font-family:arial,sans-serif;font-size:13px"><br></span></div></div><div class="gmail_extra"><div class="gmail_extra"><font face="arial, sans-serif">If the higher part of %xmm1 is -0.0, the result of the addpd will be </font><font face="arial, sans-serif">(0.0, </font><span style="font-family:arial,sans-serif">%xmm1_low + %xmm0_low</span><span style="font-family:arial,sans-serif">), since -0.0 + 0.0 = 0.0. However, i</span><span style="font-size:13px;font-family:arial,sans-serif">f we use </span><font face="arial, sans-serif">addsd, the contents of %xmm1 will be </font><font face="arial, sans-serif">(-0.0, </font><span style="font-family:arial,sans-serif">%xmm1_low + %xmm0_low</span><span style="font-family:arial,sans-serif">).</span></div><div class="gmail_extra"><br></div><div class="gmail_extra">We can check the rounding mode to see if this optimization is legal, but I'm not sure if it's worthwhile.</div><div class="gmail_extra"><br></div></div><div class="gmail_extra"><div class="gmail_quote">On Fri, Sep 12, 2014 at 7:36 AM, Akira Hatanaka <span dir="ltr"><<a href="mailto:ahatanak@gmail.com" target="_blank">ahatanak@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr">Hi Andrea,<div><br></div><div>I agree with your analysis.</div><div><br></div><div>I think I can add an optimization in x86 custom DAGCombine. If I transform (add buildvector(a,b), buildvector(c,0)) into (buildvector(add(a,c),b)), code-gen emits ADDSDrm instead of ADDPDrm. If this optimization is profitable for other targets too, I can probably move it to targert-independent DAGCombine or implement it in InstCombine.</div><div><br></div><div>We still need the check in X86InstrInfo to make sure we aren't incorrectly folding MOVSDrm into ADDPDrr. </div></div><div class=""><div class="h5"><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Sep 11, 2014 at 2:08 PM, Andrea Di Biagio <span dir="ltr"><<a href="mailto:andrea.dibiagio@gmail.com" target="_blank">andrea.dibiagio@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">Hi Akira,<br>

<br>

I agree that it is wrong to fold the sequence (MOVSDrm, ADDPDrr) into<br>

ADDPDrm because MOVSDrm would zero the upper 64-bits in the<br>

destination register.<br>

However, I think it would be valid to fold the sequence (MOVSDrm,<br>

ADDPDrr) into a single ADDSDrm. Both MOVSDrm and ADDSDrm are subject<br>

to the same alignment restrictions for the memory operand; therefore,<br>

in your example, it should be safe to replace MOVSDrm with ADDSDrm.<br>

<br>

  Example:<br>

     movsd (%rsp), %xmm0<br>

     addpd  %xmm0, %xmm1<br>

 --><br>

     addsd  (%rsp), %xmm1<br>

<br>

The same can be done for all SSE/AVX packed fp arithmetic instructions<br>

(example: movss/d + subps/d --> subss/d; etc.).<br>

<br>

So, to me the problem in the example is that we are using the wrong<br>

opcode for the folded instruction. By disabling the folding of a<br>

zero-extending load we would fix the problem with the miscompile,<br>

however we would also miss an opportunity to further simplify the<br>

code.<br>

If you agree with this analysis, do you think it would be possible to<br>

modify the current logic to emit the correct opcode? (example: ADDSDrm<br>

instead of ADDPDrm).<br>

<br>

I hope this makes sense (I am not very familiar with method<br>

'foldMemoryOperandImpl'..).<br>

<br>

Thanks,<br>

Andrea<br>

<div><div><br>

On Thu, Sep 11, 2014 at 6:54 PM, Akira Hatanaka <<a href="mailto:ahatanak@gmail.com" target="_blank">ahatanak@gmail.com</a>> wrote:<br>

> I found a bug in X86InstrInfo::foldMemoryOperandImpl which results in<br>

> incorrectly folding a zero-extending 64-bit load from the stack into an<br>

> addpd, which is a SIMD add of two packed double precision floating point<br>

> values.<br>

><br>

> The machine IR looks like this before peephole optimization:<br>

><br>

> %vreg21<def> = MOVSDrm <fi#0>, 1, %noreg, 0, %noreg;<br>

> mem:LD8[%7](align=16)(tbaa=<badref>) VR128:%vreg21<br>

> %vreg23<def,tied1> = ADDPDrr %vreg20<tied0>, %vreg21;<br>

> VR128:%vreg23,%vreg20,%vreg21<br>

><br>

> After the optimization, MOVSDrm is folded into ADDPDrm:<br>

><br>

> %vreg23<def,tied1> = ADDPDrm %vreg20<tied0>, <fi#0>, 1, %noreg, 0, %noreg;<br>

> mem:LD8[%7](align=16)(tbaa=<badref>) VR128:%vreg23,%vreg20<br>

><br>

> ADDPDrm loads a 128-bit value from the memory and adds it to %vreg20.<br>

><br>

> X86InstrInfo::foldMemoryOperandImpl already has the logic that prevents this<br>

> from happening (near line 4544), however the check isn't conducted for loads<br>

> from stack objects. The attached patch factors out this logic into a new<br>

> function and uses it for checking loads from stack slots are not<br>

> zero-extending loads.<br>

><br>

> This fixes rdar://problem/18236850.<br>

><br>

</div></div>> _______________________________________________<br>

> llvm-commits mailing list<br>

> <a href="mailto:llvm-commits@cs.uiuc.edu" target="_blank">llvm-commits@cs.uiuc.edu</a><br>

> <a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits</a><br>

><br>

</blockquote></div><br></div>

</div></div></blockquote></div><br></div></div></div>