[PATCH] Fix bug in x86's peephole optimization

Akira Hatanaka ahatanak at gmail.com
Fri Sep 12 07:36:56 PDT 2014


Hi Andrea,

I agree with your analysis.

I think I can add an optimization in the x86 custom DAGCombine. If I
transform (add buildvector(a,b), buildvector(c,0)) into
buildvector(add(a,c), b), code-gen emits ADDSDrm instead of ADDPDrm. If
this optimization is profitable for other targets too, I can probably move
it to the target-independent DAGCombine or implement it in InstCombine.
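
For reference, a minimal sketch of what that combine could look like is
below. The function name and the exact legality checks are placeholders of
mine, not the actual patch:

  // Illustrative only, in the style of llvm/lib/Target/X86 code (needs the
  // usual headers). Match (fadd (build_vector a, b), (build_vector c, 0.0))
  // and rewrite it to (build_vector (fadd a, c), b) so that isel can pick
  // ADDSD instead of ADDPD. A real version must also worry about the
  // -0.0 + 0.0 corner case: b + 0.0 == b only holds when b is known not to
  // be -0.0, or under no-signed-zeros fast-math.
  static SDValue combineFAddOfBuildVectors(SDNode *N, SelectionDAG &DAG) {
    SDValue LHS = N->getOperand(0);
    SDValue RHS = N->getOperand(1);
    if (N->getValueType(0) != MVT::v2f64 ||
        LHS.getOpcode() != ISD::BUILD_VECTOR ||
        RHS.getOpcode() != ISD::BUILD_VECTOR)
      return SDValue();
    // The high lane of the RHS must be exactly +0.0.
    auto *Hi = dyn_cast<ConstantFPSDNode>(RHS.getOperand(1));
    if (!Hi || !Hi->isExactlyValue(+0.0))
      return SDValue();
    SDLoc DL(N);
    // The scalar add feeds lane 0; lane 1 passes b through unchanged.
    SDValue Lo = DAG.getNode(ISD::FADD, DL, MVT::f64,
                             LHS.getOperand(0), RHS.getOperand(0));
    return DAG.getNode(ISD::BUILD_VECTOR, DL, MVT::v2f64,
                       Lo, LHS.getOperand(1));
  }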

We still need the check in X86InstrInfo to make sure we aren't incorrectly
folding MOVSDrm into ADDPDrr.
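
For context, that check would have roughly the following shape (the helper
name and structure here are mine, for illustration only):

  // Illustrative only: a scalar MOVSS/MOVSD load writes fewer bytes than
  // the XMM register it defines and zeroes the remaining lanes, so folding
  // it into a full-width packed instruction would change what is read.
  static bool isZeroExtendingPartialLoad(const MachineInstr *LoadMI,
                                         const MachineFunction &MF) {
    unsigned RegSize = MF.getRegInfo()
                           .getRegClass(LoadMI->getOperand(0).getReg())
                           ->getSize();
    switch (LoadMI->getOpcode()) {
    case X86::MOVSSrm: return RegSize > 4; // loads 4 bytes, zeroes the rest
    case X86::MOVSDrm: return RegSize > 8; // loads 8 bytes, zeroes the rest
    default:           return false;
    }
  }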

On Thu, Sep 11, 2014 at 2:08 PM, Andrea Di Biagio <andrea.dibiagio at gmail.com> wrote:

> Hi Akira,
>
> I agree that it is wrong to fold the sequence (MOVSDrm, ADDPDrr) into
> ADDPDrm because MOVSDrm would zero the upper 64-bits in the
> destination register.
> However, I think it would be valid to fold the sequence (MOVSDrm,
> ADDPDrr) into a single ADDSDrm. Both MOVSDrm and ADDSDrm are subject
> to the same alignment restrictions for the memory operand; therefore,
> in your example, it should be safe to replace MOVSDrm with ADDSDrm.
>
>   Example:
>      movsd (%rsp), %xmm0
>      addpd  %xmm0, %xmm1
>  -->
>      addsd  (%rsp), %xmm1
>
> The same can be done for all SSE/AVX packed fp arithmetic instructions
> (example: movss/d + subps/d --> subss/d; etc.).
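>
> Spelled out for the single-precision case:
>
>      movss (%rsp), %xmm0
>      subps  %xmm0, %xmm1
>  -->
>      subss  (%rsp), %xmm1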
>
> So, to me, the problem in this example is that we are using the wrong
> opcode for the folded instruction. Disabling the folding of a
> zero-extending load would fix the miscompile, but it would also miss an
> opportunity to further simplify the code.
> If you agree with this analysis, do you think it would be possible to
> modify the current logic to emit the correct opcode? (example: ADDSDrm
> instead of ADDPDrm).
>
> I hope this makes sense (I am not very familiar with the method
> 'foldMemoryOperandImpl'...).
>
> Thanks,
> Andrea
>
> On Thu, Sep 11, 2014 at 6:54 PM, Akira Hatanaka <ahatanak at gmail.com> wrote:
> > I found a bug in X86InstrInfo::foldMemoryOperandImpl which results in
> > incorrectly folding a zero-extending 64-bit load from the stack into an
> > addpd, which is a SIMD add of two packed double precision floating point
> > values.
> >
> > The machine IR looks like this before peephole optimization:
> >
> > %vreg21<def> = MOVSDrm <fi#0>, 1, %noreg, 0, %noreg;
> > mem:LD8[%7](align=16)(tbaa=<badref>) VR128:%vreg21
> > %vreg23<def,tied1> = ADDPDrr %vreg20<tied0>, %vreg21;
> > VR128:%vreg23,%vreg20,%vreg21
> >
> > After the optimization, MOVSDrm is folded into ADDPDrm:
> >
> > %vreg23<def,tied1> = ADDPDrm %vreg20<tied0>, <fi#0>, 1, %noreg, 0, %noreg;
> > mem:LD8[%7](align=16)(tbaa=<badref>) VR128:%vreg23,%vreg20
> >
> > ADDPDrm loads a 128-bit value from memory and adds it to %vreg20.
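> >
> > For example, if %vreg20 holds <x, y> and the 16 bytes at fi#0 hold
> > <1.0, 2.0>, the unfolded sequence computes <x+1.0, y+0.0> because
> > MOVSDrm zeroes the upper lane of %vreg21, whereas the folded ADDPDrm
> > computes <x+1.0, y+2.0>.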
> >
> > X86InstrInfo::foldMemoryOperandImpl already has logic that prevents this
> > from happening (near line 4544); however, the check isn't performed for
> > loads from stack objects. The attached patch factors that logic out into
> > a new function and uses it to check that loads from stack slots are not
> > zero-extending loads.
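> >
> > In rough terms (the helper name here is illustrative, not the exact
> > diff), the stack-slot path now bails out before folding a
> > partial-register load:
> >
> >   if (isZeroExtendingPartialLoad(LoadMI, MF))
> >     return nullptr; // folding would widen the 8-byte load to 16 bytes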
> >
> > This fixes rdar://problem/18236850.