[PATCH] Fix bug in x86's peephole optimization

Thu Sep 11 14:08:51 PDT 2014

Hi Akira,

I agree that it is wrong to fold the sequence (MOVSDrm, ADDPDrr) into
ADDPDrm because MOVSDrm would zero the upper 64-bits in the
destination register.
However, I think it would be valid to fold the sequence (MOVSDrm,
ADDPDrr) into a single ADDSDrm. Both MOVSDrm and ADDSDrm are subject
to the same alignment restrictions for the memory operand; therefore,
in your example, it should be safe to replace MOVSDrm with ADDSDrm.

  Example:
     movsd (%rsp), %xmm0
     addpd  %xmm0, %xmm1
 -->
     addsd  (%rsp), %xmm1

The same can be done for all SSE/AVX packed fp arithmetic instructions
(example: movss/d + subps/d --> subss/d; etc.).

So, to me the problem in the example is that we are using the wrong
opcode for the folded instruction. By disabling the folding of a
zero-extending load we would fix the problem with the miscompile,
however we would also miss an opportunity to further simplify the
code.
If you agree with this analysis, do you think it would be possible to
modify the current logic to emit the correct opcode? (example: ADDSDrm
instead of ADDPDrm).

I hope this makes sense (I am not very familiar with method
'foldMemoryOperandImpl'..).

Thanks,
Andrea

On Thu, Sep 11, 2014 at 6:54 PM, Akira Hatanaka <ahatanak at gmail.com> wrote:
> I found a bug in X86InstrInfo::foldMemoryOperandImpl which results in
> incorrectly folding a zero-extending 64-bit load from the stack into an
> addpd, which is a SIMD add of two packed double precision floating point
> values.
>
> The machine IR looks like this before peephole optimization:
>
> %vreg21<def> = MOVSDrm <fi#0>, 1, %noreg, 0, %noreg;
> mem:LD8[%7](align=16)(tbaa=<badref>) VR128:%vreg21
> %vreg23<def,tied1> = ADDPDrr %vreg20<tied0>, %vreg21;
> VR128:%vreg23,%vreg20,%vreg21
>
> After the optimization, MOVSDrm is folded into ADDPDrm:
>
> %vreg23<def,tied1> = ADDPDrm %vreg20<tied0>, <fi#0>, 1, %noreg, 0, %noreg;
> mem:LD8[%7](align=16)(tbaa=<badref>) VR128:%vreg23,%vreg20
>
> ADDPDrm loads a 128-bit value from the memory and adds it to %vreg20.
>
> X86InstrInfo::foldMemoryOperandImpl already has the logic that prevents this
> from happening (near line 4544), however the check isn't conducted for loads
> from stack objects. The attached patch factors out this logic into a new
> function and uses it for checking loads from stack slots are not
> zero-extending loads.
>
> This fixes rdar://problem/18236850.
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>