[PATCH] Fix bug in x86's peephole optimization

Akira Hatanaka ahatanak at gmail.com
Fri Sep 12 13:41:13 PDT 2014


I thought about this more, and I think it is incorrect in general
(depending on the rounding mode) to replace the following two instructions
with addsd.

movsd (%rsp), %xmm0
addpd  %xmm0, %xmm1

If the higher part of %xmm1 is -0.0, the result of the addpd will be
(0.0, %xmm1_low
+ %xmm0_low), since -0.0 + 0.0 = 0.0. However, if we use addsd, the
contents of %xmm1 will be (-0.0, %xmm1_low + %xmm0_low).

We can check the rounding mode to see if this optimization is legal, but
I'm not sure if it's worthwhile.

On Fri, Sep 12, 2014 at 7:36 AM, Akira Hatanaka <ahatanak at gmail.com> wrote:

> Hi Andrea,
>
> I agree with your analysis.
>
> I think I can add an optimization in x86 custom DAGCombine. If I transform
> (add buildvector(a,b), buildvector(c,0)) into (buildvector(add(a,c),b)),
> code-gen emits ADDSDrm instead of ADDPDrm. If this optimization is
> profitable for other targets too, I can probably move it to
> targert-independent DAGCombine or implement it in InstCombine.
>
> We still need the check in X86InstrInfo to make sure we aren't incorrectly
> folding MOVSDrm into ADDPDrr.
>
> On Thu, Sep 11, 2014 at 2:08 PM, Andrea Di Biagio <
> andrea.dibiagio at gmail.com> wrote:
>
>> Hi Akira,
>>
>> I agree that it is wrong to fold the sequence (MOVSDrm, ADDPDrr) into
>> ADDPDrm because MOVSDrm would zero the upper 64-bits in the
>> destination register.
>> However, I think it would be valid to fold the sequence (MOVSDrm,
>> ADDPDrr) into a single ADDSDrm. Both MOVSDrm and ADDSDrm are subject
>> to the same alignment restrictions for the memory operand; therefore,
>> in your example, it should be safe to replace MOVSDrm with ADDSDrm.
>>
>>   Example:
>>      movsd (%rsp), %xmm0
>>      addpd  %xmm0, %xmm1
>>  -->
>>      addsd  (%rsp), %xmm1
>>
>> The same can be done for all SSE/AVX packed fp arithmetic instructions
>> (example: movss/d + subps/d --> subss/d; etc.).
>>
>> So, to me the problem in the example is that we are using the wrong
>> opcode for the folded instruction. By disabling the folding of a
>> zero-extending load we would fix the problem with the miscompile,
>> however we would also miss an opportunity to further simplify the
>> code.
>> If you agree with this analysis, do you think it would be possible to
>> modify the current logic to emit the correct opcode? (example: ADDSDrm
>> instead of ADDPDrm).
>>
>> I hope this makes sense (I am not very familiar with method
>> 'foldMemoryOperandImpl'..).
>>
>> Thanks,
>> Andrea
>>
>> On Thu, Sep 11, 2014 at 6:54 PM, Akira Hatanaka <ahatanak at gmail.com>
>> wrote:
>> > I found a bug in X86InstrInfo::foldMemoryOperandImpl which results in
>> > incorrectly folding a zero-extending 64-bit load from the stack into an
>> > addpd, which is a SIMD add of two packed double precision floating point
>> > values.
>> >
>> > The machine IR looks like this before peephole optimization:
>> >
>> > %vreg21<def> = MOVSDrm <fi#0>, 1, %noreg, 0, %noreg;
>> > mem:LD8[%7](align=16)(tbaa=<badref>) VR128:%vreg21
>> > %vreg23<def,tied1> = ADDPDrr %vreg20<tied0>, %vreg21;
>> > VR128:%vreg23,%vreg20,%vreg21
>> >
>> > After the optimization, MOVSDrm is folded into ADDPDrm:
>> >
>> > %vreg23<def,tied1> = ADDPDrm %vreg20<tied0>, <fi#0>, 1, %noreg, 0,
>> %noreg;
>> > mem:LD8[%7](align=16)(tbaa=<badref>) VR128:%vreg23,%vreg20
>> >
>> > ADDPDrm loads a 128-bit value from the memory and adds it to %vreg20.
>> >
>> > X86InstrInfo::foldMemoryOperandImpl already has the logic that prevents
>> this
>> > from happening (near line 4544), however the check isn't conducted for
>> loads
>> > from stack objects. The attached patch factors out this logic into a new
>> > function and uses it for checking loads from stack slots are not
>> > zero-extending loads.
>> >
>> > This fixes rdar://problem/18236850.
>> >
>> > _______________________________________________
>> > llvm-commits mailing list
>> > llvm-commits at cs.uiuc.edu
>> > http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>> >
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140912/39c284d2/attachment.html>


More information about the llvm-commits mailing list