[PATCH] Fix bug in x86's peephole optimization

Thu Sep 11 10:54:23 PDT 2014

I found a bug in X86InstrInfo::foldMemoryOperandImpl which results in
incorrectly folding a zero-extending 64-bit load from the stack into an
addpd, which is a SIMD add of two packed double precision floating point
values.

The machine IR looks like this before peephole optimization:

%vreg21<def> = MOVSDrm <fi#0>, 1, %noreg, 0, %noreg;
    mem:LD8[%7](align=16)(tbaa=<badref>) VR128:%vreg21
%vreg23<def,tied1> = ADDPDrr %vreg20<tied0>, %vreg21;
    VR128:%vreg23,%vreg20,%vreg21

After the optimization, MOVSDrm is folded into ADDPDrm:

%vreg23<def,tied1> = ADDPDrm %vreg20<tied0>, <fi#0>, 1, %noreg, 0, %noreg;
    mem:LD8[%7](align=16)(tbaa=<badref>) VR128:%vreg23,%vreg20

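ADDPDrm loads a full 128-bit value from memory and adds it to %vreg20,
whereas the original MOVSDrm loaded only 64 bits and zeroed the upper 64
bits of %vreg21. After the fold, the upper lane of the result depends on
whatever follows the 8-byte value in the stack slot.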
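
To make the difference concrete, here is a small portable C++ model of the
two instruction sequences. It is only a sketch: the names Vec2d,
loadLowZeroUpper, loadFull, unfolded and folded are made up for this
illustration and are not LLVM code, and the 16-byte buffer stands in for
the stack slot plus whatever happens to sit next to the 8-byte value.

```cpp
#include <array>
#include <cassert>
#include <cstring>

// A 128-bit XMM register modeled as two double lanes (index 0 = low half).
using Vec2d = std::array<double, 2>;

// Models MOVSDrm: read 8 bytes from memory and zero the upper lane.
Vec2d loadLowZeroUpper(const unsigned char *mem) {
  assert(mem != nullptr);
  Vec2d v{0.0, 0.0};
  std::memcpy(&v[0], mem, sizeof(double));
  return v;
}

// Models the memory operand of ADDPDrm: read all 16 bytes.
Vec2d loadFull(const unsigned char *mem) {
  assert(mem != nullptr);
  Vec2d v{0.0, 0.0};
  std::memcpy(v.data(), mem, 2 * sizeof(double));
  return v;
}

// Models the packed add performed by ADDPD (lane-wise addition).
Vec2d addPacked(const Vec2d &a, const Vec2d &b) {
  return Vec2d{a[0] + b[0], a[1] + b[1]};
}

// Before the peephole optimization: MOVSDrm followed by ADDPDrr.
Vec2d unfolded(const Vec2d &acc, const unsigned char *slot) {
  return addPacked(acc, loadLowZeroUpper(slot));
}

// After the (incorrect) fold: a single ADDPDrm reading 16 bytes.
Vec2d folded(const Vec2d &acc, const unsigned char *slot) {
  return addPacked(acc, loadFull(slot));
}
```

If the 8 bytes after the loaded value are not zero, unfolded and folded
agree in the low lane but differ in the high lane, which is the
miscompile described above.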

X86InstrInfo::foldMemoryOperandImpl already has logic that prevents this
from happening (near line 4544), but the check isn't performed for loads
from stack objects. The attached patch factors this logic out into a new
function and uses it to check that loads from stack slots are not
zero-extending loads.
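
The shape of the fix can be sketched roughly as follows. This is not the
patch itself: LoadOpcode, loadedBytes, isZeroExtendingLoad and the two
canFold* functions are hypothetical names chosen for illustration, and the
real code works on MachineInstr operands rather than a toy enum. The point
is only that one shared predicate guards both folding paths.

```cpp
#include <cassert>

// Hypothetical opcode set; the names mirror x86 load mnemonics but this is
// not LLVM's opcode enum.
enum class LoadOpcode { MOVSSrm, MOVSDrm, MOVAPDrm };

// Number of bytes each load actually reads from memory.
unsigned loadedBytes(LoadOpcode op) {
  switch (op) {
  case LoadOpcode::MOVSSrm:
    return 4;
  case LoadOpcode::MOVSDrm:
    return 8;
  case LoadOpcode::MOVAPDrm:
    return 16;
  }
  return 0;
}

// Shared predicate: true when the load reads fewer bytes than the register
// holds, i.e. the remaining bits are zero-filled rather than loaded.
bool isZeroExtendingLoad(LoadOpcode op, unsigned regBytes) {
  assert(regBytes > 0);
  return loadedBytes(op) < regBytes;
}

// Path that already had the check: folding a load from a general address.
bool canFoldLoadFromAddress(LoadOpcode op, unsigned regBytes) {
  return !isZeroExtendingLoad(op, regBytes);
}

// Path that was missing the check: folding a load from a stack slot.
bool canFoldLoadFromStackSlot(LoadOpcode op, unsigned regBytes) {
  return !isZeroExtendingLoad(op, regBytes);
}
```

With a 16-byte register, the MOVSDrm in the example above is rejected on
both paths, while a full-width load such as MOVAPDrm is still foldable.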

This fixes rdar://problem/18236850.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: x86fold-movsd1.patch
Type: application/octet-stream
Size: 4232 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140911/da238ebf/attachment.obj>