[PATCH] D38547: [X86] Fix chains update when lowering BUILD_VECTOR to a vector load

Wed Oct 4 09:44:35 PDT 2017

apilipenko created this revision.

The code which lowers BUILD_VECTOR of consecutive loads into a single vector load doesn't update chains properly. As a result the vector load can be reordered with the store to the same location.

Consider the test case.

  define <4 x i32> @merge_4i32_i32_23u5_inc3(i32* %ptr) nounwind uwtable noinline ssp {
    %ptr0 = getelementptr inbounds i32, i32* %ptr, i64 2
    %ptr1 = getelementptr inbounds i32, i32* %ptr, i64 3
    %ptr3 = getelementptr inbounds i32, i32* %ptr, i64 5
    %val0 = load i32, i32* %ptr0
    %val1 = load i32, i32* %ptr1
    %inc = add i32 %val1, 1
    store i32 %inc, i32* %ptr1
    %val3 = load i32, i32* %ptr3
    %res0 = insertelement <4 x i32> undef, i32 %val0, i32 0
    %res1 = insertelement <4 x i32> %res0, i32 %val1, i32 1
    %res3 = insertelement <4 x i32> %res1, i32 %val3, i32 3
    ret <4 x i32> %res3
  }

llc -mtriple=x86_64-unknown-unknown -mattr=+sse2 -debug-only=isel

Type-legalized selection DAG: BB#0 'merge_4i32_i32_23u5_inc3:'

  SelectionDAG has 22 nodes:
    t0: ch = EntryToken
    t2: i64,ch = CopyFromReg t0, Register:i64 %vreg0
    t6: i64 = add t2, Constant:i64<12>
    t11: i32,ch = load<LD4[%ptr1]> t0, t6, undef:i64
        t13: i32 = add t11, Constant:i32<1>
      t14: ch = store<ST4[%ptr1]> t11:1, t13, t6, undef:i64
          t4: i64 = add t2, Constant:i64<8>
        t35: i32,ch = load<LD4[%ptr0]> t0, t4, undef:i64
          t8: i64 = add t2, Constant:i64<20>
        t33: i32,ch = load<LD4[%ptr3]> t0, t8, undef:i64
      t32: v4i32 = BUILD_VECTOR t35, t11, undef:i32, t33
    t27: ch,glue = CopyToReg t14, Register:v4i32 %XMM0, t32
    t28: ch = X86ISD::RET_FLAG t27, TargetConstant:i32<0>, Register:v4i32 %XMM0, t27:1

Legalized selection DAG: BB#0 'merge_4i32_i32_23u5_inc3:'

  SelectionDAG has 18 nodes:
    t0: ch = EntryToken
    t2: i64,ch = CopyFromReg t0, Register:i64 %vreg0
    t6: i64 = add t2, Constant:i64<12>
    t11: i32,ch = load<LD4[%ptr1]> t0, t6, undef:i64
        t13: i32 = add t11, Constant:i32<1>
      t14: ch = store<ST4[%ptr1]> t11:1, t13, t6, undef:i64
          t4: i64 = add t2, Constant:i64<8>
        t38: v2i64,ch = load<LD16[%ptr0](align=4)> t0, t4, undef:i64
      t39: v4i32 = bitcast t38
    t27: ch,glue = CopyToReg t14, Register:v4i32 %XMM0, t39
    t28: ch = X86ISD::RET_FLAG t27, TargetConstant:i32<0>, Register:v4i32 %XMM0, t27:1

In the last DAG there are two chains:
t0 - t11 (scalar load) - t14 (scalar store)
t0 - t38 (vector load)

There is no ordering constraint between t38 (vector load) and t14 (scalar store), so they can be reordered. That's what happens in the final assembly:

  # BB#0:
    incl  12(%rdi)
    movups  8(%rdi), %xmm0
    retq

Here the scalar store happens before the vector load. The vector load reloads the incremented value for 12(%rdi) location, while in the original program it uses the pre-increment value.

The current code in EltsFromConsecutiveLoads tries to update the chains but fails becuase it only updates the chain following the first load. Had the incremented location been the first load we wouldn't miscompile. The fix is to update the chains following all the loads comprising the vector.

https://reviews.llvm.org/D38547

Files:
  lib/Target/X86/X86ISelLowering.cpp
  test/CodeGen/X86/merge-consecutive-loads-128.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D38547.117684.patch
Type: text/x-patch
Size: 10204 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20171004/37328541/attachment.bin>