[llvm-bugs] [Bug 34883] New: [LoopDataPrefetch] - places prefetches between a load and its single user, which disrupts instruction selection.

via llvm-bugs llvm-bugs at lists.llvm.org
Mon Oct 9 07:42:22 PDT 2017


https://bugs.llvm.org/show_bug.cgi?id=34883

            Bug ID: 34883
           Summary: [LoopDataPrefetch]  - places prefetches between a load
                    and its single user, which disrupts instruction
                    selection.
           Product: libraries
           Version: trunk
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: Scalar Optimizations
          Assignee: unassignedbugs at nondot.org
          Reporter: paulsson at linux.vnet.ibm.com
                CC: llvm-bugs at lists.llvm.org

Created attachment 19250
  --> https://bugs.llvm.org/attachment.cgi?id=19250&action=edit
reduced testcase

On SystemZ, it is beneficial to use the vector load element instruction (VLEF)
whenever possible, since it loads from memory and inserts into a vector element
with a single instruction.

In this test case, there are four loads followed by four insertelement
instructions that together build a vector of four 32-bit elements. Without the
LoopDataPrefetch pass, these are selected into VLEFs (except the first one, which
becomes a VLREP, as expected). With the prefetches inserted, this does not happen.

  %15 = load i32, i32* %11, align 4, !tbaa !1
  %16 = load i32, i32* %12, align 4, !tbaa !1
  %17 = load i32, i32* %13, align 4, !tbaa !1
  %18 = load i32, i32* %14, align 4, !tbaa !1
  %19 = insertelement <4 x i32> undef, i32 %15, i32 0
  %20 = insertelement <4 x i32> %19, i32 %16, i32 1
  %21 = insertelement <4 x i32> %20, i32 %17, i32 2
  %22 = insertelement <4 x i32> %21, i32 %18, i32 3

=> LoopDataPrefetch pass

  call void @llvm.prefetch(i8* %scevgep1, i32 0, i32 3, i32 1)
  %23 = load i32, i32* %19, align 4, !tbaa !1
  call void @llvm.prefetch(i8* %scevgep23, i32 0, i32 3, i32 1)
  %24 = load i32, i32* %20, align 4, !tbaa !1
  call void @llvm.prefetch(i8* %scevgep45, i32 0, i32 3, i32 1)
  %25 = load i32, i32* %21, align 4, !tbaa !1
  call void @llvm.prefetch(i8* %scevgep67, i32 0, i32 3, i32 1)
  %26 = load i32, i32* %22, align 4, !tbaa !1
  %27 = insertelement <4 x i32> undef, i32 %23, i32 0
  %28 = insertelement <4 x i32> %27, i32 %24, i32 1
  %29 = insertelement <4 x i32> %28, i32 %25, i32 2
  %30 = insertelement <4 x i32> %29, i32 %26, i32 3

It seems that a prefetch is placed immediately before each load. That placement
is a problem in this case, because the loads form a sequence: each prefetch ends
up between the earlier loads and their single users.
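
The placement described above can be illustrated with a small toy model. This is
only a sketch and not the LoopDataPrefetch implementation: the names `l0`..`l3`,
`v0`..`v3`, `place_prefetches` and `prefetches_between_load_and_user` are
hypothetical, and the block is a simplified stand-in for the IR shown earlier.

```python
# Toy model (an assumption for illustration, NOT LLVM code): a basic block is a
# list of (name, kind, operands) tuples in program order.

def place_prefetches(block):
    """Insert one prefetch immediately before every load."""
    out = []
    for name, kind, operands in block:
        if kind == "load":
            out.append(("pf." + name, "prefetch", []))
        out.append((name, kind, operands))
    return out


def prefetches_between_load_and_user(block):
    """For each load, count the prefetches located between it and its first user."""
    result = {}
    for i, (name, kind, _) in enumerate(block):
        if kind != "load":
            continue
        for j in range(i + 1, len(block)):
            if name in block[j][2]:
                result[name] = sum(
                    1 for k in range(i + 1, j) if block[k][1] == "prefetch"
                )
                break
    return result


# Four loads followed by four insertelements, as in the reduced test case.
block = [
    ("l0", "load", []),
    ("l1", "load", []),
    ("l2", "load", []),
    ("l3", "load", []),
    ("v0", "insertelement", ["l0"]),
    ("v1", "insertelement", ["v0", "l1"]),
    ("v2", "insertelement", ["v1", "l2"]),
    ("v3", "insertelement", ["v2", "l3"]),
]

before = prefetches_between_load_and_user(block)
after = prefetches_between_load_and_user(place_prefetches(block))
print(before)
print(after)
```

In this model, only the last load keeps a prefetch-free path to its user. Every
earlier load has at least one prefetch between itself and its insertelement.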

The DAG then looks like:

Optimized legalized selection DAG: BB#1 'BZ2_blockSort:vector.body210'
SelectionDAG has 79 nodes:
  t0: ch = EntryToken
  t2: i64,ch = CopyFromReg t0, Register:i64 %vreg1
      t10: i64 = add t2, Constant:i64<163840>
    t116: ch = SystemZISD::PREFETCH<LD1[%scevgep13]> t0, Constant:i32<1>, t10
  t23: i32,ch = load<LD4[%lsr.iv](tbaa=<0x52db148>)> t116, t2, undef:i64
      t8: i64 = add t2, Constant:i64<164864>
    t115: ch = SystemZISD::PREFETCH<LD1[%scevgep12]> t23:1, Constant:i32<1>, t8
    t12: i64 = add t2, Constant:i64<1024>
  t25: i32,ch = load<LD4[%scevgep19](tbaa=<0x52db148>)> t115, t12, undef:i64
      t6: i64 = add t2, Constant:i64<165888>
    t114: ch = SystemZISD::PREFETCH<LD1[%scevgep11]> t25:1, Constant:i32<1>, t6
    t14: i64 = add t2, Constant:i64<2048>
  t27: i32,ch = load<LD4[%scevgep17](tbaa=<0x52db148>)> t114, t14, undef:i64
    t45: i64 = add t2, Constant:i64<4>
  t96: i32,ch = load<LD4[%scevgep20](tbaa=<0x52db148>)> t113, t45, undef:i64
    t47: i64 = add t2, Constant:i64<1028>
  t93: i32,ch = load<LD4[%scevgep18](tbaa=<0x52db148>)> t113, t47, undef:i64
    t49: i64 = add t2, Constant:i64<2052>
  t90: i32,ch = load<LD4[%scevgep16](tbaa=<0x52db148>)> t113, t49, undef:i64
    t51: i64 = add t2, Constant:i64<3076>
  t87: i32,ch = load<LD4[%scevgep14](tbaa=<0x52db148>)> t113, t51, undef:i64
    t16: i64 = add t2, Constant:i64<3072>
  t29: i32,ch = load<LD4[%scevgep15](tbaa=<0x52db148>)> t113, t16, undef:i64
    t4: i64 = add t2, Constant:i64<166912>
  t113: ch = SystemZISD::PREFETCH<LD1[%scevgep10]> t27:1, Constant:i32<1>, t4
  t122: v4i32 = SystemZISD::ROTATE_MASK Constant:i32<11>, Constant:i32<9>
          t40: i64,ch = CopyFromReg t0, Register:i64 %vreg2
        t66: i64 = add t40, Constant:i64<4>
      t68: ch = CopyToReg t0, Register:i64 %vreg3, t66
        t70: i64 = add t2, Constant:i64<4096>
      t72: ch = CopyToReg t0, Register:i64 %vreg4, t70
          t74: i64,ch = CopyFromReg t0, Register:i64 %vreg0
        t76: i64 = add t74, Constant:i64<-4>
      t78: ch = CopyToReg t0, Register:i64 %vreg5, t76
                    t104: v4i32 = SystemZISD::REPLICATE t23
                  t105: v4i32 = insert_vector_elt t104, t25, Constant:i32<1>
                t107: v4i32 = insert_vector_elt t105, t27, Constant:i32<2>
              t108: v4i32 = insert_vector_elt t107, t29, Constant:i32<3>
            t38: v4i32 = and t108, t122
          t43: ch = store<ST16[undef](align=4)(tbaa=<0x52db148>)> t29:1, t38, undef:i64, undef:i64
        t98: ch = TokenFactor t87:1, t90:1, t93:1, t43, t96:1
                  t109: v4i32 = SystemZISD::REPLICATE t96
                t110: v4i32 = insert_vector_elt t109, t93, Constant:i32<1>
              t111: v4i32 = insert_vector_elt t110, t90, Constant:i32<2>
            t112: v4i32 = insert_vector_elt t111, t87, Constant:i32<3>
          t60: v4i32 = and t112, t122
            t118: v16i8 = SystemZISD::BYTE_MASK Constant:i32<65535>
          t119: v4i32 = bitcast t118
        t63: v4i32 = add t60, t119
      t65: ch = store<ST16[undef](align=4)(tbaa=<0x52db148>)> t98, t63, undef:i64, undef:i64
    t80: ch = TokenFactor t68, t72, t78, t65
  t81: ch = br t80, BasicBlock:ch<vector.body210 0x53366e8>

It seems that the pattern matcher for VLEF fails because each prefetch node is
chained between the loads for the vector elements. Without the prefetch nodes,
the loads are not chained to each other and the pattern matcher succeeds.
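
The chaining can be checked by hand against the dump. The sketch below is a toy
walk over the chain operands of the first vector's loads and prefetches,
transcribed from the DAG above. It is not SelectionDAG code; `chain_in`,
`chain_ancestors` and `loads_chained_behind` are hypothetical names, and the
"without prefetch" map is an assumption that simply attaches every load to the
entry token.

```python
# Chain input of each node, transcribed from the DAG dump
# (t116/t115/t114/t113 are PREFETCH nodes, t23/t25/t27/t29 are the loads).
chain_in = {
    "t116": "t0",
    "t23": "t116",
    "t115": "t23",
    "t25": "t115",
    "t114": "t25",
    "t27": "t114",
    "t113": "t27",
    "t29": "t113",
}
loads = ["t23", "t25", "t27", "t29"]


def chain_ancestors(node, chain_map):
    """Follow chain inputs from node back to the entry token."""
    seen = []
    while node in chain_map:
        node = chain_map[node]
        seen.append(node)
    return seen


def loads_chained_behind(load, chain_map):
    """Return the other element loads that this load is ordered after via the chain."""
    return [n for n in chain_ancestors(load, chain_map) if n in loads]


with_prefetch = {ld: loads_chained_behind(ld, chain_in) for ld in loads}

# Assumed shape without prefetches: every load hangs directly off the entry token.
no_prefetch_chain = {ld: "t0" for ld in loads}
without_prefetch = {ld: loads_chained_behind(ld, no_prefetch_chain) for ld in loads}

print(with_prefetch)
print(without_prefetch)
```

With the prefetches present, each load is ordered after all earlier element
loads through the intervening prefetch nodes. In the assumed prefetch-free
shape, no load depends on another through the chain.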

To reproduce with the attached test case:

llc -mtriple=s390x-linux-gnu -mcpu=z13 tc_pfd.ll
