[PATCH] D135700: [DAGCombine] Simplify (truncate (build_pair x, y)) -> (truncate x) or x

Tue Oct 11 10:57:26 PDT 2022

kparzysz created this revision.
kparzysz added reviewers: RKSimon, arsenm, pengfei, craig.topper.
Herald added subscribers: kosarev, StephenFan, ecnelises, kerbowa, hiraditya, kristof.beyls, jvesely, qcolombet.
Herald added a project: All.
kparzysz requested review of this revision.
Herald added a subscriber: wdng.
Herald added a project: LLVM.

In the attached Hexagon test, type legalization scalarizes the vector type `<32 x i64>` into a series of extract_vector_elt and build_pair nodes. These nodes are not optimized any further, so instead of a single HVX instruction we end up with a long sequence of scalar code.

Without this patch, before type legalization we have:

  Optimized lowered selection DAG: %bb.0 'f0:b0'
  SelectionDAG has 15 nodes:
    t0: ch = EntryToken
                t2: v32i32,ch = CopyFromReg t0, Register:v32i32 %0
                t19: v32i32 = splat_vector Constant:i32<1>
              t18: v32i32 = shl t2, t19
            t7: v64i32 = concat_vectors t18, undef:v32i32
          t9: v64i32 = vector_shuffle<0,u,1,u,2,u,3,u,4,u,5,u,6,u,7,u,8,u,9,u,10,u,11,u,12,u,13,u,14,u,15,u,16,u,17,u,18,u,19,u,20,u,21,u,22,u,23,u,24,u,25,u,26,u,27,u,28,u,29,u,30,u,31,u> t7, undef:v64i32
        t10: v32i64 = bitcast t9
      t11: v32i32 = truncate t10
    t13: ch,glue = CopyToReg t0, Register:v32i32 $v0, t11
    t14: ch = HexagonISD::RET_FLAG t13, Register:v32i32 $v0, t13:1

After type legalization we have:

  Type-legalized selection DAG: %bb.0 'f0:b0'
  SelectionDAG has 205 nodes:
    t0: ch = EntryToken
          t2: v32i32,ch = CopyFromReg t0, Register:v32i32 %0
          t19: v32i32 = splat_vector Constant:i32<1>
        t18: v32i32 = shl t2, t19
      t7: v64i32 = concat_vectors t18, undef:v32i32
    t9: v64i32 = vector_shuffle<0,u,1,u,2,u,3,u,4,u,5,u,6,u,7,u,8,u,9,u,10,u,11,u,12,u,13,u,14,u,15,u,16,u,17,u,18,u,19,u,20,u,21,u,22,u,23,u,24,u,25,u,26,u,27,u,28,u,29,u,30,u,31,u> t7, undef:v64i32
            t31: i32 = extract_vector_elt t9, Constant:i32<0>
            t32: i32 = extract_vector_elt t9, Constant:i32<1>
          t157: i64 = build_pair t31, t32
        t227: i32 = truncate t157
            t34: i32 = extract_vector_elt t9, Constant:i32<2>
            t36: i32 = extract_vector_elt t9, Constant:i32<3>
          t158: i64 = build_pair t34, t36
        t229: i32 = truncate t158
            t38: i32 = extract_vector_elt t9, Constant:i32<4>
            t40: i32 = extract_vector_elt t9, Constant:i32<5>
          t159: i64 = build_pair t38, t40
        t231: i32 = truncate t159
            t42: i32 = extract_vector_elt t9, Constant:i32<6>
            t44: i32 = extract_vector_elt t9, Constant:i32<7>
          t160: i64 = build_pair t42, t44
        t233: i32 = truncate t160
  [...]
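
For illustration, here is a minimal standalone sketch of why the fold in the title is sound. It is not the DAGCombiner code from the patch. It models build_pair and truncate on plain integers, relying on the fact that operand 0 of build_pair is the least significant half. The helper names (buildPair, truncToI32, truncToI16, foldToI32, foldToI16, foldHoldsOnSamples) are hypothetical and exist only in this example.

```cpp
#include <cassert>
#include <cstdint>

// Model of build_pair on two i32 halves: operand 0 is the low half,
// operand 1 is the high half of the i64 result.
static uint64_t buildPair(uint32_t Lo, uint32_t Hi) {
  return (static_cast<uint64_t>(Hi) << 32) | Lo;
}

// Model of truncate from i64 to a narrower integer type.
static uint32_t truncToI32(uint64_t V) { return static_cast<uint32_t>(V); }
static uint16_t truncToI16(uint64_t V) { return static_cast<uint16_t>(V); }

// Hypothetical folded forms. When the result type equals the type of x,
// the fold yields x itself; when it is narrower, it yields (truncate x).
// The high operand is never read.
static uint32_t foldToI32(uint32_t Lo, uint32_t /*Hi*/) { return Lo; }
static uint16_t foldToI16(uint32_t Lo, uint32_t /*Hi*/) {
  return static_cast<uint16_t>(Lo);
}

// Compare the unfolded and folded forms on a few sample values.
static bool foldHoldsOnSamples() {
  const uint32_t Samples[] = {0u, 1u, 0x8000u, 0xDEADBEEFu, 0xFFFFFFFFu};
  for (uint32_t Lo : Samples) {
    for (uint32_t Hi : Samples) {
      const uint64_t Pair = buildPair(Lo, Hi);
      if (truncToI32(Pair) != foldToI32(Lo, Hi))
        return false;
      if (truncToI16(Pair) != foldToI16(Lo, Hi))
        return false;
    }
  }
  return true;
}
```

In terms of the DAG above, this corresponds to replacing t227 with t31 directly, which leaves t32 and the build_pair t157 without that use.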

This should be a harmless change, but it causes one regression on x86, in llvm/test/CodeGen/X86/test-shrink.ll.

The DAG before instruction selection:

  SelectionDAG has 14 nodes:
    t0: ch = EntryToken
              t41: i32,ch = load<(load (s16) from %fixed-stack.1, align 4), zext from i16> t0, FrameIndex:i32<-1>, undef:i32
            t40: i32 = and t41, Constant:i32<32768>
          t38: i16 = truncate t40
        t32: i32 = X86ISD::CMP t38, Constant:i16<0>
      t34: ch = X86ISD::BRCOND t0, BasicBlock:ch<no 0x82ec648>, TargetConstant:i8<8>, t32
    t18: ch = br t34, BasicBlock:ch<yes 0x82ec550>

The `i32 = load (s16)` does not match the pattern that generates `AND32rm`, so we generate two instructions instead of one.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D135700

Files:
  llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
  llvm/test/CodeGen/AMDGPU/llvm.mulo.ll
  llvm/test/CodeGen/AMDGPU/mad_64_32.ll
  llvm/test/CodeGen/AMDGPU/select-undef.ll
  llvm/test/CodeGen/AMDGPU/shift-i128.ll
  llvm/test/CodeGen/AMDGPU/store-weird-sizes.ll
  llvm/test/CodeGen/AMDGPU/wwm-reserved-spill.ll
  llvm/test/CodeGen/Hexagon/isel-simplify-trunc-buildpair.ll
  llvm/test/CodeGen/X86/64-bit-shift-by-32-minus-y.ll
  llvm/test/CodeGen/X86/combine-bswap.ll
  llvm/test/CodeGen/X86/pr49451.ll
  llvm/test/CodeGen/X86/test-shrink.ll
