[PATCH] D79360: [DAGCombiner] sink target-supported cast op after concat vectors

Sanjay Patel via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue May 5 05:53:01 PDT 2020


spatel marked an inline comment as done.
spatel added inline comments.


================
Comment at: llvm/test/CodeGen/X86/avx-shift.ll:177
+; CHECK-NEXT:    vinsertf128 $1, %xmm1, %ymm0, %ymm0
+; CHECK-NEXT:    vcvttps2dq %ymm0, %ymm0
 ; CHECK-NEXT:    retq
----------------
craig.topper wrote:
> spatel wrote:
> > craig.topper wrote:
> > > Do we do the best thing if the shl is used by another operation that needs to be split? Do we keep the vcvttps2dq split?
> > Does the next test (vshift08_add) cover the scenario you're thinking of? There's no difference on that one because the concat isn't directly after the cast.
> I think it does.
> 
> Let me see if this works how I think it does. The shift will be legalized by LegalizeVectorOps first because we run that stage by legalizing operands before users. So the shift gets lowered first. When the shift gets lowered, it should split and produce a concat. Then each part of the split should get legalized. Then the add gets legalized which produces another split. getNode for the extracts for that split will look through the concat produced by the shift? Leaving that concat dead. Then a new concat will be produced for the add split?
I couldn't visualize it without looking at debug output, but that looks about right to me:
  Legalizing vector op: t7: v8i32 = shl t6, t2
  -->
  ...
  Creating new node: t18: v4i32 = shl t13, t15
  Creating new node: t19: v4i32 = shl t13, t17
  Creating new node: t20: v8i32 = concat_vectors t18, t19

  Legalizing vector op: t18: v4i32 = shl t13, t15
  --> 
  ...
  Creating new node: t28: v4i32 = fp_to_sint t27
  Creating new node: t29: v4i32 = mul t13, t28

  Legalizing vector op: t8: v8i32 = add t20, t4
  -->
  ...
  Creating new node: t40: v4i32 = add t29, t38
  Creating new node: t41: v4i32 = add t36, t39
  Creating new node: t42: v8i32 = concat_vectors t40, t41

So the add already uses the 128-bit "t29" mul node directly. Only the final concat ("t42") appears in the vector-legalized DAG; "t20" is gone:

```
Vector-legalized selection DAG: %bb.0 'vshift08:'
SelectionDAG has 33 nodes:
  t0: ch = EntryToken
  t2: v8i32,ch = CopyFromReg t0, Register:v8i32 %0
  t4: v8i32,ch = CopyFromReg t0, Register:v8i32 %1
                  t15: v4i32 = extract_subvector t2, Constant:i64<0>
                t31: v4i32 = X86ISD::VSHLI t15, TargetConstant:i8<23>
              t26: v4i32 = add t31, t25
            t27: v4f32 = bitcast t26
          t28: v4i32 = fp_to_sint t27
        t29: v4i32 = mul t13, t28
        t38: v4i32 = extract_subvector t4, Constant:i64<0>
      t40: v4i32 = add t29, t38
                  t17: v4i32 = extract_subvector t2, Constant:i64<4>
                t37: v4i32 = X86ISD::VSHLI t17, TargetConstant:i8<23>
              t33: v4i32 = add t37, t25
            t34: v4f32 = bitcast t33
          t35: v4i32 = fp_to_sint t34
        t36: v4i32 = mul t13, t35
        t39: v4i32 = extract_subvector t4, Constant:i64<4>
      t41: v4i32 = add t36, t39
    t42: v8i32 = concat_vectors t40, t41
  t11: ch,glue = CopyToReg t0, Register:v8i32 $ymm0, t42
  t13: v4i32 = BUILD_VECTOR Constant:i32<1>, Constant:i32<1>, Constant:i32<1>, Constant:i32<1>
  t25: v4i32 = BUILD_VECTOR Constant:i32<1065353216>, Constant:i32<1065353216>, Constant:i32<1065353216>, Constant:i32<1065353216>
  t12: ch = X86ISD::RET_FLAG t11, TargetConstant:i32<0>, Register:v8i32 $ymm0, t11:1

```
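
For anyone following along without debug output, the "look through the concat" step discussed above can be sketched as a toy model. This is not LLVM code: `Node`, `concat_vectors`, and `extract_subvector` are hypothetical Python stand-ins, and the fold only handles the simple case where the extract lines up exactly with one concat operand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Toy stand-in for a DAG node (hypothetical, not LLVM's SDNode)."""
    name: str
    op: str
    num_elts: int
    operands: tuple = ()
    index: int = 0


def concat_vectors(name, lo, hi):
    # The result is as wide as both halves together.
    return Node(name, "concat_vectors", lo.num_elts + hi.num_elts, (lo, hi))


def extract_subvector(name, vec, index, num_elts):
    # Fold: extract_subvector(concat_vectors(a, b), i) -> a or b
    # when the extracted range matches one operand exactly.
    if vec.op == "concat_vectors":
        offset = 0
        for part in vec.operands:
            if offset == index and part.num_elts == num_elts:
                return part  # reuse the existing narrow node
            offset += part.num_elts
    # Otherwise a real extract node is needed.
    return Node(name, "extract_subvector", num_elts, (vec,), index)


# Names mirror the debug output above; the values are assumptions of this sketch.
t18 = Node("t18", "shl", 4)
t19 = Node("t19", "shl", 4)
t20 = concat_vectors("t20", t18, t19)   # produced by splitting the v8i32 shl
t4 = Node("t4", "CopyFromReg", 8)

# Splitting the add asks for both halves of each operand.
lo = extract_subvector("lo", t20, 0, 4)   # folds to t18; t20 gains no user
hi = extract_subvector("hi", t20, 4, 4)   # folds to t19
t38 = extract_subvector("t38", t4, 0, 4)  # no concat to look through
```

In this model the split add ends up using the narrow shift halves directly, so "t20" has no remaining users, which matches it being absent from the final DAG dump.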


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79360/new/

https://reviews.llvm.org/D79360




