[PATCH] D51553: [DAGCombiner][x86] add transform/hook to load a scalar directly for use in a vector binop

Wed Sep 12 15:27:12 PDT 2018

spatel marked 2 inline comments as done.
spatel added inline comments.

================
Comment at: include/llvm/CodeGen/TargetLowering.h:535

+  virtual bool shouldLoadScalarIntoVectorOp(SDValue InsElt, SDValue Op) const {
+    return false;
----------------
lebedev.ri wrote:
> You probably want to add some explanatory comment here.
> 
Oops - yes, I forgot that.

================
Comment at: lib/CodeGen/SelectionDAG/DAGCombiner.cpp:15106
+  //       other constant elements as needed.
+  return Op0IsLoad ? DAG.getNode(BO.getOpcode(), DL, VecVT, NewInsert, C)
+                   : DAG.getNode(BO.getOpcode(), DL, VecVT, C, NewInsert);
----------------
lebedev.ri wrote:
> Do we need to check that `BO.getOpcode()` is actually valid for `VecVT`?
> Or are we ok with scalarization?
Scalarization would almost certainly defeat the point of this transform, but I think it's better to have all of those checks in one place - in the target hook.

================
Comment at: test/CodeGen/X86/load-scalar-as-vector.ll:23
+; AVX2-NEXT:    vmovd {{.*#+}} xmm0 = mem[0],zero,zero,zero
+; AVX2-NEXT:    vpbroadcastd {{.*#+}} xmm1 = [42,42,42,42]
+; AVX2-NEXT:    vpaddd %xmm1, %xmm0, %xmm0
----------------
lebedev.ri wrote:
> Why is there a `vpbroadcastd` here, as compared to `AVX1` version?
For AVX2 targets, we assume that a load+broadcast costs no more than a full vector load, but it saves data space, so that's the default. I suspect that we need to revisit that per-CPU and see if that's actually true.

https://reviews.llvm.org/D51553