[PATCH] D51553: [DAGCombiner][x86] add transform/hook to load a scalar directly for use in a vector binop
Sanjay Patel via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Sep 12 15:27:12 PDT 2018
spatel marked 2 inline comments as done.
spatel added inline comments.
================
Comment at: include/llvm/CodeGen/TargetLowering.h:535
+ virtual bool shouldLoadScalarIntoVectorOp(SDValue InsElt, SDValue Op) const {
+ return false;
----------------
lebedev.ri wrote:
> You probably want to add some explanatory comment here.
>
Oops - yes, I forgot that.
================
Comment at: lib/CodeGen/SelectionDAG/DAGCombiner.cpp:15106
+ // other constant elements as needed.
+ return Op0IsLoad ? DAG.getNode(BO.getOpcode(), DL, VecVT, NewInsert, C)
+ : DAG.getNode(BO.getOpcode(), DL, VecVT, C, NewInsert);
----------------
lebedev.ri wrote:
> Do we need to check that `BO.getOpcode()` is actually valid for `VecVT`?
> Or are we ok with scalarization?
Scalarization would almost certainly defeat the point of this transform, but I think it's better to have all of those checks in one place - in the target hook.
================
Comment at: test/CodeGen/X86/load-scalar-as-vector.ll:23
+; AVX2-NEXT: vmovd {{.*#+}} xmm0 = mem[0],zero,zero,zero
+; AVX2-NEXT: vpbroadcastd {{.*#+}} xmm1 = [42,42,42,42]
+; AVX2-NEXT: vpaddd %xmm1, %xmm0, %xmm0
----------------
lebedev.ri wrote:
> Why is there a `vpbroadcastd` here, as compared to `AVX1` version?
For AVX2 targets, we assume that a load+broadcast costs no more than a full vector load, but it saves data space, so that's the default. I suspect that we need to revisit that per-CPU and see if that's actually true.
https://reviews.llvm.org/D51553
More information about the llvm-commits
mailing list