[PATCH] D141079: [SelectionDAG] Improve constant folding in the presence of SPLAT_VECTOR

Fri Jan 6 09:27:02 PST 2023

luke added inline comments.

================
Comment at: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:14018
+    // fold it that way
+    if (N0.getOpcode() == ISD::SPLAT_VECTOR &&
+        DAG.isConstantValueOfAnyType(N0.getOperand(0))) {
----------------
reames wrote:
> luke wrote:
> > reames wrote:
> > > This looks to be assuming fixed-width splat_vectors.  The primary use of splat_vector is for scalable vectors.  
> > That makes sense; I was wondering what the difference was between a splat_vector and a splatted build_vector. 
> > In that case, is it still possible to fold here?
> To my knowledge, we're a bit inconsistent about this.  RISCV uses SPLAT_VECTOR only for scalable vectors.  Hexagon (and per your other comment, WebAssembly) use them for both fixed and scalable.  I'm also unclear on when they use SPLAT_VECTOR vs BUILD_VECTOR.  
> 
> Longer term, I do think that having one canonical representation for a splat vector makes sense, and that it'll probably be SPLAT_VECTOR.  We're just not there yet.  In particular, DAGCombine has various weaknesses for SPLAT_VECTOR that need to be worked through.  
This is a RISC-V test case I was able to throw together that shows the optimisation opportunity for scalable vectors:

```
define i32 @f(<vscale x 2 x i64> %a) {
  %v = insertelement <vscale x 2 x i64> %a, i64 0, i32 0
  %w = shufflevector <vscale x 2 x i64> %v, <vscale x 2 x i64> undef, <vscale x 2 x i32> zeroinitializer
  %x = bitcast <vscale x 2 x i64> %w to <vscale x 4 x i32>
  %y = extractelement <vscale x 4 x i32> %x, i32 0
  ret i32 %y
}
```

After the first DAG combine it looks like this:

```
Optimized lowered selection DAG: %bb.0 'f:'
SelectionDAG has 9 nodes:
    t0: ch,glue = EntryToken
        t7: nxv2i64 = splat_vector Constant:i64<0>
      t8: nxv4i32 = bitcast t7
    t9: i32 = extract_vector_elt t8, Constant:i32<0>
  t11: ch,glue = CopyToReg t0, Register:i32 $x10, t9
  t12: ch = RISCVISD::RET_FLAG t11, Register:i32 $x10, t11:1
```

If I'm not mistaken, it should be possible to constant fold `t9` to `Constant:i32<0>`: `t7` splats the i64 constant 0, so every i32 lane of `t8` is also 0. However, the lack of constant folding for `bitcast`s of `splat_vector`s prevents this.
I guess this is what I was trying to achieve with WebAssembly, except that there it was with fixed-size vectors, so, as you pointed out, just making a splatted `build_vector` doesn't work here.
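
A minimal model of the fold described above, written as standalone C++ rather than against the real SelectionDAG API: `bitcastSplatLane` and `bitcastSplatAsSplat` are hypothetical helpers, and a little-endian lane layout (as on RISC-V) is assumed. It illustrates why the total lane count, and therefore vscale, should not matter: the value of a narrow lane depends only on its index modulo the width ratio.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical helper (not an LLVM API): models bitcasting a splat of a
// 64-bit constant to a vector whose lanes are `narrowBits` wide, assuming a
// little-endian target. Returns the value of result lane `idx`.
uint64_t bitcastSplatLane(uint64_t splatVal, unsigned narrowBits, uint64_t idx) {
  assert(narrowBits > 0 && narrowBits < 64 && 64 % narrowBits == 0 &&
         "narrow lanes must evenly divide the 64-bit element");
  unsigned ratio = 64 / narrowBits;                    // narrow lanes per wide lane
  unsigned part = static_cast<unsigned>(idx % ratio);  // part 0 = lowest bits (little-endian)
  uint64_t mask = (uint64_t{1} << narrowBits) - 1;
  // Every wide lane holds the same constant, so only `part` matters,
  // never the total number of lanes.
  return (splatVal >> (part * narrowBits)) & mask;
}

// Hypothetical helper: if all narrow lanes are identical, the bitcast of the
// splat is itself a splat of that narrow value; otherwise returns nullopt.
std::optional<uint64_t> bitcastSplatAsSplat(uint64_t splatVal, unsigned narrowBits) {
  uint64_t first = bitcastSplatLane(splatVal, narrowBits, 0);
  for (unsigned part = 1; part < 64 / narrowBits; ++part)
    if (bitcastSplatLane(splatVal, narrowBits, part) != first)
      return std::nullopt;
  return first;
}
```

For the test case, `bitcastSplatLane(0, 32, 0)` is 0, matching the expected fold of `t9` to `Constant:i32<0>`. When the halves differ (for example a splat of `0x0000000100000002`), the result is not an i32 splat, but an extract at a constant index below the known minimum lane count would still be foldable in this model.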

================
Comment at: llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp:3033
   }
+  case ISD::SPLAT_VECTOR: {
+    SDValue Scl = Op.getOperand(0);
----------------
luke wrote:
> reames wrote:
> > You should be able to separate this into its own patch with test coverage.
> > 
> > Note that this code is currently restricted to fixed length splat_vectors - which only hexagon currently uses.  You could choose to generalize the routine to scalable vectors if that was helpful.  
> WebAssembly now uses fixed length splat_vectors too to aid in selecting splatted loads (D139871).
> Will take a look at generalising this
Writing this down here before I forget:
I needed to provide this information in `simplifyDemandedVecElts`, because it is used by `SimplifyDemandedBits`, which is in turn used in `DAGCombiner::visitSTORE`.
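
A rough model of why a splat is easy to handle in demanded-elements and demanded-bits queries, again as standalone C++ with hypothetical names (`splatScalarDemanded`, `splatScalarDemandedBits`, `lowBitsMask`) rather than the real `TargetLowering` code. It assumes the splatted scalar has the same width as the vector element. Because every lane reads the same scalar, the scalar is needed if any lane is demanded, and the bits demanded from it are just the bits demanded from a lane, so no per-lane bookkeeping is required.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model: one flag per lane of a fixed-length vector.
// A splat has a single scalar source, so that scalar is demanded
// as soon as any lane is demanded.
bool splatScalarDemanded(const std::vector<bool>& demandedElts) {
  for (bool d : demandedElts)
    if (d) return true;
  return false;
}

// Assuming the scalar has the element's width, the bits demanded from the
// scalar are exactly the bits demanded from each lane; if no lane is
// demanded, no bits of the scalar are needed at all.
uint64_t splatScalarDemandedBits(const std::vector<bool>& demandedElts,
                                 uint64_t demandedLaneBits) {
  return splatScalarDemanded(demandedElts) ? demandedLaneBits : 0;
}

// Demanded-bits mask for a lane when only its low `usedBits` bits are
// consumed, e.g. by a hypothetical user that keeps only the low bits.
uint64_t lowBitsMask(unsigned usedBits) {
  assert(usedBits >= 1 && usedBits <= 64 && "bit count out of range");
  return usedBits == 64 ? ~uint64_t{0} : (uint64_t{1} << usedBits) - 1;
}
```

The design point is that the answer never depends on how many lanes there are, only on whether any lane is demanded, which is what should make generalizing the splat case to scalable vectors plausible.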

Repository:
  rG LLVM Github Monorepo


https://reviews.llvm.org/D141079