[PATCH] D141079: [SelectionDAG] Improve constant folding in the presence of SPLAT_VECTOR

Fri Jan 6 12:34:27 PST 2023

luke added inline comments.

================
Comment at: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:14018
+    // fold it that way
+    if (N0.getOpcode() == ISD::SPLAT_VECTOR &&
+        DAG.isConstantValueOfAnyType(N0.getOperand(0))) {
----------------
luke wrote:
> reames wrote:
> > luke wrote:
> > > reames wrote:
> > > > This looks to be assuming fixed-width splat_vectors.  The primary use of splat_vector is for scalable vectors.
> > > That makes sense; I was wondering what the difference was between a splat_vector and a splatted build_vector.
> > > In that case, is it still possible to fold here?
> > To my knowledge, we're a bit inconsistent about this.  RISCV uses SPLAT_VECTOR only for scalable vectors.  Hexagon (and per your other comment, WebAssembly) use it for both fixed and scalable vectors.  I'm also unclear on when they use SPLAT_VECTOR vs BUILD_VECTOR.
> > 
> > Longer term, I do think that having one canonical representation for a splat vector makes sense, and that it'll probably be SPLAT_VECTOR.  We're just not there yet.  In particular, DAGCombine has various weaknesses for SPLAT_VECTOR that need to be worked through.  
> This is a RISC-V test case I was able to throw together that shows the optimisation opportunity for scalable vectors:
> 
> ```
> define i32 @f(<vscale x 2 x i64> %a) {
>   %v = insertelement <vscale x 2 x i64> %a, i64 0, i32 0
>   %w = shufflevector <vscale x 2 x i64> %v, <vscale x 2 x i64> undef, <vscale x 2 x i32> zeroinitializer
>   %x = bitcast <vscale x 2 x i64> %w to <vscale x 4 x i32>
>   %y = extractelement <vscale x 4 x i32> %x, i32 0
>   ret i32 %y
> }
> ```
> 
> After the first DAG combine it looks like this:
> 
> ```
> Optimized lowered selection DAG: %bb.0 'f:'
> SelectionDAG has 9 nodes:
>     t0: ch,glue = EntryToken
>         t7: nxv2i64 = splat_vector Constant:i64<0>
>       t8: nxv4i32 = bitcast t7
>     t9: i32 = extract_vector_elt t8, Constant:i32<0>
>   t11: ch,glue = CopyToReg t0, Register:i32 $x10, t9
>   t12: ch = RISCVISD::RET_FLAG t11, Register:i32 $x10, t11:1
> ```
> 
> If I'm not mistaken, it should be possible to constant fold the constant in `t7` into `t9`, but the lack of constant folding for `splat_vector`s in `bitcast`s prevents this.
> I guess this is what I was trying to achieve with WebAssembly, except that there it was with fixed-size vectors, so as you pointed out, just making a splatted `build_vector` doesn't work.
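After some digging, I've found that it's not always possible to constant fold `(bitcast (splat_vector x)) -> (splat_vector y)`, at least not into another `splat_vector`. It is only guaranteed to be possible when the bitcast type's scalar element type is wider than the original element type (by a whole multiple), since every wider element is then the same concatenation of copies of `x`. When it is narrower, adjacent elements are different slices of `x`, which are equal only for particular constants such as 0.
For now, I'm trying to see if combining `(extract_vector_elt (bitcast (splat_vector x)) n) -> y` yields similar results.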
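
To make the two cases concrete, here is a minimal standalone sketch of the arithmetic behind both folds. It is not LLVM code and uses no SelectionDAG APIs: `lowMask`, `foldExtractOfBitcastSplat` and `foldBitcastOfSplat` are hypothetical names chosen for illustration. It assumes little-endian lane order (element 0 of the narrower type is the low bits), element widths of at most 64 bits, and that one width is a whole multiple of the other.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an LLVM API): mask with the low `Bits` bits set.
static uint64_t lowMask(unsigned Bits) {
  return Bits >= 64 ? ~0ULL : ((1ULL << Bits) - 1);
}

// Models (extract_vector_elt (bitcast (splat_vector X)), Idx) for constant X.
// SrcBits is the width of each splatted element, DstBits the width of each
// element after the bitcast. Assumes little-endian lane order.
uint64_t foldExtractOfBitcastSplat(uint64_t X, unsigned SrcBits,
                                   unsigned DstBits, uint64_t Idx) {
  assert(SrcBits >= 1 && SrcBits <= 64 && DstBits >= 1 && DstBits <= 64);
  assert(SrcBits % DstBits == 0 || DstBits % SrcBits == 0);
  X &= lowMask(SrcBits);
  if (DstBits <= SrcBits) {
    // Narrowing: element Idx is one slice of X, selected by Idx modulo the
    // number of narrow elements per source element.
    unsigned Ratio = SrcBits / DstBits;
    unsigned Shift = static_cast<unsigned>(Idx % Ratio) * DstBits;
    return (X >> Shift) & lowMask(DstBits);
  }
  // Widening: every element is the same concatenation of copies of X,
  // so the index does not matter.
  uint64_t Result = 0;
  for (unsigned Shift = 0; Shift < DstBits; Shift += SrcBits)
    Result |= X << Shift;
  return Result;
}

// Models (bitcast (splat_vector X)) -> (splat_vector Y). Returns true and
// sets Y only if every element of the bitcast result has the same value.
bool foldBitcastOfSplat(uint64_t X, unsigned SrcBits, unsigned DstBits,
                        uint64_t &Y) {
  uint64_t First = foldExtractOfBitcastSplat(X, SrcBits, DstBits, 0);
  unsigned Ratio = DstBits < SrcBits ? SrcBits / DstBits : 1;
  for (unsigned I = 1; I < Ratio; ++I)
    if (foldExtractOfBitcastSplat(X, SrcBits, DstBits, I) != First)
      return false; // The slices differ, so the result is not a splat.
  Y = First;
  return true;
}
```

In this model the test case above (`X = 0`, 64-bit elements bitcast to 32-bit) folds either way, whereas a constant such as `0x0000000100000002` only folds through the `extract_vector_elt` form, because its two 32-bit halves differ.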

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141079/new/

https://reviews.llvm.org/D141079