[llvm] [RISCV] Enable TTI::shouldDropLSRSolutionIfLessProfitable by default (PR #89927)

Tue May 14 09:41:07 PDT 2024

================
@@ -86,30 +86,29 @@ declare i64 @llvm.vscale.i64()
 define void @sink_splat_add_scalable(ptr nocapture %a, i32 signext %x) {
 ; NO-SINK-LABEL: sink_splat_add_scalable:
 ; NO-SINK:       # %bb.0: # %entry
-; NO-SINK-NEXT:    csrr a5, vlenb
-; NO-SINK-NEXT:    srli a2, a5, 1
+; NO-SINK-NEXT:    csrr a2, vlenb
+; NO-SINK-NEXT:    srli a2, a2, 1
 ; NO-SINK-NEXT:    li a3, 1024
 ; NO-SINK-NEXT:    bgeu a3, a2, .LBB1_2
 ; NO-SINK-NEXT:  # %bb.1:
 ; NO-SINK-NEXT:    li a3, 0
 ; NO-SINK-NEXT:    j .LBB1_5
 ; NO-SINK-NEXT:  .LBB1_2: # %vector.ph
+; NO-SINK-NEXT:    li a5, 0
 ; NO-SINK-NEXT:    addi a3, a2, -1
 ; NO-SINK-NEXT:    andi a4, a3, 1024
 ; NO-SINK-NEXT:    xori a3, a4, 1024
 ; NO-SINK-NEXT:    vsetvli a6, zero, e32, m2, ta, ma
 ; NO-SINK-NEXT:    vmv.v.x v8, a1
-; NO-SINK-NEXT:    slli a5, a5, 1
-; NO-SINK-NEXT:    mv a6, a0
-; NO-SINK-NEXT:    mv a7, a3
 ; NO-SINK-NEXT:  .LBB1_3: # %vector.body
 ; NO-SINK-NEXT:    # =>This Inner Loop Header: Depth=1
+; NO-SINK-NEXT:    slli a6, a5, 2
+; NO-SINK-NEXT:    add a6, a0, a6
 ; NO-SINK-NEXT:    vl2re32.v v10, (a6)
 ; NO-SINK-NEXT:    vadd.vv v10, v10, v8
+; NO-SINK-NEXT:    add a5, a5, a2
----------------
preames wrote:

These diffs seem likely unprofitable, but more than that, they seems *weird*.  The solution being chosen should "clearly" be higher cost than the one being rejected here.  But more than that, the previous solution was also sub-optimal.  

At first, I thought this might relate to the shifted add and went looking for a case where we forgot to guard a shNadd instruction on Zba.  However, that turned out to be a red herring.

However, when doing that investigation, I noticed something.  We have *two* different implementations of isLegalAddressing mode.  One on TLI and one on TTI.  We have overridden the former, but don't appear to have done the later.

The default implementation on TTIImpl assumes reg+reg addressing is legal.  For RISCV vector, this is not a legal addressing mode.  I think what we're seeing here is that LSR is computing a bad cost for basically all addresses used by vector loads and stores, and that the actual effect of this change is basically noise.  

I suspect that if we implement TTI::isLegalAddressing, we'll see significant improvements in LSR solutions.  Once we do that, we can revisit this patch.  


https://github.com/llvm/llvm-project/pull/89927