[llvm] [RISCV][TTI] Properly model odd vector sized LD/ST operations (PR #100436)

Thu Jul 25 08:29:43 PDT 2024

================
@@ -1390,14 +1390,34 @@ InstructionCost RISCVTTIImpl::getMemoryOpCost(unsigned Opcode, Type *Src,
   InstructionCost Cost = 0;
   if (Opcode == Instruction::Store && OpInfo.isConstant())
     Cost += getStoreImmCost(Src, OpInfo, CostKind);
-  InstructionCost BaseCost =
-    BaseT::getMemoryOpCost(Opcode, Src, Alignment, AddressSpace,
-                           CostKind, OpInfo, I);
+
+  std::pair<InstructionCost, MVT> LT = getTypeLegalizationCost(Src);
+
+  InstructionCost BaseCost = [&]() {
+    InstructionCost Cost = LT.first;
+    if (CostKind != TTI::TCK_RecipThroughput)
+      return Cost;
+
+    // Our actual lowering uses a VL predicated load of the next legal type,
+    // or splitting if there is no wider legal type.  This is reflected in
+    // the result of getTypeLegalizationCost, but BasicTTI assumes the
+    // widened cases are scalarized.
+    const DataLayout &DL = this->getDataLayout();
+    if (Src->isVectorTy() && LT.second.isVector()) {
+      auto *SrcVT = cast<VectorType>(Src);
+      TypeSize SrcEltSize = DL.getTypeStoreSizeInBits(SrcVT->getElementType());
+      TypeSize LegalEltSize = LT.second.getVectorElementType().getSizeInBits();
+      if (SrcEltSize == LegalEltSize)
----------------
preames wrote:

The primary motivation is <3 x i32> on V. The scalable vector case on zve32x was merely a side finding. And no, that would be a single vle on the wider legal type, with VL set to the original element count. Here's a standalone test which shows that:

```
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -mtriple=riscv64 -mattr=+zve32x < %s | FileCheck %s

define void @test(ptr %a, ptr %b) nounwind vscale_range(4,1024) {
; CHECK-LABEL: test:
; CHECK:       # %bb.0:
; CHECK-NEXT:    csrr a2, vlenb
; CHECK-NEXT:    srli a2, a2, 3
; CHECK-NEXT:    vsetvli zero, a2, e32, m1, ta, ma
; CHECK-NEXT:    vle32.v v8, (a0)
; CHECK-NEXT:    vse32.v v8, (a1)
; CHECK-NEXT:    ret
  %LD = call <vscale x 1 x i32> @llvm.masked.load.nxv1i32.p0(ptr %a, i32 4, <vscale x 1 x i1> splat(i1 true), <vscale x 1 x i32> poison)
  store <vscale x 1 x i32> %LD, ptr %b
  ret void
}

```
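As a rough sanity check on the `srli a2, a2, 3` above, the arithmetic can be sketched as below. This assumes the backend's usual convention that vscale is VLEN / 64; the helper names are hypothetical and only for illustration, not part of the patch.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: the RISC-V backend defines vscale as VLEN / 64 bits.
constexpr uint64_t kBitsPerVScaleUnit = 64;

// Hypothetical helper: element count of <vscale x 1 x i32>, given vlenb
// (VLEN in bytes, as read by `csrr a2, vlenb`).
// (vlenb * 8) / 64 == vlenb >> 3, which is the srli in the test output.
constexpr uint64_t vlForNxv1(uint64_t vlenb) {
  return (vlenb * 8) / kBitsPerVScaleUnit;
}

// Hypothetical helper: VLMAX for SEW=32, LMUL=1, i.e. how many i32
// elements one vector register holds.
constexpr uint64_t vlmaxE32M1(uint64_t vlenb) {
  return (vlenb * 8) / 32;
}

// With vscale_range(4,1024) the minimum VLEN is 256 bits (vlenb == 32).
static_assert(vlForNxv1(32) == 4, "VL requested for nxv1i32");
static_assert(vlmaxE32M1(32) == 8, "capacity of one m1 register at e32");
```

The requested VL is always half of VLMAX at e32/m1 here, so one `vle32.v` on the wider legal type covers the whole nxv1i32 value and no splitting or scalarization is needed.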

https://github.com/llvm/llvm-project/pull/100436