[llvm] [RISCV][TTI] Properly model odd vector sized LD/ST operations (PR #100436)

Thu Jul 25 09:12:29 PDT 2024

================
@@ -1390,14 +1390,34 @@ InstructionCost RISCVTTIImpl::getMemoryOpCost(unsigned Opcode, Type *Src,
   InstructionCost Cost = 0;
   if (Opcode == Instruction::Store && OpInfo.isConstant())
     Cost += getStoreImmCost(Src, OpInfo, CostKind);
-  InstructionCost BaseCost =
-    BaseT::getMemoryOpCost(Opcode, Src, Alignment, AddressSpace,
-                           CostKind, OpInfo, I);
+
+  std::pair<InstructionCost, MVT> LT = getTypeLegalizationCost(Src);
+
+  InstructionCost BaseCost = [&]() {
+    InstructionCost Cost = LT.first;
+    if (CostKind != TTI::TCK_RecipThroughput)
+      return Cost;
+
+    // Our actual lowering uses a VL predicated load of the next legal type,
+    // or splitting if there is no wider legal type.  This is reflected in
+    // the result of getTypeLegalizationCost, but BasicTTI assumes the
+    // widened cases are scalarized.
+    const DataLayout &DL = this->getDataLayout();
+    if (Src->isVectorTy() && LT.second.isVector()) {
+      auto *SrcVT = cast<VectorType>(Src);
+      TypeSize SrcEltSize = DL.getTypeStoreSizeInBits(SrcVT->getElementType());
+      TypeSize LegalEltSize = LT.second.getVectorElementType().getSizeInBits();
+      if (SrcEltSize == LegalEltSize)
----------------
preames wrote:

While writing an updated description for this review that covered the splitting behavior in the cost model, I wrote a couple of lowering tests to confirm that the costing I was describing was both correct and matched the actual lowering.

In doing so, I found a test case which crashes during lowering (see below). Given that, I added a restriction to this change so that it only changes the cost when promoting. This still covers the cases I'm interested in, and avoids perturbing the costs in an area where we appear to have a lowering problem.

```
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -mtriple=riscv64 -mattr=+zve32x < %s | FileCheck %s

; Note that vscale x 32 x i32 appears to work just fine, and we split to two LMUL 8
; types as expected.  It's apparently splitting combined with the need to lower the
; second half with VL predication which is causing problems.
define void @test31(ptr %a, ptr %b) nounwind vscale_range(4,1024) {
  %LD = call <vscale x 31 x i32> @llvm.masked.load.nxv31i32.p0(ptr %a, i32 4, <vscale x 31 x i1> splat(i1 true), <vscale x 31 x i32> poison)
  store <vscale x 31 x i32> %LD, ptr %b
  ret void
}
```
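
To make the promote-versus-split distinction above concrete, here is a rough toy sketch. It is not the code in this patch and not an LLVM API: `LegalizeKind`, `nextPow2`, and `classify` are hypothetical names, and the model assumes that an odd-sized vector is first widened to the next power-of-two element count and is split only if that widened count exceeds the widest legal type for the same element type.

```cpp
#include <cstdint>

// Hypothetical classification for illustration only; not an LLVM type.
enum class LegalizeKind { Legal, Promote, Split };

// Smallest power of two that is >= n (assumes n >= 1).
constexpr uint64_t nextPow2(uint64_t n) {
  uint64_t p = 1;
  while (p < n)
    p <<= 1;
  return p;
}

// minElts:      (minimum) element count of the source vector type.
// maxLegalElts: element count of the widest legal vector type with the same
//               element type; assumed to be a power of two.
constexpr LegalizeKind classify(uint64_t minElts, uint64_t maxLegalElts) {
  const uint64_t widened = nextPow2(minElts);
  if (widened > maxLegalElts)
    return LegalizeKind::Split;   // needs more than one legal register group
  return widened == minElts ? LegalizeKind::Legal : LegalizeKind::Promote;
}

// 3 elements fit in a 4-element legal type: a single VL-predicated access.
static_assert(classify(3, 4) == LegalizeKind::Promote, "promote case");
// 31 elements with at most 16 legal elements (as in the test above, assuming
// nxv16i32 is the widest legal i32 type): split, so the cost is left alone.
static_assert(classify(31, 16) == LegalizeKind::Split, "split case");
```

Under this model, only the `Promote` result corresponds to the case whose cost the restricted change adjusts; anything classified as `Split` keeps its existing cost.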

https://github.com/llvm/llvm-project/pull/100436