[llvm] [RISCV][TTI] Properly model odd vector sized LD/ST operations (PR #100436)
Philip Reames via llvm-commits
llvm-commits at lists.llvm.org
Thu Jul 25 08:29:43 PDT 2024
================
```
@@ -1390,14 +1390,34 @@ InstructionCost RISCVTTIImpl::getMemoryOpCost(unsigned Opcode, Type *Src,
   InstructionCost Cost = 0;
   if (Opcode == Instruction::Store && OpInfo.isConstant())
     Cost += getStoreImmCost(Src, OpInfo, CostKind);
-  InstructionCost BaseCost =
-    BaseT::getMemoryOpCost(Opcode, Src, Alignment, AddressSpace,
-                           CostKind, OpInfo, I);
+
+  std::pair<InstructionCost, MVT> LT = getTypeLegalizationCost(Src);
+
+  InstructionCost BaseCost = [&]() {
+    InstructionCost Cost = LT.first;
+    if (CostKind != TTI::TCK_RecipThroughput)
+      return Cost;
+
+    // Our actual lowering uses a VL predicated load of the next legal type,
+    // or splitting if there is no wider legal type. This is reflected in
+    // the result of getTypeLegalizationCost, but BasicTTI assumes the
+    // widened cases are scalarized.
+    const DataLayout &DL = this->getDataLayout();
+    if (Src->isVectorTy() && LT.second.isVector()) {
+      auto *SrcVT = cast<VectorType>(Src);
+      TypeSize SrcEltSize = DL.getTypeStoreSizeInBits(SrcVT->getElementType());
+      TypeSize LegalEltSize = LT.second.getVectorElementType().getSizeInBits();
+      if (SrcEltSize == LegalEltSize)
```
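To make the cost argument in the diff comment concrete, here is a toy C++ model. It is a hypothetical sketch, not LLVM's cost logic: toyLoadCost, its widening to the next power-of-two element count, and its charge of one op per LMUL=1 register are all illustrative assumptions, and it does not model the splitting path. It contrasts the widened, VL-predicated lowering with the per-element scalarized cost that BasicTTI assumes for widened types.

```cpp
#include <cassert>

// Hypothetical toy model (not LLVM's API or cost tables).
struct ToyCost {
  unsigned Widened;    // assumed: one VL-predicated op per LMUL=1 register
  unsigned Scalarized; // one scalar memory op per source element
};

// Smallest power of two >= N (N > 0).
static unsigned nextPow2(unsigned N) {
  unsigned P = 1;
  while (P < N)
    P <<= 1;
  return P;
}

// Cost of loading a fixed-length <NumElts x iEltBits> vector on a machine
// with the given VLEN (in bits), under the toy assumptions above.
ToyCost toyLoadCost(unsigned NumElts, unsigned EltBits, unsigned VLEN) {
  assert(NumElts > 0 && EltBits > 0 && VLEN > 0);
  unsigned WideElts = nextPow2(NumElts);     // e.g. 3 elements -> 4
  unsigned Bits = WideElts * EltBits;        // size of the widened type
  unsigned Regs = (Bits + VLEN - 1) / VLEN;  // LMUL=1 registers it occupies
  return {Regs, NumElts};
}
```

For example, toyLoadCost(3, 32, 128), i.e. <3 x i32> at an assumed VLEN of 128, yields Widened == 1 and Scalarized == 3: one VL-predicated load of the <4 x i32> legal type versus three scalar loads.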
----------------
preames wrote:
The primary motivation is <3 x i32> with the V extension; the scalable-vector case on Zve32x was only a side finding. And no, that case would be a single vle of the wider legal type. Here's a standalone test that shows it:
```
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -mtriple=riscv64 -mattr=+zve32x < %s | FileCheck %s
define void @test(ptr %a, ptr %b) nounwind vscale_range(4,1024) {
; CHECK-LABEL: test:
; CHECK: # %bb.0:
; CHECK-NEXT: csrr a2, vlenb
; CHECK-NEXT: srli a2, a2, 3
; CHECK-NEXT: vsetvli zero, a2, e32, m1, ta, ma
; CHECK-NEXT: vle32.v v8, (a0)
; CHECK-NEXT: vse32.v v8, (a1)
; CHECK-NEXT: ret
%LD = call <vscale x 1 x i32> @llvm.masked.load.nxv1i32.p0(ptr %a, i32 4, <vscale x 1 x i1> splat(i1 true), <vscale x 1 x i32> poison)
store <vscale x 1 x i32> %LD, ptr %b
ret void
}
```
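The instruction sequence in this test can be checked with a little arithmetic. A minimal sketch, assuming LLVM's RISC-V RVVBitsPerBlock of 64 (so vscale = VLEN / 64); nxv1i32VL is a hypothetical helper. csrr/srli compute vlenb >> 3 = vscale, which is the element count of <vscale x 1 x i32>, while an e32, m1 register holds VLEN / 32 = 2 * vscale elements. The load therefore runs on the widened m1 type with VL set to half its capacity.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: VL arithmetic for <vscale x 1 x i32> on Zve32x,
// assuming RVVBitsPerBlock == 64 (vscale == VLEN / 64).
struct VLInfo {
  uint64_t Vscale;  // vlenb >> 3, as computed by csrr + srli
  uint64_t VL;      // elements requested by vsetvli (one i32 per vscale)
  uint64_t VLMaxM1; // elements an e32, m1 register can hold
};

VLInfo nxv1i32VL(uint64_t VLEN) {
  assert(VLEN % 64 == 0);
  uint64_t vlenb = VLEN / 8;     // value read from the vlenb CSR
  uint64_t vscale = vlenb >> 3;  // srli a2, a2, 3
  return {vscale, vscale, VLEN / 32};
}
```

With vscale_range(4,1024), the smallest VLEN is 256: nxv1i32VL(256) gives Vscale == 4, VL == 4 and VLMaxM1 == 8, so the vle32.v touches half of one m1 register rather than being scalarized.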
https://github.com/llvm/llvm-project/pull/100436