[llvm] [IA][RISCV] Support VP loads/stores in InterleavedAccessPass (PR #120490)
Luke Lau via llvm-commits
llvm-commits at lists.llvm.org
Tue Jan 28 01:03:22 PST 2025
================
@@ -22529,6 +22529,261 @@ bool RISCVTargetLowering::lowerInterleaveIntrinsicToStore(
   return true;
 }
+/// Lower an interleaved vp.load into a vlsegN intrinsic.
+///
+/// E.g. Lower an interleaved vp.load (Factor = 2):
+///   %l = call <vscale x 64 x i8> @llvm.vp.load.nxv64i8.p0(ptr %ptr,
+///                                                         %mask,
+///                                                         i32 %wide.rvl)
+///   %dl = tail call { <vscale x 32 x i8>, <vscale x 32 x i8> }
+///             @llvm.vector.deinterleave2.nxv64i8(
+///               <vscale x 64 x i8> %l)
+///   %r0 = extractvalue { <vscale x 32 x i8>, <vscale x 32 x i8> } %dl, 0
+///   %r1 = extractvalue { <vscale x 32 x i8>, <vscale x 32 x i8> } %dl, 1
+///
+/// Into:
+///   %rvl = udiv %wide.rvl, 2
+///   %sl = call { <vscale x 32 x i8>, <vscale x 32 x i8> }
+///             @llvm.riscv.vlseg2.mask.nxv32i8.i64(<vscale x 32 x i8> undef,
+///                                                 <vscale x 32 x i8> undef,
+///                                                 ptr %ptr,
+///                                                 %mask,
+///                                                 i64 %rvl,
+///                                                 i64 1)
+///   %r0 = extractvalue { <vscale x 32 x i8>, <vscale x 32 x i8> } %sl, 0
+///   %r1 = extractvalue { <vscale x 32 x i8>, <vscale x 32 x i8> } %sl, 1
+///
+/// NOTE: the deinterleave2 intrinsic won't be touched and is expected to be
+/// removed by the caller
+bool RISCVTargetLowering::lowerDeinterleavedIntrinsicToVPLoad(
+    VPIntrinsic *Load, Value *Mask,
+    ArrayRef<Value *> DeInterleaveResults) const {
+  assert(Mask && "Expect a valid mask");
+  assert(Load->getIntrinsicID() == Intrinsic::vp_load &&
+         "Unexpected intrinsic");
+
+  const unsigned Factor = DeInterleaveResults.size();
+
+  auto *WideVTy = dyn_cast<ScalableVectorType>(Load->getType());
+  // TODO: Support fixed vectors.
+  if (!WideVTy)
+    return false;
+
+  unsigned WideNumElements = WideVTy->getElementCount().getKnownMinValue();
+  assert(WideNumElements % Factor == 0 &&
+         "ElementCount of a wide load must be divisible by interleave factor");
+  auto *VTy =
+      VectorType::get(WideVTy->getScalarType(), WideNumElements / Factor,
+                      WideVTy->isScalableTy());
+  // FIXME: Should pass alignment attribute from pointer, but vectorizer needs
+  // to emit it first.
+  auto &DL = Load->getModule()->getDataLayout();
+  Align Alignment = Align(DL.getTypeStoreSize(WideVTy->getScalarType()));
+  if (!isLegalInterleavedAccessType(
+          VTy, Factor, Alignment,
+          Load->getArgOperand(0)->getType()->getPointerAddressSpace(), DL))
+    return false;
+
+  IRBuilder<> Builder(Load);
+  Value *WideEVL = Load->getArgOperand(2);
+  auto *XLenTy = Type::getIntNTy(Load->getContext(), Subtarget.getXLen());
+  Value *EVL = Builder.CreateZExtOrTrunc(
+      Builder.CreateUDiv(WideEVL, ConstantInt::get(WideEVL->getType(), Factor)),
+      XLenTy);
----------------
lukel97 wrote:
Is this transform correct if we end up truncating the original EVL?
E.g. if the original wide vp.load had an EVL of 7, this will transform it into a load of only 6 elements, making the last element undefined.

I have a feeling that when the loop vectorizer adds EVL support for interleaved accesses, the EVL passed to the wide vp.load will be something like `@llvm.experimental.get.vector.length.i64` multiplied by the factor.

Can we get away with checking that the EVL is a multiple of the factor with `computeKnownBits`? That way we would, for now, only handle cases where the wide vp.load's EVL is a `mul` of the factor. Or is there extra alignment information about the segment that would allow us to increase the EVL?
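To illustrate, here is a rough sketch of the kind of check I have in mind. The helper name `isKnownMultipleOfN` is made up and this is untested; it only recognizes a constant EVL, a `mul` by a constant, and power-of-two factors via known trailing zero bits:

```cpp
#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/PatternMatch.h"
#include "llvm/IR/Value.h"
#include "llvm/Support/KnownBits.h"
#include "llvm/Support/MathExtras.h"
#include <cassert>

using namespace llvm;

// Hypothetical helper (untested): return true if V is provably a multiple
// of N, so that `udiv V, N` cannot drop trailing elements.
static bool isKnownMultipleOfN(const Value *V, const DataLayout &DL,
                               unsigned N) {
  assert(N != 0 && "Factor must be non-zero");
  if (N == 1)
    return true;

  using namespace PatternMatch;
  // Recognize a plain constant, or a commuted `mul X, C`, where C is a
  // multiple of N.
  uint64_t C;
  if (match(V, m_CombineOr(m_ConstantInt(C),
                           m_c_Mul(m_Value(), m_ConstantInt(C)))) &&
      C != 0 && C % N == 0)
    return true;

  // For a power-of-two factor, the low log2(N) bits being known zero is
  // equivalent to being a multiple of N.
  if (isPowerOf2_32(N))
    return computeKnownBits(V, DL).countMinTrailingZeros() >= Log2_32(N);

  return false;
}
```

Then lowerDeinterleavedIntrinsicToVPLoad could just `return false` when this doesn't hold for `WideEVL`, rather than silently flooring the EVL.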
https://github.com/llvm/llvm-project/pull/120490