[PATCH] D138791: [AArch64][SME]: Generate streaming-compatible code for ld2-alloca.
Sander de Smalen via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Nov 30 06:20:55 PST 2022
sdesmalen added inline comments.
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:13970
+ if (Subtarget->forceStreamingCompatibleSVE() ||
+ (Subtarget->useSVEForFixedLengthVectors() &&
+ (VecSize % Subtarget->getMinSVEVectorSizeInBits() == 0 ||
----------------
hassnaa-arm wrote:
> sdesmalen wrote:
> > Sorry, I just realised you'll also need to add a check that we can generate a predicate pattern for the number of elements (e.g. vl2, vl3, ...), because e.g. a <9 x i8> has no corresponding predicate pattern. You can use `Optional<unsigned> getSVEPredPatternFromNumElements()` for this (defined in Utils/AArch64BaseInfo.h).
> >
> > Can you also add a test for <9 x i8> ?
> Sorry, I don't understand why I should check for e.g. a <9 x i8>. How is that related to the condition in the code?
> And what do you mean by "You can use Optional<unsigned> getSVEPredPatternFromNumElements() for this"?
> Do you mean that, in addition to adding the check, I should also make a code change?
What I meant is this. Take a load followed by a de-interleaving shuffle such as:
%load = load <6 x i8>, ptr %alloc
%strided.vec = shufflevector <6 x i8> %load, <6 x i8> poison, <3 x i32> <i32 1, i32 3, i32 5>
For this you get the following LD2 sequence:
ptrue p0.b, vl3
ld2b { z0.b, z1.b }, p0/z, [x8]
This uses `vl3` for the predicate, which enables the first 3 lanes. But there is no `vl9` pattern, so if you end up with NumElements == 9, you can't code-generate the interleaved access using LD2. To check whether a ptrue predicate pattern exists for `NumElements`, you can use `getSVEPredPatternFromNumElements`, which returns an empty Optional when there is none.
(By the way, I meant <9 x i8> as the result type of the shuffle.)
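To make the idea concrete, here is a rough standalone sketch of the kind of check I have in mind. It is not the actual LLVM helper: `fixedVLPatternFor` and `canUsePredicatedLD2` are made-up names, `std::optional` stands in for `Optional`, and the set of fixed-count patterns (vl1..vl8, vl16, vl32, vl64, vl128, vl256) is my assumption about what `ptrue` supports.

```cpp
#include <optional>

// Hypothetical stand-in for getSVEPredPatternFromNumElements().
// Returns N when a fixed-count `vlN` ptrue pattern exists for NumElts,
// and std::nullopt otherwise.
// Assumption: the fixed-count patterns are vl1..vl8, vl16, vl32, vl64,
// vl128 and vl256.
constexpr std::optional<unsigned> fixedVLPatternFor(unsigned NumElts) {
  if (NumElts >= 1 && NumElts <= 8)
    return NumElts;
  switch (NumElts) {
  case 16:
  case 32:
  case 64:
  case 128:
  case 256:
    return NumElts;
  default:
    return std::nullopt;
  }
}

// Hypothetical guard: only lower the interleaved access to a predicated
// LD2 when the shuffle's result element count has a matching pattern.
constexpr bool canUsePredicatedLD2(unsigned ResultNumElts) {
  return fixedVLPatternFor(ResultNumElts).has_value();
}
```

With this sketch, a <3 x i8> shuffle result passes the guard (it maps to `vl3`), while a <9 x i8> result does not, so the lowering would have to bail out for it.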
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D138791/new/
https://reviews.llvm.org/D138791