[PATCH] D138791: [AArch64][SME]: Generate streaming-compatible code for ld2-alloca.

Sander de Smalen via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Nov 30 06:50:28 PST 2022


sdesmalen added inline comments.


================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:13970
+  if (Subtarget->forceStreamingCompatibleSVE() ||
+      (Subtarget->useSVEForFixedLengthVectors() &&
+       (VecSize % Subtarget->getMinSVEVectorSizeInBits() == 0 ||
----------------
hassnaa-arm wrote:
> sdesmalen wrote:
> > hassnaa-arm wrote:
> > > sdesmalen wrote:
> > > > hassnaa-arm wrote:
> > > > > sdesmalen wrote:
> > > > > > Sorry, I just realise you'll also need to add a check that we can generate a predicate pattern for the number of elements (e.g. vl2, vl3, ...), because e.g. a <9 x i8> has no corresponding predicate pattern. You can use `Optional<unsigned> getSVEPredPatternFromNumElements()` for this (defined in Utils/AArch64BaseInfo.h).
> > > > > > 
> > > > > > Can you also add a test for <9 x i8> ?
> > > > > Sorry, I don't understand why I should check for e.g. a <9 x i8>. How is that related to the condition in the code?
> > > > > And what do you mean by "You can use Optional<unsigned> getSVEPredPatternFromNumElements() for this"?
> > > > > Do you mean that, in addition to adding the check, I should also make a code change?
> > > > What I meant was that if you do a shuffle mask like this:
> > > > 
> > > >   %load = load <6 x i8>, ptr %alloc
> > > >   %strided.vec = shufflevector <6 x i8> %load, <6 x i8> poison, <3 x i32> <i32 1, i32 3, i32 5>
> > > > 
> > > > You get the following LD2 instruction:
> > > > 
> > > >   ptrue p0.b, vl3
> > > >   ld2b { z0.b, z1.b }, p0/z, [x8]
> > > > 
> > > > Which uses `vl3` for the predicate, which means enable 3 lanes. But there is no `vl9`, so if you'd end up with NumElements == 9, then you can't code-generate the interleaved access using LD2. To ask if there is a ptrue predicate for `NumElements`, you can use `getSVEPredPatternFromNumElements`.
> > > > 
> > > > (I meant <9 x i8> as the result type of the shuffle by the way)
> > > You mean that if the result type of the shuffle is <9 x i8> there will be a problem, and you are asking me to add a test case for that and fix the problem, correct?
> > Correct.
> But why did you choose vl9 specifically? What about other vector lengths, e.g. vl10?
Because the available predicate patterns are vl1, vl2, vl3, ..., vl8, vl16, ... So vl9 is the first vector length that can't be represented; vl10, and any other count that isn't 1-8 or one of vl16, vl32, vl64, vl128, vl256, can't be represented either.
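To make the constraint concrete, here is a minimal standalone C++ sketch, assuming the SVE PTRUE pattern encodings (vl1-vl8 encode as 1-8, vl16 as 9, vl32 as 10, vl64 as 11, vl128 as 12, vl256 as 13). `predPatternForNumElements` and `canUseLD2WithPTrue` are hypothetical helpers written for illustration only. They mimic the idea behind `getSVEPredPatternFromNumElements()` but are not LLVM's code, and the real lowering checks more conditions than this.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical helper (illustration only): return the PTRUE pattern encoding
// for an exact element count, or std::nullopt if no "vlN" pattern exists.
// Assumed encodings: vl1..vl8 = 1..8, vl16 = 9, vl32 = 10, vl64 = 11,
// vl128 = 12, vl256 = 13.
std::optional<unsigned> predPatternForNumElements(unsigned NumElts) {
  if (NumElts >= 1 && NumElts <= 8)
    return NumElts; // vl1 .. vl8 encode as the element count itself
  switch (NumElts) {
  case 16:  return 9u;  // vl16
  case 32:  return 10u; // vl32
  case 64:  return 11u; // vl64
  case 128: return 12u; // vl128
  case 256: return 13u; // vl256
  default:  return std::nullopt; // e.g. 9, 10, 12: no vlN pattern
  }
}

// Hypothetical check (illustration only): is Mask a factor-2 de-interleave
// (lanes Index, Index + 2, Index + 4, ...) of a source vector with
// 2 * Mask.size() elements, and can the number of result lanes be
// expressed as a PTRUE pattern for the LD2 predicate?
bool canUseLD2WithPTrue(const std::vector<int> &Mask, unsigned NumSrcElts) {
  const std::size_t NumResElts = Mask.size();
  if (NumResElts == 0 || NumSrcElts != 2 * NumResElts)
    return false;
  const int Index = Mask[0];
  if (Index != 0 && Index != 1)
    return false;
  for (std::size_t I = 0; I < NumResElts; ++I)
    if (Mask[I] != Index + 2 * static_cast<int>(I))
      return false;
  // The result lane count must map onto some ptrue pattern (vl3, vl8, ...).
  return predPatternForNumElements(static_cast<unsigned>(NumResElts))
      .has_value();
}
```

With this sketch, the <6 x i8> example from the thread (mask <1, 3, 5>) is accepted and would use vl3, while a factor-2 de-interleave whose result type is <9 x i8> is rejected because there is no vl9 pattern.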


Repository:
  rG LLVM Github Monorepo


https://reviews.llvm.org/D138791


