[PATCH] D147451: [CodeGen] Enable AArch64 SVE FCMLA/FCADD instruction generation in ComplexDeinterleaving
Igor Kirillov via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Apr 11 10:05:08 PDT 2023
igor.kirillov added inline comments.
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:24473-24476
- if ((VTyWidth < 128 && VTyWidth != 64) || !llvm::isPowerOf2_32(VTyWidth))
+ if ((VTyWidth < 128 && (VTy->isScalableTy() || VTyWidth != 64)) ||
+ !llvm::isPowerOf2_32(VTyWidth))
return false;
return (ScalarTy->isHalfTy() && Subtarget->hasFullFP16()) ||
----------------
NickGuy wrote:
> igor.kirillov wrote:
> > mgabka wrote:
> > > NickGuy wrote:
> > > > Scalable vectors don't have the same bit-width restriction. Treating them as if they had a max width of 128 bits seems wasteful and inefficient. Is there any way to get the vector width at compile time (is there a `target->getMaxVectorWidth()` or something)?
> > > For scalable vectors I don't think we want to use a min or max vector width; we should rather operate on the ElementCount and the size of the element types.
> > > IIUC, for scalable vectors the condition we want to check is whether we are operating on packed vector types (in which case all are supported) or on the set of unpacked vectors we support. Am I correct?
> > >
> > > In that case, maybe it is worth having dedicated sections for fixed-width and scalable vectors in this function?
> > @NickGuy Actually, this function returns false if VTyWidth is less than 128 bits, so any vector with a minimum size of 128 bits or more is supported.
> >
> > @mgabka We support any unpacked type with size 2**X if 2**X >= 128. There is code in `AArch64TargetLowering::createComplexDeinterleavingIR` that splits those vectors until they have a minimum size of 128 bits and then merges them back. Unlike Neon, we don't support scalable vectors with a minimum size of 64 bits, and that's the condition I added to the `if` statement.
> Not sure why I added the comment here. It was supposed to be on the `if (TyWidth > 128) {` below, oops...
>
> How resource-efficient is this splitting with scalable vectors, though? My concern is that we'd split the operation across numerous 256+ bit vectors while only using the lower 128 bits of each.
Not sure if I understood your concern correctly. We split any vector instruction whose minimum size is 256 bits or more into instructions that have a minimum size of 128 bits. How many bits each of them actually processes depends on the actual CPU, but the generated code works just fine without knowing that.
For example, for <vscale x 8 x double> we'll get 4 instructions working on vectors of type <vscale x 2 x double>.
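To make the arithmetic concrete, here is a rough standalone sketch of the width check and the resulting number of pieces. It does not use any LLVM headers. `VecInfo`, `widthIsSupported` and `numSplitParts` are hypothetical names made up for illustration, the scalar-type check from the real function is left out, and the piece count assumes the vector is simply halved until its minimum size reaches 128 bits.

```cpp
#include <cstdint>

// Hypothetical stand-in for a vector type: the known minimum width in bits
// and whether the type is scalable (<vscale x N x T>).
struct VecInfo {
  uint64_t MinBits;
  bool Scalable;
};

constexpr bool isPowerOf2(uint64_t V) { return V != 0 && (V & (V - 1)) == 0; }

// Mirrors only the width condition from the diff above:
// fixed-width vectors may be 64 bits (Neon) or 128+ bits,
// scalable vectors need a minimum of 128 bits,
// and the width must be a power of two in both cases.
constexpr bool widthIsSupported(VecInfo V) {
  if ((V.MinBits < 128 && (V.Scalable || V.MinBits != 64)) ||
      !isPowerOf2(V.MinBits))
    return false;
  return true;
}

// Assumption: halving until each piece has a minimum size of 128 bits
// yields MinBits / 128 pieces for a supported power-of-two width.
constexpr uint64_t numSplitParts(VecInfo V) {
  return V.MinBits <= 128 ? 1 : V.MinBits / 128;
}

// <vscale x 8 x double> has a minimum size of 8 * 64 = 512 bits.
static_assert(widthIsSupported(VecInfo{512, true}), "512-bit min is accepted");
static_assert(numSplitParts(VecInfo{512, true}) == 4, "four 128-bit-min pieces");
```

Each of the four pieces is still scalable, so on hardware with wider SVE registers every piece uses the full register rather than only the lower 128 bits.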
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D147451/new/
https://reviews.llvm.org/D147451