[PATCH] D147451: [CodeGen] Enable AArch64 SVE FCMLA/FCADD instruction generation in ComplexDeinterleaving

Tue Apr 11 09:30:35 PDT 2023

NickGuy added inline comments.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:24473-24476
-  if ((VTyWidth < 128 && VTyWidth != 64) || !llvm::isPowerOf2_32(VTyWidth))
+  if ((VTyWidth < 128 && (VTy->isScalableTy() || VTyWidth != 64)) ||
+      !llvm::isPowerOf2_32(VTyWidth))
     return false;

   return (ScalarTy->isHalfTy() && Subtarget->hasFullFP16()) ||
----------------
igor.kirillov wrote:
> mgabka wrote:
> > NickGuy wrote:
> > > Scalable vectors don't have the same bit-width restriction. Treating them as having a max width of 128 bits seems wasteful and inefficient; is there any way to get the vector width at compile time (is there a `target->getMaxVectorWidth()` or something)?
> > For scalable vectors, I don't think we want to use a min or max vector width; I think we should rather operate on the ElementCount and the size of the element types.
> > IIUC, for scalable vectors the condition we want to check is whether we are operating on packed vector types (in which case all are supported) or on the set of unpacked vectors we support. Am I correct?
> > 
> > In that case, maybe it is worth having dedicated sections for fixed-width and scalable vectors in this function?
> @NickGuy Actually, this function returns false if VTyWidth is less than 128 bits, so any vector of 128 bits or more is supported.
> 
> @mgabka We support any unpacked type with size 2**X if 2**X >= 128; there is code in `AArch64TargetLowering::createComplexDeinterleavingIR` that splits those vectors until they have a minimal size of 128 bits and then merges them back. We don't support vectors with a minimum size of 64 bits (unlike Neon), and that's the condition I added to the `if` statement.
Not sure why I added the comment here; it was supposed to be on the `if (TyWidth > 128) {` below, oops...

How resource-efficient is this splitting with scalable vectors, though? My concern is that we'd split the operation across numerous 256+ bit vectors while only using the lower 128 bits of each.
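
To make the discussion concrete, here is a rough standalone sketch of the width check from the diff and of the halving described above. It is not the actual LLVM code: `isPowerOf2`, `isWidthSupported` and `numPiecesAfterSplitting` are hypothetical names, the scalar-type/subtarget checks are left out, and for scalable vectors the width is assumed to be the known minimum size in bits.

```cpp
#include <cstdint>

// Hypothetical stand-in for llvm::isPowerOf2_32.
constexpr bool isPowerOf2(uint32_t V) { return V != 0 && (V & (V - 1)) == 0; }

// Models only the width part of the check in the diff.
// MinWidthBits: vector width in bits (known minimum for scalable vectors).
constexpr bool isWidthSupported(uint32_t MinWidthBits, bool IsScalable) {
  // Below 128 bits, only fixed-width 64-bit vectors are accepted;
  // any width that is not a power of two is rejected.
  if ((MinWidthBits < 128 && (IsScalable || MinWidthBits != 64)) ||
      !isPowerOf2(MinWidthBits))
    return false;
  return true;
}

// Assumed model of the splitting: halve the vector until each piece is
// at most 128 bits wide and count the resulting pieces.
constexpr uint32_t numPiecesAfterSplitting(uint32_t MinWidthBits) {
  uint32_t Pieces = 1;
  while (MinWidthBits > 128) {
    MinWidthBits /= 2;
    Pieces *= 2;
  }
  return Pieces;
}
```

Under these assumptions a 512-bit (minimum) vector ends up as four 128-bit (minimum) pieces. Whether each piece fills the hardware register depends on how the known minimum size relates to the real register width, which this sketch does not model.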

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D147451/new/

https://reviews.llvm.org/D147451