[PATCH] D138424: [TargetLowering][AArch64] Teach DemandedBits about SVE count intrinsics

Wed Nov 23 01:36:26 PST 2022

dmgreen added inline comments.

================
Comment at: llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp:1140
+      return false;
+    int64_t VScaleResultUpperbound = *MaxVScale;
+    if (auto *MulImm = dyn_cast<ConstantSDNode>(Op.getOperand(0))) {
----------------
benmxwl-arm wrote:
> dmgreen wrote:
> > Could this just use construct known bits of both sides and use KnownBits::mul? It might be able to get value out of the low bits then too.
> Please correct me if I've made a silly mistake here, but it does not seem to be exactly KnownBits::mul:
> 
> ```
>     Optional<unsigned> MaxVScale = Attr.getVScaleRangeMax();
>     if (!MaxVScale.has_value())
>       return false;
>     if (auto *MulImm = dyn_cast<ConstantSDNode>(Op.getOperand(0))) {
>       unsigned RequiredBits = Log2_64(*MaxVScale) + 1;
>       if (RequiredBits >= BitWidth)
>         return false;
>       Known.Zero.setHighBits(BitWidth - RequiredBits);
>       Known = KnownBits::mul(Known, KnownBits::makeConstant(MulImm->getAPIntValue()));
>     }
>     return false;
> ```
> 
> If the MaxVScale is 16 (5 bits)
> 
> The known zero bits are everything above Log2(16 * Mul) + 1.
> The above snippet seems to end up with Log2((2^5 - 1) * Mul) + 1 (which is off by 1 bit)
> 
> Also it seems that KnowBits::mul can't handle negative multipliers and always reports no known bits in that case.
> 
> 
Ah - I was worried about that - because it works on bits and not ranges, the results are not as exact as they could otherwise be. I was also considering the vscale turning into a shift, but that may be SVE specific.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:23297

+  if (auto ElementSize = IsSVECntIntrinsic(Op)) {
+    // The SVE count intrinsics don't support the multiplier immediate so we
----------------
Can you move this into the switch under INTRINSIC_WO_CHAIN. Just so it doesn't need to be checked for every instruction.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:23303
+    // a value that's strictly less than "ALL".
+    unsigned MaxElements = AArch64::SVEMaxBitsPerVector / *ElementSize;
+    unsigned RequiredBits = Log2_64(MaxElements) + 1;
----------------
Is there a reason why this is this based on SVEMaxBitsPerVector and not the maximum value in VScaleRange? Or a combo of both if vscale_range is unbounded for some reason.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:23304
+    unsigned MaxElements = AArch64::SVEMaxBitsPerVector / *ElementSize;
+    unsigned RequiredBits = Log2_64(MaxElements) + 1;
+    unsigned BitWidth = Known.Zero.getBitWidth();
----------------
Nit: Log2_32?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D138424/new/

https://reviews.llvm.org/D138424