[PATCH] D129852: [AArch64][SVE] Sink op into loop if it's used by PTEST and known to zero inactive lanes.

Wed Jul 20 10:19:07 PDT 2022

paulwalker-arm added a comment.

In D129852#3659230 <https://reviews.llvm.org/D129852#3659230>, @sdesmalen wrote:

> In D129852#3655395 <https://reviews.llvm.org/D129852#3655395>, @sdesmalen wrote:
>
>> In D129852#3655364 <https://reviews.llvm.org/D129852#3655364>, @paulwalker-arm wrote:
>>
>>> Can you provide a C/C++ example for when this is a problem? I'm not saying there's no problem to solve but rather that I'm starting to wonder if early decisions about removing reinterpret intrinsics are now starting to bite us.
>>
>> https://godbolt.org/z/q6hKMsxKK
>>
>> In this case it doesn't have anything to do with the reinterpret intrinsics, but rather the `ptrue` that gets hoisted out which is then no longer visible to (non-Global) ISel when it lowers the `ptest`.
>
> Just to expand on this a little bit, I experimented with disabling the removal of the reinterpret intrinsics, but found that the `convert.to.svbool` call is still outside the loop at the point of reaching codegen, meaning that it still can't fold away the ptest instruction unless it actively tries to sink the operand.

I can believe that, because removing those optimisations just replaces the implicit zeroing with explicit zeroing. What I'm trying to figure out is whether we need to tighten up the intrinsics across the board to stop hiding the fact that predicate-setting intrinsics always set all the bits. There's nothing here for you to worry about because this patch is generally restoring previous behaviour. I'm just worried we'll see an ever-increasing list of places where we need to restore the zeroing information that we've deliberately hidden.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:12765
+      if (auto *IIOp = dyn_cast<IntrinsicInst>(II->getOperand(0)))
+        if (isZeroingInactiveLanesIntrinsic(IIOp->getIntrinsicID()))
+          Ops.push_back(&II->getOperandUse(0));
----------------
My worry here is pulling something into a loop that's more expensive than what we're trying to remove.  Would it be wrong to limit this to simple constant splats and ptrue intrinsics?
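
To make the suggested restriction concrete, here is a minimal sketch of the decision as a standalone toy model. It is not LLVM code: `OpKind`, `PTestOperand`, `isCheapToDuplicate` and `shouldSinkPTestOperand` are hypothetical names invented for illustration, and the `ZeroesInactiveLanes` flag simply stands in for whatever `isZeroingInactiveLanesIntrinsic` would report in the patch.

```cpp
// Toy model only: these kinds are illustrative stand-ins, not LLVM's intrinsic IDs.
enum class OpKind { PTrue, ConstantSplat, Compare, Other };

// Hypothetical description of the value feeding the ptest.
struct PTestOperand {
  OpKind Kind;
  bool ZeroesInactiveLanes; // stand-in for the patch's zeroing check
};

// Assumption: only ptrue and constant splats are cheap enough to duplicate
// inside the loop body.
constexpr bool isCheapToDuplicate(OpKind K) {
  return K == OpKind::PTrue || K == OpKind::ConstantSplat;
}

// Sink only when the operand both carries the zeroing guarantee and is cheap,
// so the loop never gains something more expensive than the ptest it removes.
constexpr bool shouldSinkPTestOperand(PTestOperand Op) {
  return Op.ZeroesInactiveLanes && isCheapToDuplicate(Op.Kind);
}

// A ptrue is sunk; a compare is left outside the loop even though it zeroes
// inactive lanes.
static_assert(shouldSinkPTestOperand({OpKind::PTrue, true}), "ptrue is sunk");
static_assert(!shouldSinkPTestOperand({OpKind::Compare, true}), "compare stays out");
```

Under this sketch the existing zeroing check stays as it is and the cost filter is an extra condition on top, so anything sinkable under the restriction would also have been sinkable in the current patch.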

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D129852/new/

https://reviews.llvm.org/D129852