[PATCH] D94230: [AArch64][SVE] Add SVE IR pass to coalesce ptrue instrinsic calls

Wed Jan 13 01:31:20 PST 2021

fhahn added a comment.

In D94230#2484413 <https://reviews.llvm.org/D94230#2484413>, @joechrisellis wrote:

> In D94230#2484200 <https://reviews.llvm.org/D94230#2484200>, @fhahn wrote:
>
>> It seems there is already a pass to optimize SVE intrinsics. Is there a reason to add a another separate pass, rather than extending the existing one?
>
> Hi @fhahn -- the reason I created a new pass here is because `SVEIntrinsicOpts.cpp` works on an instruction basis -- i.e. look at each SVE intrinsic call in isolation and see what can be done locally from there. On the other hand, the optimisation implemented by this pass instead requires you to look at the set of ptrue intrinsic calls in each basic block, which is not so easy to do within the existing structure of the `sve-intrinsic-opts` pass. I think it's probably cleaner to add a new pass to handle this, but I am open to suggestions. 😄

But `SVEIntrinsicOpts.cpp` could just apply the same optimization as this pass, after the per-instruction optimizations? The main thing I am worried about is not this pass in particular, but that we will end up with a lot of separate passes just optimizing a single SVE intrinsics.

Each pass has a cost, e.g each pass adds another option to enable/disable, needs to be scheduled, needs to iterate over all declarations to find the right intrinsic id, needs the whole pass setup boilerplate. Add to that the confusion that there is a pass that sounds general (`SVEIntrinsicOpt`), but doesn't optimize SVE intrinsic in general. While it may be cleaner/easier to implement optimizations for intrinsics in isolation, I am not sure that justifies the additional cost.

I have not looked at the specific intrinsic optimizations in detail, but on a high-level view it seems like they may benefit from working together, e.g. `SVECoalescePTrueIntrinsicsPass` might remove some instructions, which then enables further optimizations by `SVEIntrinsicOpt`, which in turn enables further optimizations by `SVECoalescePTrueIntrinsicsPass`. Separate passes make this much harder or impossible to handle.

================
Comment at: llvm/lib/Target/AArch64/SVECoalescePTrueIntrinsics.cpp:187
+  PTrues.remove(MostEncompassingPTrue);
+  PTrues.remove_if([&](auto *PTrue) { return isPTruePromoted(PTrue); });
+
----------------
do we need to capture everything here?

================
Comment at: llvm/lib/Target/AArch64/SVECoalescePTrueIntrinsics.cpp:264
+
+    for (auto I = F.user_begin(), E = F.user_end(); I != E;) {
+      auto *Inst = dyn_cast<Instruction>(*I++);
----------------
Is it possible to use something like `for (User *U : F.users())` or is the manual iterator management needed ?

================
Comment at: llvm/lib/Target/AArch64/SVECoalescePTrueIntrinsics.cpp:265
+    for (auto I = F.user_begin(), E = F.user_end(); I != E;) {
+      auto *Inst = dyn_cast<Instruction>(*I++);
+      Functions.insert(Inst->getFunction());
----------------
only use `dyn_cast` if you actually check if the result is `nullptr`. Otherwise, just use `cast`, which asserts that the result is `!= nullptr`.

Also, I think the user could also be a constant expression, so the cast would fail?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D94230/new/

https://reviews.llvm.org/D94230