[PATCH] D107284: [LoopVectorize] Add support for replication of more intrinsics with scalable vectors

Tue Aug 3 04:03:11 PDT 2021

sdesmalen added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8910
+bool VPRecipeBuilder::isUniformReplicate(Instruction *I, bool IsScalable) const {
+  switch (I->getOpcode()) {
+  case Instruction::Call: {
----------------
Rather than having a switch within a switch, would it make sense to just bail out with `false` if I is not a CallInst? Or one step further, specialize the function to `isUniformIntrinsicCall`, which takes a `IntrinsicInst*` instead of `Instruction*`?

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8913
+    Intrinsic::ID ID = getVectorIntrinsicIDForCall(cast<CallInst>(I), TLI);
+    switch (ID) {
+    case Intrinsic::sideeffect:
----------------
nit: The switch is missing a default case, which may result in build warnings.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8920
+    case Intrinsic::lifetime_end:
+      if (OrigLoop->hasLoopInvariantOperands(I))
+        return true;
----------------
It would be nice if there would be a better way to determine if the operands are uniform (e.g. if the pointer has a bitcast in the loop of a value that's otherwise loop-invariant, that could also be considered uniform), other than testing it like you do here. Maybe you could additionally check if the operands are either uniform or scalar, but I'm not sure if we care (enough), since in practically all cases the operands will be loop-invariant.

nit: how about writing this expression as:

  return IsScalable || OrigLoop->hasLoopInvariantOperands(I);

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8929-8931
+      //   1. For the assume intrinsic generating the instruction for the first
+      //      lane may still be better than not generating any at all. For
+      //      example, the input may be a splat across all lanes.
----------------
I guess the assume //could// be widened and the lanes could be `llvm.reduce.or`'ed together, but I doubt that LLVM will be able to use that knowledge in practice. I think dropping information about the other lanes is acceptable, and still an improvement of not vectorizing at all or having the compiler crash while trying to scalarize :)

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8934-8935
+      //      does anything useful when the input comes from an alloca, which
+      //      suggests it should always be uniform as we don't support
+      //      vectorizing allocas for scalable vectors. In the case when it is
+      //      not from an alloca, then the intrinsic would have no effect
----------------
> as we don't support vectorizing allocas for scalable vectors.
I don't think this is the argument for why we assume the value to be uniform. The `alloca` could be outside the loop, and it could only be the pointers that are not loop-invariant.

It's more that any non-uniform pointers probably won't be recognized by other passes, since ValueTracking's `findAllocaForValue` only looks through simple bitcasts/inttoptr/ptrtoint/phi instructions,  the intrinsic is ignored for anything more complicated when it can't determine the alloca.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D107284/new/

https://reviews.llvm.org/D107284