[PATCH] D158291: [PoC][WIP] Add an AArch64 specific pass for loop idiom recognition

Thu Aug 31 07:49:06 PDT 2023

kmclaughlin marked an inline comment as done.
kmclaughlin added a subscriber: eli.friedman.
kmclaughlin added inline comments.

================
Comment at: llvm/include/llvm/IR/Intrinsics.td:2180

+def int_experimental_cttz_elts:
+  DefaultAttrsIntrinsic<[llvm_i32_ty],
----------------
craig.topper wrote:
> I wonder if something like "find first nonzero element" would be better?
This was something we considered, but we wanted to add an intrinsic which mirrors the behaviour of the existing cttz intrinsic. In D159283 I've added a second operand to indicate whether the result is poison if the first argument is all zero, similar to cttz.

================
Comment at: llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp:7467
+      SDValue AllZero = DAG.getConstant(0, DL, OpVT);
+      OpVT = OpVT.changeVectorElementType(MVT::i1);
+      Op = DAG.getSetCC(DL, OpVT, Op, AllZero, ISD::SETNE);
----------------
craig.topper wrote:
> changeVectorElementType doesn't work if the source type is an MVT and the resulting type is not an MVT. Probably better to use getVectorVT. There have been two recent bug fixes for something like this https://reviews.llvm.org/D157392 and 512a6c50e87c1956c028daf3317b07b3aa0e309f
Thank you @craig.topper, this has been addressed in D159283.

================
Comment at: llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp:7481
+    else
+      VSMax = CR.getUpper().getZExtValue();
+
----------------
efriedma wrote:
> Is the upper bound here guaranteed to fit into an i64?
Hi @eli.friedman, I believe it is safe to assume that the upper bound will fit into an i64. Calls to the vscale intrinsic are often 64 bits when generated by the vectoriser, and the getVScaleRangeMin/getVScaleRangeMax functions themselves are returning unsigned types.

================
Comment at: llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp:7487
+      unsigned MinBW = llvm::bit_width(
+          VSMax * OpVT.getVectorElementCount().getKnownMinValue());
+      EltWidth = llvm::bit_ceil(std::max(MinBW, (unsigned)8));
----------------
efriedma wrote:
> This multiply can overflow?
I've tried to address this on D159283 by calculating the smallest possible type using the `umul_sat` operation of ConstantRange.

================
Comment at: llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp:7504
+    SDValue And = DAG.getNode(ISD::AND, DL, NewVT, StepVL, Ext);
+    SDValue Max = DAG.getNode(ISD::VECREDUCE_SMAX, DL, NewEltTy, And);
+    SDValue Sub = DAG.getNode(ISD::SUB, DL, NewEltTy, VL, Max);
----------------
efriedma wrote:
> Is there some reason to use SMAX instead of UMAX?  It seems to complicate reasoning about the sign bit.
There was no reason for choosing SMAX, I have updated this to use UMAX instead.

================
Comment at: llvm/lib/Passes/PassBuilderPipelines.cpp:622

+  LPM2.addPass(IndVarSimplifyPass());
+
----------------
efriedma wrote:
> Did you really mean to remove LoopIdiomRecognize and replace it with a second run of IndVarSimplify?
> 
> I'm not sure why this patch requires messing with the default pass pipeline.
Removing the LoopIdiomRecognize is a mistake, I only intended to move the IndVarSimplify pass after `invokeLateLoopOptimizationsEPCallbacks` so that the new pass runs as close to LoopIdiomRecognize as possible.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158291/new/

https://reviews.llvm.org/D158291