[PATCH] D143368: [InstCombine] Look through truncate to fold icmp with intrinsics

Thu Feb 9 12:47:59 PST 2023

spatel accepted this revision.
spatel added a comment.
This revision is now accepted and ready to land.

LGTM

================
Comment at: llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp:4954-4955
+      unsigned MaxRet = SrcBits;
+      if (match(II->getArgOperand(1), m_One()))
+        MaxRet--;
+
----------------
This needs an explanation comment. Something like:
  // If the "is_zero_poison" argument is set, then we know at least 
  // one bit is set in the input, so the result is always at least one
  // less than the full bitwidth of that input.

================
Comment at: llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp:4957
+
+      // Make sure the dest bits is enough to save the intrinsic output's range
+      if (llvm::Log2_32(MaxRet) + 1 <= Op0->getType()->getScalarSizeInBits())
----------------
Adjust word choice:
  // Make sure the destination is wide enough to hold the largest output of the intrinsic.

================
Comment at: llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp:4918

+  if (auto *II = dyn_cast<IntrinsicInst>(X)) {
+    if (II->getIntrinsicID() == Intrinsic::ctpop ||
----------------
bcl5980 wrote:
> goldstein.w.n wrote:
> > bcl5980 wrote:
> > > goldstein.w.n wrote:
> > > > Imo it makes more sense to do this generically.
> > > > 
> > > > `(icmp P (trunc(X), C))` if `KnownBits(X)[OrigWidth:TruncWidth] == 0` just drop the truncate. That will cover all these intrins + any other cases that happen to come up.
> > > I don't want to enable all cases for the trunc. That may cause a lot of potential regressions. By default we should shrink bits if possible not expand it. I prefer to limited the change to the case we can really get improvement.
> > > Maybe I can move this change to SDAG to avoid these concern.
> > Seems to only match `m_Oneuse(m_Trunc(...))` so seems like only case is the truncate is entirely eliminated, no? Not sure I see how that could cause regressions.
> What I mean is not IR level regression. Think about the case source type is i256, the dest type is i32. Remove the trunc and restore the next instruct to be i256 may cause extra instruction on backend. 
This is a gray area. Ideally, if we can eliminate IR instructions, we do it, but it if it results in wider instructions, then it's potentially not an improvement for analysis in IR. The codegen concern is secondary, but yes, we do factor that into the decision because sometimes it is not easy to invert the transform.

For this one, it seems unlikely that we'll gain much by using a potentially expensive (in compile-time) ValueTracking API because we probably already have specialized folds for the common patterns where we'd know the high bits are already cleared ('and' or 'lshr').

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D143368/new/

https://reviews.llvm.org/D143368