[PATCH] D111500: [InstSimplify] Simplify intrinsic comparisons with domain knowledge

Artem Belevich via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Oct 11 14:22:37 PDT 2021


tra added inline comments.


================
Comment at: llvm/lib/Analysis/InstructionSimplify.cpp:609
+  // fold %cmp = icmp slt i32 %tid, %ntid to true.
+  if (Inst0->getIntrinsicID() == Intrinsic::nvvm_read_ptx_sreg_tid_x &&
+      Inst1->getIntrinsicID() == Intrinsic::nvvm_read_ptx_sreg_ntid_x)
----------------
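[For reference, a sketch of the fold under discussion, in IR form; the function name is illustrative:]

```llvm
; %tid (thread index) is always in [0, %ntid), so the signed compare
; below can be folded to true.
declare i32 @llvm.nvvm.read.ptx.sreg.tid.x()
declare i32 @llvm.nvvm.read.ptx.sreg.ntid.x()

define i1 @guard() {
  %tid = call i32 @llvm.nvvm.read.ptx.sreg.tid.x()
  %ntid = call i32 @llvm.nvvm.read.ptx.sreg.ntid.x()
  %cmp = icmp slt i32 %tid, %ntid   ; simplifies to: ret i1 true
  ret i1 %cmp
}
```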
tra wrote:
> jhuber6 wrote:
> > tra wrote:
> > > nikic wrote:
> > > > jhuber6 wrote:
> > > > > tra wrote:
> > > > > > What if LLVM has been compiled without NVPTX back-end? I'm not sure that NVVM intrinsics will be available then.
> > > > > > 
> > > > > > Perhaps we should re-visit enabling NVVMIntrRange.cpp pass, again. This should make it possible for LLVM to figure this optimization, and more.
> > > > > Ranges would only give us an upper bound, right? Maybe we could insert `llvm.assume` calls as well as the ranges there, then.
> > > > > 
> > > > > 
> > > > > I think intrinsic functions are available, but I haven't checked. We use them in OpenMPOpt, which is in the default pipeline, and I haven't heard any complaints, so it's probably fine.
> > > > I believe intrinsics are always included, even if the target is disabled. But I also don't think we have precedent for target intrinsic handling in InstSimplify, so adding @spatel and @lebedev.ri for that. Though I don't really see a problem with it.
> > > > 
> > > > We do provide InstCombine hooks (instCombineIntrinsic in TTI), but those work directly on the intrinsic. You could use that to replace NVVMIntrRange I believe. Though I don't think that would cover the particular use-case here, because range metadata is not sufficient to derive this result.
> > > > Ranges would only give us an upper bound, right?
> > > 
> > > Yes, they do not provide any info about relationship between launch grid parameters.
> > > 
> > > > Maybe we could insert llvm.assume calls as well as the ranges there, then.
> > > 
> > > Something like that.
> > > 
> > I think using `llvm.assume` would be a good solution in general if we can get it to work; it might make all of these cases automatic. Do we want to go down that avenue, or just stick with this as the more straightforward option?
> > 
> > Is there a reason the NVVMIntrRange.cpp isn't currently enabled? Seems straightforward enough.
> > Is there a reason the NVVMIntrRange.cpp isn't currently enabled? Seems straightforward enough.
> 
> It triggered odd regressions in tensorflow code that I was unable to find the root cause for.
> With the pass providing only minor benefits, I've just got it disabled by default. I'll try to re-test with the pass enabled and see how it fares now.
> We do provide InstCombine hooks (instCombineIntrinsic in TTI), but those work directly on the intrinsic. You could use that to replace NVVMIntrRange

Interesting. We could indeed add range metadata there. I'm just not sure it's the best place for that. In order to be useful, we want range metadata to be available early. Adding it as a side effect of InstCombine seems a bit odd -- both because it's not an optimization and because we'd run it multiple times even though the metadata only needs to be added once per intrinsic. I guess ideally it would be up to the intrinsic itself to provide the value range, but that's not something that exists right now. I think a one-shot pass that we can schedule independently is a decent fit for the job.
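For concreteness, the annotation would look roughly like this (a sketch; the upper bound of 1024 is illustrative and depends on the target GPU). Note that such metadata bounds each intrinsic independently, so it cannot express the relationship `tid.x < ntid.x`:

```llvm
; Sketch of what an NVVMIntrRange-style pass (or an instCombineIntrinsic
; hook) could attach: !range metadata on the intrinsic's result.
declare i32 @llvm.nvvm.read.ptx.sreg.tid.x()

define i32 @example() {
  %tid = call i32 @llvm.nvvm.read.ptx.sreg.tid.x(), !range !0
  ret i32 %tid
}

!0 = !{i32 0, i32 1024}  ; illustrative bound; varies by GPU generation
```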

Also, I think I may have figured out why `NVVMIntrRange` was causing the problems. I suspect that with the new pass manager the pass may have been initialized with the default constructor, which would give incorrect range info for the newer GPUs.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D111500/new/

https://reviews.llvm.org/D111500
