[PATCH] D128364: [InstCombine] Look through more casts when folding memchr and memcmp

Mon Jun 27 14:29:01 PDT 2022

nikic added inline comments.

================
Comment at: llvm/lib/Analysis/ValueTracking.cpp:4196
+    Op0 = Op0->stripPointerCasts();
     const GlobalVariable *GV = dyn_cast<GlobalVariable>(Op0);
     if (!GV)
----------------
msebor wrote:
> nikic wrote:
> > msebor wrote:
> > > nikic wrote:
> > > > msebor wrote:
> > > > > efriedma wrote:
> > > > > > Maybe it makes sense to just drop the `dyn_cast<GlobalVariable>` here?
> > > > > This is what I meant by the simplification opportunity (I think we discussed it with @nikic in another review but I should have made that clear here).  As far as I can tell the `GlobalVariable` cast is necessary to get at the `DataLayout` and to compute the offset below.  (Or is there some other way to get at it?)
> > > > If you really want to avoid the DataLayout argument, you can do something like `dyn_cast<GlobalVariable>(getUnderlyingObject(V))`, take the DataLayout from there, and then use that to feed the `stripAndAccumulateConstantOffsets()` call.
> > > > 
> > > > Otherwise this is just shifting the problem to a `GEP + bitcast + GEP` sequence. (And it's worth noting that with opaque pointers the casts being skipped here don't exist anyway.)
> > > Thanks for the pointer to `getUnderlyingObj`.  Unless there is a form of the function that also computes the aggregate GEP offset I don't think using it here would be appropriate since the rest of the function then has to (again) iterate over the same chain of casts and GEPs to compute the offset.  (Would enhancing `getUnderlyingObj`to compute the offset be appropriate?  FWIW, GCC has at least two utility functions that do both.)
> > > 
> > > I'm not opposed to adding the `DataLayout` argument (quite the contrary), but as I explained, I avoided it in this patch to keep it from growing too big.  If making this small incremental improvement first is a problem I can instead start by submitting a change to add the new argument.  Let me know your preference.
> > > 
> > > Opaque pointers obviously obviate the casts, so once typed pointers completely disappear the code can be simplified,  The removal of the `FisrstIdx` test will still be necessary to handle these use cases either way but that can be done then.  Would you prefer to defer this whole patch until then?
> > stripAndAccumulateConstantOffsets() is the function that computes aggregate GEP offsets. getUnderlyingObject() is just a suggestion on how you can fetch the DataLayout without passing an argument. Basically the start of this function (down to line 4222) would become something like this:
> > 
> > ```
> > auto *GV = dyn_cast<GlobalVariable>(getUnderlyingObject(V));
> > if (!GV || !GV->isConstant() || !GV->hasDefinitiveInitializer())
> >   return false; // Not based on constant global.
> > 
> > const DataLayout &DL = GV->getParent()->getDataLayout();
> > APInt Offset(DL.getIndexTypeSizeInBits(V->getType()), 0);
> > if (GV != P->stripAndAccumulateConstantOffsets(DL, Offset, /*AllowNonInbounds*/ true))
> >   return false; // Could not determine offset from GV.
> > ```
> I'm pretty sure (I think) I understood what you meant.  My point is that a call to `getUnderlyingObject` has a linear complexity in the number of GEPs/casts, and so would a fully general recursive call to `getConstantDataArrayInfo`.  They both walk the IR to find the underlying declaration.  Calling the former from the latter would make the latter do double the work, with quadratic complexity in the number of GEPs.
> 
> The complexity could be kept linear by
> - adding a `DataLayout` argument to `getConstantDataArrayInfo`
> - having `getUnderlyingObject` also accumulate the offset from each GEP it strips on the way to the declaration of the object (it too would need a `DataLayout` argument).
> 
> Both changes seem useful to me independently of each other (and as I mentioned, a precedent for the `getUnderlyingObject` change is GCC's `get_ref_base_and_extent` utility function, among others).
> 
> But until at least the first is done, this patch handles a single cast + GEP + cast chain in //O(N)//.  Alternatively, I can add the `DataLayout` argument to `getConstantDataArrayInfo` first, and handle in //O(N)// arbitrarily long chains of cast/GEP sequences.
> 
> Having said all that, I have a feeling we might be talking past each other and this will not be the end of it.
There shouldn't be any quadratic complexity -- when using getUnderlyingObject/stripAndAccumulateConstantOffsets there is no need for getConstantDataArrayInfo itself to be recursive, all the offsets are accumulated within a single call.

You are right that there is some redundant work going on in that getUnderlyingObject will first find the base object, and then stripAndAccumulateConstantOffsets will accumulate offsets down to it, but it's still a linear walk, repeated twice, so complexity is still O(n).

(If we want to really get down to it, doing two walks will actually be faster in practice, because getUnderlyingObject is significantly faster than stripAndAccumulateConstantOffsets, which means that we can quickly discard any cases that aren't based on a constant GV. This is just a side note, we generally wouldn't specifically optimize for that.)

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D128364/new/

https://reviews.llvm.org/D128364