[PATCH] D128364: [InstCombine] Look through more casts when folding memchr and memcmp

Martin Sebor via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Jun 27 13:54:16 PDT 2022


msebor added inline comments.


================
Comment at: llvm/lib/Analysis/ValueTracking.cpp:4196
+    Op0 = Op0->stripPointerCasts();
     const GlobalVariable *GV = dyn_cast<GlobalVariable>(Op0);
     if (!GV)
----------------
nikic wrote:
> msebor wrote:
> > nikic wrote:
> > > msebor wrote:
> > > > efriedma wrote:
> > > > > Maybe it makes sense to just drop the `dyn_cast<GlobalVariable>` here?
> > > > This is what I meant by the simplification opportunity (I think we discussed it with @nikic in another review but I should have made that clear here).  As far as I can tell the `GlobalVariable` cast is necessary to get at the `DataLayout` and to compute the offset below.  (Or is there some other way to get at it?)
> > > If you really want to avoid the DataLayout argument, you can do something like `dyn_cast<GlobalVariable>(getUnderlyingObject(V))`, take the DataLayout from there, and then use that to feed the `stripAndAccumulateConstantOffsets()` call.
> > > 
> > > Otherwise this is just shifting the problem to a `GEP + bitcast + GEP` sequence. (And it's worth noting that with opaque pointers the casts being skipped here don't exist anyway.)
> > Thanks for the pointer to `getUnderlyingObj`.  Unless there is a form of the function that also computes the aggregate GEP offset I don't think using it here would be appropriate since the rest of the function then has to (again) iterate over the same chain of casts and GEPs to compute the offset.  (Would enhancing `getUnderlyingObj`to compute the offset be appropriate?  FWIW, GCC has at least two utility functions that do both.)
> > 
> > I'm not opposed to adding the `DataLayout` argument (quite the contrary), but as I explained, I avoided it in this patch to keep it from growing too big.  If making this small incremental improvement first is a problem I can instead start by submitting a change to add the new argument.  Let me know your preference.
> > 
> > Opaque pointers obviously obviate the casts, so once typed pointers completely disappear the code can be simplified,  The removal of the `FisrstIdx` test will still be necessary to handle these use cases either way but that can be done then.  Would you prefer to defer this whole patch until then?
> stripAndAccumulateConstantOffsets() is the function that computes aggregate GEP offsets. getUnderlyingObject() is just a suggestion on how you can fetch the DataLayout without passing an argument. Basically the start of this function (down to line 4222) would become something like this:
> 
> ```
> auto *GV = dyn_cast<GlobalVariable>(getUnderlyingObject(V));
> if (!GV || !GV->isConstant() || !GV->hasDefinitiveInitializer())
>   return false; // Not based on constant global.
> 
> const DataLayout &DL = GV->getParent()->getDataLayout();
> APInt Offset(DL.getIndexTypeSizeInBits(V->getType()), 0);
> if (GV != P->stripAndAccumulateConstantOffsets(DL, Offset, /*AllowNonInbounds*/ true))
>   return false; // Could not determine offset from GV.
> ```
I'm pretty sure (I think) I understood what you meant.  My point is that a call to `getUnderlyingObject` has a linear complexity in the number of GEPs/casts, and so would a fully general recursive call to `getConstantDataArrayInfo`.  They both walk the IR to find the underlying declaration.  Calling the former from the latter would make the latter do double the work, with quadratic complexity in the number of GEPs.

The complexity could be kept linear by
- adding a `DataLayout` argument to `getConstantDataArrayInfo`
- having `getUnderlyingObject` also accumulate the offset from each GEP it strips on the way to the declaration of the object (it too would need a `DataLayout` argument).

Both changes seem useful to me independently of each other (and as I mentioned, a precedent for the `getUnderlyingObject` change is GCC's `get_ref_base_and_extent` utility function, among others).

But until at least the first is done, this patch handles a single cast + GEP + cast chain in //O(N)//.  Alternatively, I can add the `DataLayout` argument to `getConstantDataArrayInfo` first, and handle in //O(N)// arbitrarily long chains of cast/GEP sequences.

Having said all that, I have a feeling we might be talking past each other and this will not be the end of it.


================
Comment at: llvm/lib/Analysis/ValueTracking.cpp:4287
   if (Slice.Array == nullptr) {
-    if (TrimAtNul) {
+    if (Slice.Length && TrimAtNul) {
       Str = StringRef();
----------------
nikic wrote:
> msebor wrote:
> > efriedma wrote:
> > > msebor wrote:
> > > > efriedma wrote:
> > > > > Not sure I understand this change; why does it matter if the slice is zero-length here?
> > > > The change prevents treating past-the-end accesses to `char` arrays the same was as those to empty strings, such as in `strlen("123" + 4)`.  This is exercised by `str-int-3.ll` that I added in D125114 based on my understanding of the convention that library calls that provably result in an out-of-bounds access are best left for sanitizers to report.  (I think an argument can be made that folding such calls to zero is safer than counting on sanitizers, so I don't mind removing the check and adjusting the test if it's decided that's preferable.)
> > > If we want getConstantStringInfo(TrimAtNul=true) to always fail if it doesn't trim a nul, we could do that, but only failing when we don't trim a null and Slice.Array is null doesn't really make sense to me.
> > I've made the change you suggest in the updated patch.  It has a fairly pervasive impact on the couple of dozen or so clients of the function.  There are no tests for this case so I've developed and added a subset of of them in the updated patch.  It was a nontrivial effort whose goal is orthogonal to this enhancement and that I think would have been better handled separately and by someone who's less ambivalent about the benefits of doing this than me.  (The test coverage would ideally be extended to all supported functions but I leave that for some other time.)
> Possibly I'm misreading @efriedma's comment, but I think the suggestion here was to just drop the change you originally did from the patch. This doesn't seem to have any direct relation to the bitcast stripping part.
> 
> (For what it's worth, I think I've come around on the sanitizer issue from "we should accommodate sanitizers if it doesn't cost any effort" to "we should ignore sanitizers completely and optimize as aggressively as possible", because these little checks are not a principled solution anyway, but they always cause a lot of discussion about whether some particular case should be special-cased or not.)
Okay, given that, I'm comfortable removing both the check and the case `str-int-3.ll` test that otherwise fails.

I've also updated the new `strcall-no-nul.ll` test to reflect the current behavior on trunk, and in addition removed the other checks I added to verify the opposite strategy.  (Either way, being more consistent about this than the initial patch leads to churn that's strictly outside the scope of this enhancement.)

As an aside it would be helpful to capture somewhere the decision to fold even in spite of the effect on sanitizers so that we can point to it when the question comes up again in the future.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D128364/new/

https://reviews.llvm.org/D128364



More information about the llvm-commits mailing list