[llvm-dev] [PATCH] strlen -> strnlen optimization

Michael McConville via llvm-commits llvm-commits at lists.llvm.org
Sat Feb 6 23:40:29 PST 2016


Mehdi Amini wrote:
> Also it seems that your patch was inline in the email and not
> attached? I couldn't apply it for some reason.

I see. I mostly work with projects that use inline patches (BSDs) so I
thought that was the preferred method. I'm surprised to hear that it
wouldn't apply because I'm using Mutt. Anyway, I'll attach in the
future.

A couple responses while I have time:

> > Index: lib/Transforms/Utils/SimplifyLibCalls.cpp
> > ===================================================================
> > --- lib/Transforms/Utils/SimplifyLibCalls.cpp	(revision 260011)
> > +++ lib/Transforms/Utils/SimplifyLibCalls.cpp	(working copy)
> > @@ -555,6 +555,49 @@
> >   if (isOnlyUsedInZeroEqualityComparison(CI))
> >     return B.CreateZExt(B.CreateLoad(Src, "strlenfirst"), CI->getType());
> > 
> > +  // strlen(x) < y --> strnlen(x, y+1) < y
> > +  //
> > +  // We ensure that there is only one user to avoid interfering with
> > +  // CSE.
> > +  if (!CI->hasOneUse() || !TLI->has(LibFunc::strnlen))
> > +    return nullptr;
> 
> You are adding this optimization inside a function that try multiple
> optimizations in sequence. An early return is not nice in this context
> because it prevent from adding other transformation in the same
> function afterwards. The cleanest way to keep early exits and avoid
> deep nesting is to extract this transformation in a static helper
> function.

Right - I meant to ask about this. I recognized the issue with early
returns, but the nesting quickly got out of hand. I'll break it out into
a helper function.


> > +  User *U = CI->user_back();
> > +  ICmpInst *IC;
> > +  if (!(IC = dyn_cast<ICmpInst>(U)))
> > +    return nullptr;
> 
>  ICmpInst *IC = dyn_cast<ICmpInst>(U);
> if (!IC)
>    return nullptr;

Good catch, will change.

> > +  IntegerType *SizeType = DL.getIntPtrType(B.GetInsertBlock()->getContext());
> > +  Value *LHS = IC->getOperand(0), *RHS = IC->getOperand(1);
> > +
> > +  ConstantInt *Con;
> > +  if (!((Con = dyn_cast<ConstantInt>(LHS)) || (Con = dyn_cast<ConstantInt>(RHS))))
> > +    return nullptr;
> 
> I think you can assume the constant to be on the right (and remove the
> swapOperands):
> 
> ConstantInt *Con = dyn_cast<ConstantInt>(IC->getOperand(1));
> if(!Con)
>   return nullptr;

Are you saying that LLVM will always position constants on the right? If
not, I think it'd be unfortunate to miss instances like 5 == strlen(s).
I know that some projects specify that ordering in their style guides to
prevent = vs. == bugs.

> > +  uint64_t con_val = Con->getZExtValue();
> > +
> > +  if (RHS == CI)
> > +    IC->swapOperands();
> > +
> > +  switch (IC->getPredicate()) {
> > +  case ICmpInst::ICMP_EQ:
> > +  case ICmpInst::ICMP_NE:
> > +  case ICmpInst::ICMP_UGT:
> > +  case ICmpInst::ICMP_UGE:
> > +  case ICmpInst::ICMP_ULE:
> > +  case ICmpInst::ICMP_SGT:
> > +  case ICmpInst::ICMP_SGE:
> > +  case ICmpInst::ICMP_SLE:
> > +    // XXX: check for wrapping
> > +    if (con_val == UINT64_MAX)
> > +      return nullptr;
> 
> ISTM that the wrapping check assumes that "sizeof(size_t) == 8", which
> is not always true.

I guess I was thinking only of overflow in the compiler itself.
Specifically, overflow of that uint64_t value. Is the range of size_t
vs. the size of str(n)len's argument not checked downstream?

If not, we could also just check against UINT32_MAX. Then, it seems that
the worst case would be a false positive when the constant is >= 2^31 -
1 and the target is 64-bit. In that case, it's probably ok to leave it
as a strlen call anyway. :-)

Let me know if I'm misunderstanding.

Thanks,
Michael


More information about the llvm-commits mailing list