[llvm-dev] [PATCH] strlen -> strnlen optimization
Michael McConville via llvm-commits
llvm-commits at lists.llvm.org
Sat Feb 6 23:40:29 PST 2016
Mehdi Amini wrote:
> Also it seems that your patch was inline in the email and not
> attached? I couldn't apply it for some reason.
I see. I mostly work with projects that use inline patches (BSDs) so I
thought that was the preferred method. I'm surprised to hear that it
wouldn't apply because I'm using Mutt. Anyway, I'll attach in the
future.
A couple responses while I have time:
> > Index: lib/Transforms/Utils/SimplifyLibCalls.cpp
> > ===================================================================
> > --- lib/Transforms/Utils/SimplifyLibCalls.cpp (revision 260011)
> > +++ lib/Transforms/Utils/SimplifyLibCalls.cpp (working copy)
> > @@ -555,6 +555,49 @@
> > if (isOnlyUsedInZeroEqualityComparison(CI))
> > return B.CreateZExt(B.CreateLoad(Src, "strlenfirst"), CI->getType());
> >
> > + // strlen(x) < y --> strnlen(x, y+1) < y
> > + //
> > + // We ensure that there is only one user to avoid interfering with
> > + // CSE.
> > + if (!CI->hasOneUse() || !TLI->has(LibFunc::strnlen))
> > + return nullptr;
>
> You are adding this optimization inside a function that try multiple
> optimizations in sequence. An early return is not nice in this context
> because it prevent from adding other transformation in the same
> function afterwards. The cleanest way to keep early exits and avoid
> deep nesting is to extract this transformation in a static helper
> function.
Right - I meant to ask about this. I recognized the issue with early
returns, but the nesting quickly got out of hand. I'll break it out into
a helper function.
> > + User *U = CI->user_back();
> > + ICmpInst *IC;
> > + if (!(IC = dyn_cast<ICmpInst>(U)))
> > + return nullptr;
>
> ICmpInst *IC = dyn_cast<ICmpInst>(U);
> if (!IC)
> return nullptr;
Good catch, will change.
> > + IntegerType *SizeType = DL.getIntPtrType(B.GetInsertBlock()->getContext());
> > + Value *LHS = IC->getOperand(0), *RHS = IC->getOperand(1);
> > +
> > + ConstantInt *Con;
> > + if (!((Con = dyn_cast<ConstantInt>(LHS)) || (Con = dyn_cast<ConstantInt>(RHS))))
> > + return nullptr;
>
> I think you can assume the constant to be on the right (and remove the
> swapOperands):
>
> ConstantInt *Con = dyn_cast<ConstantInt>(IC->getOperand(1));
> if(!Con)
> return nullptr;
Are you saying that LLVM will always position constants on the right? If
not, I think it'd be unfortunate to miss instances like 5 == strlen(s).
I know that some projects specify that ordering in their style guides to
prevent = vs. == bugs.
> > + uint64_t con_val = Con->getZExtValue();
> > +
> > + if (RHS == CI)
> > + IC->swapOperands();
> > +
> > + switch (IC->getPredicate()) {
> > + case ICmpInst::ICMP_EQ:
> > + case ICmpInst::ICMP_NE:
> > + case ICmpInst::ICMP_UGT:
> > + case ICmpInst::ICMP_UGE:
> > + case ICmpInst::ICMP_ULE:
> > + case ICmpInst::ICMP_SGT:
> > + case ICmpInst::ICMP_SGE:
> > + case ICmpInst::ICMP_SLE:
> > + // XXX: check for wrapping
> > + if (con_val == UINT64_MAX)
> > + return nullptr;
>
> ISTM that the wrapping check assumes that "sizeof(size_t) == 8", which
> is not always true.
I guess I was thinking only of overflow in the compiler itself.
Specifically, overflow of that uint64_t value. Is the range of size_t
vs. the size of str(n)len's argument not checked downstream?
If not, we could also just check against UINT32_MAX. Then, it seems that
the worst case would be a false positive when the constant is >= 2^31 -
1 and the target is 64-bit. In that case, it's probably ok to leave it
as a strlen call anyway. :-)
Let me know if I'm misunderstanding.
Thanks,
Michael
More information about the llvm-commits
mailing list