[PATCH] D55263: [CodeGen][ExpandMemcmp] Add an option for allowing overlapping loads.
Peter Cordes via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Dec 11 15:24:11 PST 2018
pcordes added a comment.
In D55263#1325283 <https://reviews.llvm.org/D55263#1325283>, @davezarzycki wrote:
> Let's not lose sight of the big picture here. If uarch problems exist, are they *worse* than the cost of calling memcmp()?
Almost certainly no, even for memcpy where potential store-forwarding stalls or 4k aliasing are a pretty minor concern most of the time.
I pointed those things out so the new unaligned load/store code-gen can be the best it can be while people are working on that code anyway, *not* because I think there's a risk of overall regressions.
> In other words, is the likely register spills, function call overhead, and dynamic algorithm selection (given the constantness of the size parameter is lost) worth it?
Right, libc memcmp / memcpy are not cheap for tiny sizes. A couple of `cmp dword [rdi], imm32` / `jne` instructions should be better in almost every way, maybe even including code size at the call site, depending on how many reloads we avoid.
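To make the overlapping-load idea concrete, here is a minimal C sketch of an equality-only compare of exactly 7 bytes using two 4-byte loads that overlap by one byte. The names `load_u32` and `equal7` are hypothetical and only for illustration; this is not the code the patch emits, just the shape of the expansion. When one operand is a compile-time constant, a compiler may be able to fold that side into immediates, which is where the `cmp dword [rdi], imm32` / `jne` form would come from.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: unaligned 4-byte load. A fixed-size memcpy like
   this is the portable way to express it; compilers typically lower it
   to a single load instruction. */
static uint32_t load_u32(const unsigned char *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Hypothetical example: returns 1 if the first 7 bytes of a and b are
   equal, else 0. Loads cover bytes [0,4) and [3,7); byte 3 is compared
   twice, which is harmless for an equality test. */
static int equal7(const void *a, const void *b) {
    const unsigned char *pa = a;
    const unsigned char *pb = b;
    return load_u32(pa) == load_u32(pb) &&
           load_u32(pa + 3) == load_u32(pb + 3);
}
```

Without overlap, the same 7 bytes would need a 4-byte, a 2-byte, and a 1-byte load per operand, so the overlapping form does less work per compare.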
Repository:
rL LLVM
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D55263/new/
https://reviews.llvm.org/D55263