[PATCH] D55263: [CodeGen][ExpandMemcmp] Add an option for allowing overlapping loads.

Peter Cordes via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Dec 11 15:24:11 PST 2018


pcordes added a comment.

In D55263#1325283 <https://reviews.llvm.org/D55263#1325283>, @davezarzycki wrote:

> Let's not lose sight of the big picture here. If uarch problems exist, are they *worse* than the cost of calling memcmp()?


Almost certainly not, even for memcpy, where potential store-forwarding stalls or 4k aliasing are a fairly minor concern most of the time.

I pointed those things out so the new unaligned load/store code-gen can be the best it can be while people are working on that code anyway, *not* because I think there's a risk of overall regressions.

> In other words, are the likely register spills, function call overhead, and dynamic algorithm selection (given that the constantness of the size parameter is lost) worth it?

Right, libc memcmp / memcpy are not cheap for tiny sizes.  A couple of `cmp dword [rdi], imm32` / `jne` pairs should be better in almost every way, possibly even in code size at the call site, depending on how many reloads we avoid.
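
As a rough C sketch of what an inline expansion with overlapping loads amounts to for a constant size of 7 bytes (this is not code from the patch; `load32` and `equal7` are hypothetical names used only for illustration, and it covers only the equality case, not the ordered result of memcmp):

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: 4-byte load with no alignment assumption.
   A fixed-size memcpy like this is normally compiled to a single load. */
static uint32_t load32(const void *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Hypothetical equality-only replacement for memcmp(a, b, 7) == 0.
   Two 4-byte loads cover 7 bytes: bytes 0..3 and bytes 3..6.
   Byte 3 is compared twice, which is harmless for equality. */
static int equal7(const void *a, const void *b) {
    const unsigned char *pa = a;
    const unsigned char *pb = b;
    if (load32(pa) != load32(pb))
        return 0;                      /* early out, like the first jne */
    return load32(pa + 3) == load32(pb + 3);
}
```

Without overlapping loads, the same 7 bytes would need a 4-byte, a 2-byte and a 1-byte comparison, so the overlap saves one load/compare pair per operand.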


Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D55263/new/

https://reviews.llvm.org/D55263