[PATCH] D55263: [CodeGen][ExpandMemcmp] Add an option for allowing overlapping loads.

Thu Dec 6 03:35:12 PST 2018

andreadb added a comment.

In D55263#1321020 <https://reviews.llvm.org/D55263#1321020>, @courbet wrote:

> In D55263#1320582 <https://reviews.llvm.org/D55263#1320582>, @JohnReagan wrote:
>
> > One of my coworkers did an informal test last year and saw that newer Intel CPUs' optimization of REP string instructions was faster than using SSE2 (he used large data sizes, not anything in the shorter ranges this patch deals with).  Is that something that should be looked at?  (Or has somebody done that examination already?)
>
>
> Yes, I'm planning to work on this next :) It should go in `SelectionDAGTargetInfo::EmitTargetCodeForMemcmp()`, similar to what we did for `memcpy` and `memset` though.

As long as we don't enable it for AMD, I am fine.
Instructions with a REP prefix incur a significant setup overhead, so they should definitely be avoided when the repeat count is small. Even on larger data sets (at least on AMD), a loop of vector operations would still provide better throughput than REP MOVS/CMPS.
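A minimal sketch of the kind of vector loop meant above, assuming x86 with SSE2 intrinsics. `vector_equal` is a hypothetical name, not an LLVM or libc API, and it only tests equality (bcmp-style) rather than returning memcmp's ordering. There is no REP setup cost: short inputs take a plain byte loop, and the tail of longer inputs is handled with one 16-byte load that overlaps the previous block, similar in spirit to the overlapping loads this patch is about.

```c
#include <emmintrin.h> /* SSE2 intrinsics (x86 only) */
#include <stddef.h>

/* Hypothetical helper: returns 1 if the first n bytes of a and b are equal,
   0 otherwise. Equality only; it does not report which side is smaller. */
int vector_equal(const void *a, const void *b, size_t n) {
    const unsigned char *pa = (const unsigned char *)a;
    const unsigned char *pb = (const unsigned char *)b;

    if (n < 16) {
        /* Short inputs: a simple byte loop with no vector setup. */
        for (size_t i = 0; i < n; ++i)
            if (pa[i] != pb[i]) return 0;
        return 1;
    }

    size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(pa + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(pb + i));
        /* cmpeq sets each byte lane to 0xFF where equal; movemask collects
           the 16 lane sign bits, so 0xFFFF means all 16 bytes matched. */
        if (_mm_movemask_epi8(_mm_cmpeq_epi8(va, vb)) != 0xFFFF) return 0;
    }

    if (i < n) {
        /* Tail: one 16-byte load ending exactly at n, overlapping bytes
           that were already compared instead of falling back to a byte loop. */
        __m128i va = _mm_loadu_si128((const __m128i *)(pa + n - 16));
        __m128i vb = _mm_loadu_si128((const __m128i *)(pb + n - 16));
        if (_mm_movemask_epi8(_mm_cmpeq_epi8(va, vb)) != 0xFFFF) return 0;
    }
    return 1;
}
```

The overlapping tail load is safe here because n >= 16 on that path, so `n - 16` never underflows and the load stays inside both buffers.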

Repository:
  rL LLVM


https://reviews.llvm.org/D55263