[PATCH] D55263: [CodeGen][ExpandMemcmp] Add an option for allowing overlapping loads.
Andrea Di Biagio via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Dec 6 03:35:12 PST 2018
andreadb added a comment.
In D55263#1321020 <https://reviews.llvm.org/D55263#1321020>, @courbet wrote:
> In D55263#1320582 <https://reviews.llvm.org/D55263#1320582>, @JohnReagan wrote:
>
> > One of my coworkers did an informal test last year and saw that, on newer Intel CPUs, the optimized REP string instructions were faster than SSE2 code (he used large data sizes, not anything in the short ranges this patch deals with). Is that something that should be looked at? (Or has somebody already done that examination?)
>
>
> Yes, I'm planning to work on this next :) It should go in `SelectionDAGTargetInfo::EmitTargetCodeForMemcmp()`, similar to what we did for `memcpy` and `memset` though.
As long as we don't enable it for AMD, I am fine.
Instructions with a REP prefix incur a significant setup overhead, so they are definitely best avoided when the repeat count is small. Even on larger data sets, a loop of vector operations would still provide better throughput than REP MOVS/CMPS (at least on AMD).
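For illustration only, here is a minimal C sketch of the kind of "loop of vector operations" meant above, written as an equality-only comparison (the `memcmp(...) == 0` case). The helper names `block16_equal` and `bytes_equal` are hypothetical. This is not LLVM's ExpandMemcmp output or any libc's implementation. It assumes SSE2 intrinsics on x86 and falls back to portable 8-byte word loads elsewhere. The tail is handled by one final 16-byte load that overlaps the previous block, in the spirit of the overlapping-loads option this patch adds, instead of a byte loop. It reports only equal/not equal, not the ordering that full `memcmp` returns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: returns 1 if the 16 bytes at p and q are equal. */
static int block16_equal(const unsigned char *p, const unsigned char *q) {
#if defined(__SSE2__)
    /* Unaligned 16-byte vector loads, then a byte-wise compare. */
    __m128i x = _mm_loadu_si128((const __m128i *)p);
    __m128i y = _mm_loadu_si128((const __m128i *)q);
    /* All 16 lanes equal <=> every bit of the byte movemask is set. */
    return _mm_movemask_epi8(_mm_cmpeq_epi8(x, y)) == 0xFFFF;
#else
    /* Portable fallback (assumption): two 8-byte word compares. */
    uint64_t x0, x1, y0, y1;
    memcpy(&x0, p, 8);
    memcpy(&x1, p + 8, 8);
    memcpy(&y0, q, 8);
    memcpy(&y1, q + 8, 8);
    return ((x0 ^ y0) | (x1 ^ y1)) == 0;
#endif
}

/* Hypothetical helper: returns 1 if the n bytes at a and b are equal, else 0. */
static int bytes_equal(const void *a, const void *b, size_t n) {
    assert(n == 0 || (a != NULL && b != NULL));
    const unsigned char *p = a, *q = b;
    if (n < 16) {
        /* Too short for a single 16-byte block: plain byte loop. */
        for (size_t i = 0; i < n; i++)
            if (p[i] != q[i])
                return 0;
        return 1;
    }
    size_t i = 0;
    for (; i + 16 <= n; i += 16)
        if (!block16_equal(p + i, q + i))
            return 0;
    if (i < n)
        /* Overlapping tail: the last 16 bytes, re-checking some already-compared ones. */
        return block16_equal(p + n - 16, q + n - 16);
    return 1;
}
```

The overlapping tail keeps the loop free of a variable-length remainder: any n >= 16 is covered by full-width loads only, at the cost of comparing a few bytes twice.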
Repository:
rL LLVM
https://reviews.llvm.org/D55263