[PATCH] D55263: [CodeGen][ExpandMemcmp] Add an option for allowing overlapping loads.

Wed Dec 5 23:17:46 PST 2018

courbet marked an inline comment as done.
courbet added a comment.

In D55263#1320582 <https://reviews.llvm.org/D55263#1320582>, @JohnReagan wrote:

> One of my coworkers did an informal test last year and saw that the REP string-op instructions, as optimized on newer Intel CPUs, were faster than using SSE2 (he used large data sizes, not anything in the shorter ranges this patch deals with).  Is that something that should be looked at?  (Or has somebody done that examination already?)

Yes, I'm planning to work on this next :) It should go in `SelectionDAGTargetInfo::EmitTargetCodeForMemcmp()` though, similar to what we did for `memcpy` and `memset`.

================
Comment at: lib/Target/X86/X86TargetTransformInfo.cpp:2903
+    // All GPR loads can be unaligned, and vector loads too starting from SSE2.
+    Options.AllowOverlappingLoads = true;
     return Options;
----------------
JohnReagan wrote:
> Should this be guarded with hasSSE2()?  Does it make sense for -no-sse compiles?
This should be on for all compiles, since GPR loads can overlap too; see e.g. the test case for N=7.
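
To illustrate why the N=7 case does not depend on SSE, here is a minimal sketch of the overlapping-load idea written by hand in C++. It is not the code the pass emits; the helper names `load32` and `equal7_overlapping` are hypothetical, and the sketch only covers the equality form (`memcmp(a, b, 7) == 0`). Without overlapping, 7 bytes would typically be split into 4 + 2 + 1 byte loads; with overlapping, two 4-byte GPR loads at offsets 0 and 3 cover the same range.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: read 4 bytes from a possibly unaligned address.
// memcpy keeps this well-defined in C++; compilers usually lower it to a
// single unaligned GPR load on x86.
static inline std::uint32_t load32(const unsigned char *p) {
  std::uint32_t v;
  std::memcpy(&v, p, sizeof v);
  return v;
}

// Hypothetical helper: true iff the first 7 bytes of a and b are equal.
// Uses two overlapping 4-byte loads per buffer: bytes [0,4) and bytes [3,7).
// Byte 3 is compared twice, which is harmless for an equality test.
bool equal7_overlapping(const void *a, const void *b) {
  const unsigned char *pa = static_cast<const unsigned char *>(a);
  const unsigned char *pb = static_cast<const unsigned char *>(b);
  std::uint32_t diff = (load32(pa) ^ load32(pb)) |
                       (load32(pa + 3) ^ load32(pb + 3));
  return diff == 0;
}
```

Only general-purpose registers are involved here, which is consistent with enabling the option even when SSE is unavailable.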

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D55263/new/

https://reviews.llvm.org/D55263