[PATCH] D55263: [CodeGen][ExpandMemcmp] Add an option for allowing overlapping loads.

Andrea Di Biagio via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Dec 6 03:16:01 PST 2018


andreadb added a comment.

In D55263#1320172 <https://reviews.llvm.org/D55263#1320172>, @courbet wrote:

> Here's a basic benchmark for `memcmp(a, b, N)` where N is a compile-time constant, and a and b differ first at character M:
>
> F7651979: D55263.cc <https://reviews.llvm.org/F7651979>
>
> The change makes the impacted values **2.5 - 3x as fast**.


Nice patch Clement :-).



================
Comment at: lib/Target/X86/X86TargetTransformInfo.cpp:2902
     Options.LoadSizes.push_back(1);
+    // All GPR loads can be unaligned, and vector loads too starting form SSE2.
+    Options.AllowOverlappingLoads = true;
----------------
s/form/from.

Strictly speaking, SSE1 provides MOVUPS for unaligned vector FP loads.
However, it gets problematic when comparing vectors for equality: CMPEQPS performs an IEEE floating-point comparison rather than a bitwise one, so it does not work as expected when one of the operands is NaN (identical NaN bit patterns compare unequal), nor for +0.0 and -0.0 (different bits, yet they compare equal).
One of your tests shows that the expansion is effectively disabled when the target has SSE but not SSE2. However, as John wrote, I don't see where the check for the SSE2 feature is done...


Repository:
  rL LLVM

https://reviews.llvm.org/D55263
