[PATCH] D55263: [CodeGen][ExpandMemcmp] Add an option for allowing overlapping loads.
Andrea Di Biagio via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Dec 6 03:16:01 PST 2018
andreadb added a comment.
In D55263#1320172 <https://reviews.llvm.org/D55263#1320172>, @courbet wrote:
> Here's a basic benchmark for `memcmp(a, b, N)` where N is a compile-time constant, and a and b differ first at character M:
>
> F7651979: D55263.cc <https://reviews.llvm.org/F7651979>
>
> The change makes the impacted values **2.5 - 3x as fast**.
Nice patch Clement :-).
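To make the option concrete: with overlapping loads allowed, a constant-size equality test such as `memcmp(a, b, 7) == 0` can be covered by two 4-byte loads per buffer, at offsets 0 and 3, instead of a 4 + 2 + 1 sequence. The C++ below is a scalar, equality-only sketch of that idea under that assumption. `load32` and `eq_4_to_8` are hypothetical names, not functions from the patch or from LLVM, and the actual transform emits IR rather than C++.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: unaligned 4-byte load. memcpy has no alignment
// requirement, and compilers lower it to a single load on x86.
static std::uint32_t load32(const unsigned char* p) {
  std::uint32_t v;
  std::memcpy(&v, p, sizeof v);
  return v;
}

// Equality-only sketch for 4 <= n <= 8: one 4-byte load at offset 0 and one
// at offset n - 4. The two loads overlap whenever n < 8, so every byte is
// covered by two loads per buffer instead of a chain of smaller loads.
// (An ordered memcmp result would additionally need a byte swap on
// little-endian targets; this sketch only answers "equal or not".)
bool eq_4_to_8(const unsigned char* a, const unsigned char* b, std::size_t n) {
  assert(n >= 4 && n <= 8);
  std::uint32_t lo = load32(a) ^ load32(b);                  // bytes [0, 4)
  std::uint32_t hi = load32(a + n - 4) ^ load32(b + n - 4);  // bytes [n-4, n)
  return (lo | hi) == 0;
}
```

For example, with two identical 7-byte arrays `eq_4_to_8(x, y, 7)` returns true, and changing any single byte of `y`, including byte 3, which both loads cover, makes it return false.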
================
Comment at: lib/Target/X86/X86TargetTransformInfo.cpp:2902
Options.LoadSizes.push_back(1);
+ // All GPR loads can be unaligned, and vector loads too starting form SSE2.
+ Options.AllowOverlappingLoads = true;
----------------
s/form/from.
Strictly speaking, SSE1 provides MOVUPS for unaligned vector FP loads.
However, comparing vectors for equality is problematic: CMPEQPS performs a floating-point comparison, so it does not work as expected when one of the operands is NaN.
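The NaN point can be seen with scalar floats: memcmp-style equality is bitwise, whereas CMPEQPS, like `==` on `float`, is an IEEE comparison. The sketch below contrasts the two using hypothetical helpers `bytes_equal` and `fp_equal` (not LLVM APIs). It also shows the converse mismatch: +0.0 and -0.0 compare equal as floats even though their bit patterns differ. This assumes default floating-point semantics (no `-ffast-math`).

```cpp
#include <cstring>
#include <limits>

// Byte-wise equality of two floats: what memcmp(...) == 0 requires.
bool bytes_equal(float x, float y) {
  return std::memcmp(&x, &y, sizeof(float)) == 0;
}

// IEEE floating-point equality: the scalar analogue of one CMPEQPS lane.
// NaN compares unequal to everything (itself included), and +0.0 == -0.0.
bool fp_equal(float x, float y) {
  return x == y;
}
```

With `float n = std::numeric_limits<float>::quiet_NaN();`, `bytes_equal(n, n)` is true but `fp_equal(n, n)` is false; with `0.0f` and `-0.0f` the results are reversed. Either mismatch would make a CMPEQPS-based expansion disagree with `memcmp`.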
One of your tests shows that the expansion is effectively disabled if the target has SSE but not SSE2. However, as John wrote, I don't see where the check for the SSE2 feature is done...
Repository:
rL LLVM
https://reviews.llvm.org/D55263