[PATCH] D35750: [x86] Teach the x86 backend about general fast rep+movs and rep+stos features of modern x86 CPUs, and use this feature to drastically reduce the number of places we actually emit memset and memcpy library calls.

Eric Christopher via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Jul 21 18:00:52 PDT 2017


echristo added inline comments.


================
Comment at: lib/Target/X86/X86.td:287
+// lowerings for string operations when calling the library function would have
+// too high of cost.
+def FeatureFastRepStrOps
----------------
"too high of a cost"


================
Comment at: lib/Target/X86/X86SelectionDAGInfo.cpp:73
+
+static std::pair<SDValue, MVT> getUnscaledSizeAndVT(SelectionDAG &DAG,
+                                                    const SDLoc &DL,
----------------
Please add a comment explaining what this is for :)

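The excerpt doesn't include the body of `getUnscaledSizeAndVT`, so what it computes isn't visible here. The name and the existing X86 memset lowering suggest one plausible reading. That lowering picks a store width from the known alignment, so this helper may select the widest element type the alignment allows and derive the element count for `rep stos`/`rep movs`. The sketch below is a hypothetical, standalone model of that idea. `RepChunking` and `chooseRepChunking` are invented names, not the patch's code.

```cpp
#include <cstdint>

// Hypothetical illustration only: NOT the code from D35750.
// Picks the widest element size (1, 2, 4 or 8 bytes) the known alignment
// permits, then splits the byte count into a REP element count plus the
// leftover tail bytes that would need separate handling.
struct RepChunking {
  unsigned ElemBytes;  // width of each rep stos/movs element
  uint64_t Count;      // value that would be placed in (E/R)CX
  uint64_t BytesLeft;  // tail bytes not covered by the REP instruction
};

RepChunking chooseRepChunking(uint64_t SizeInBytes, uint64_t Align,
                              bool Is64Bit) {
  unsigned ElemBytes = 1;
  if ((Align & 7) == 0 && Is64Bit)
    ElemBytes = 8;  // 8-byte elements only make sense with 64-bit registers
  else if ((Align & 3) == 0)
    ElemBytes = 4;
  else if ((Align & 1) == 0)
    ElemBytes = 2;
  return {ElemBytes, SizeInBytes / ElemBytes, SizeInBytes % ElemBytes};
}
```

Under this model, a 100-byte, 16-byte-aligned operation on x86-64 becomes 12 eight-byte elements plus 4 tail bytes.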

================
Comment at: lib/Target/X86/X86SelectionDAGInfo.cpp:115
+
+static std::pair<SDValue, unsigned> getWidenedValueAndReg(SelectionDAG &DAG,
+                                                          const SDLoc &DL,
----------------
Ditto.

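The body of `getWidenedValueAndReg` isn't shown either. `rep stos` stores whatever is in AL, AX, EAX, or RAX, so a memset byte has to be replicated across the chosen element width. The existing X86 memset lowering does this for its 16/32/64-bit cases. Assuming this helper does something similar, a hypothetical sketch might look like the following. `WidenedValue` and `widenMemsetByte` are illustrative names only.

```cpp
#include <cstdint>
#include <string>

// Hypothetical illustration only: NOT the code from D35750.
// REP STOS reads its value from AL/AX/EAX/RAX, so the memset byte is
// replicated across every byte of the element width (1, 2, 4 or 8).
struct WidenedValue {
  uint64_t Value;   // byte value splatted across ElemBytes bytes
  std::string Reg;  // register REP STOS would read the value from
};

WidenedValue widenMemsetByte(uint8_t Byte, unsigned ElemBytes) {
  uint64_t V = Byte;
  // Each step copies the filled low part upward: 0xAB -> 0xABAB -> 0xABABABAB...
  for (unsigned Width = 1; Width < ElemBytes; Width *= 2)
    V |= V << (8 * Width);
  switch (ElemBytes) {
  case 2: return {V, "AX"};
  case 4: return {V, "EAX"};
  case 8: return {V, "RAX"};
  default: return {V, "AL"};  // expected only for ElemBytes == 1
  }
}
```

With `ElemBytes == 8`, the byte `0xAB` becomes `0xABABABABABABABAB` and is paired with RAX.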

================
Comment at: lib/Target/X86/X86SelectionDAGInfo.cpp:164-171
   if ((Align & 3) != 0 || !ConstantSize ||
       ConstantSize->getZExtValue() > Subtarget.getMaxInlineSizeThreshold()) {
-    // Check to see if there is a specialized entry-point for memory zeroing.
+    // When we have a fast REP+STOS CPU and either have ERMSB + 16-byte
+    // alignment or PIC overhead for a library call, bypass the library call
+    // entirely.
+    if (Subtarget.hasFastRepStrOps() &&
+        (Subtarget.isPositionIndependent() ||
----------------
It seems like all of this wants to be folded into a couple of separate checks?

i.e. I don't think the max inline size threshold is as important anymore once we have fast rep string ops. I suspect we don't actually care whether or not we're PIC here, and should probably just use the inline expansion either way.

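To make the review point concrete, here is a hypothetical model of the memset decision. The quoted condition is cut off after `isPositionIndependent() ||`, so the ERMSB-plus-16-byte-alignment term is an assumption based on the patch's own comment. `MemsetQuery`, `patchDecision`, and `suggestedDecision` are invented names, and `LibCall` loosely stands for "leave it to the generic lowering / library call". `suggestedDecision` reflects only one possible reading of the suggestion: with fast rep string ops, take the inline path regardless of PIC or the size threshold.

```cpp
#include <cstdint>

// Hypothetical model of the memset lowering choice discussed above.
// Fields stand in for the Subtarget/DAG queries in the quoted hunk.
struct MemsetQuery {
  bool FastRepStrOps;   // ~ Subtarget.hasFastRepStrOps()
  bool HasERMSB;        // assumed ERMSB feature query (not shown in the hunk)
  bool IsPIC;           // ~ Subtarget.isPositionIndependent()
  uint64_t Align;       // known destination alignment in bytes
  bool SizeIsConstant;  // ~ ConstantSize != nullptr
  uint64_t Size;        // only meaningful when SizeIsConstant
  uint64_t MaxInline;   // ~ Subtarget.getMaxInlineSizeThreshold()
};

enum class Lowering { LibCall, InlineRepStos };

// The outer check from the quoted hunk: true when the code would
// otherwise fall back to the library call.
static bool wouldUseLibCall(const MemsetQuery &Q) {
  return (Q.Align & 3) != 0 || !Q.SizeIsConstant || Q.Size > Q.MaxInline;
}

// The patch as quoted, with the truncated condition completed by assumption.
Lowering patchDecision(const MemsetQuery &Q) {
  if (!wouldUseLibCall(Q))
    return Lowering::InlineRepStos;
  if (Q.FastRepStrOps && (Q.IsPIC || (Q.HasERMSB && (Q.Align & 15) == 0)))
    return Lowering::InlineRepStos;
  return Lowering::LibCall;
}

// One reading of the review: fast rep string ops alone decide, and
// neither PIC nor the inline size threshold matters.
Lowering suggestedDecision(const MemsetQuery &Q) {
  if (Q.FastRepStrOps)
    return Lowering::InlineRepStos;
  return wouldUseLibCall(Q) ? Lowering::LibCall : Lowering::InlineRepStos;
}
```

The two functions differ on a large, non-PIC memset without ERMSB on a fast-rep CPU. That case is where dropping the PIC and threshold checks would change codegen.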

================
Comment at: lib/Target/X86/X86SelectionDAGInfo.cpp:319-327
+  if (!ConstantSize ||
+      (!AlwaysInline &&
+       ConstantSize->getZExtValue() > Subtarget.getMaxInlineSizeThreshold())) {
+    // When we have a fast REP+STOS CPU and either have ERMSB + 16-byte
+    // alignment or PIC overhead for a library call, bypass the library call
+    // entirely.
+    if (Subtarget.hasFastRepStrOps() &&
----------------
Ditto.

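The memcpy hunk has the same shape, with one difference worth noting. `AlwaysInline` only waives the size threshold, so a non-constant size still reaches the fallback branch. A hypothetical one-function model of that outer check follows. `memcpyReachesFallback` is an invented name.

```cpp
#include <cstdint>

// Hypothetical model of the outer condition in the quoted memcpy hunk.
// Returns true when control would enter the branch that considers the
// fast REP MOVS bypass or the library call, rather than continuing with
// the inline expansion.
bool memcpyReachesFallback(bool SizeIsConstant, uint64_t Size,
                           bool AlwaysInline, uint64_t MaxInline) {
  return !SizeIsConstant || (!AlwaysInline && Size > MaxInline);
}
```

Under this model, `AlwaysInline` with a 4096-byte constant size stays on the inline path, while `AlwaysInline` with a runtime size does not.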

https://reviews.llvm.org/D35750

