[PATCH] D35750: [x86] Teach the x86 backend about general fast rep+movs and rep+stos features of modern x86 CPUs, and use this feature to drastically reduce the number of places we actually emit memset and memcpy library calls.

Craig Topper via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sat Jul 22 15:18:22 PDT 2017


craig.topper added a comment.

I believe that prior to Ivy Bridge, the Intel optimization manual indicates that rep movsb/stosb was only optimized to handle 1-3 bytes. Specifically, it was meant to handle the remainder portion in conjunction with rep movsd, which handles the rest.
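For illustration, the pattern described above (bulk copy in 4-byte units via rep movsd, with rep movsb covering only the 0-3 byte tail) can be modeled in portable C. This is a hedged sketch of the idea, not code from the patch; the function name is made up:

```c
#include <stddef.h>
#include <string.h>

/* Sketch of the pre-Ivy-Bridge recommendation: do the bulk of the
 * copy in 4-byte units (what rep movsd would do) and leave only the
 * 0-3 byte remainder for the byte-granular loop (rep movsb).
 * Hypothetical helper, modeled in portable C for clarity. */
static void *copy_dword_then_byte(void *dst, const void *src, size_t n) {
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t dwords = n / 4;  /* count that would go in RCX for rep movsd */
    size_t tail   = n % 4;  /* 0-3 bytes left over for rep movsb */

    for (size_t i = 0; i < dwords; i++) {  /* models rep movsd */
        memcpy(d, s, 4);
        d += 4;
        s += 4;
    }
    for (size_t i = 0; i < tail; i++)      /* models rep movsb */
        *d++ = *s++;
    return dst;
}
```

With ERMSB (Ivy Bridge and later), the split becomes unnecessary: rep movsb alone is expected to be fast for the whole length, which is what the FeatureERMSB/FeatureFastRepStrOps distinction in the patch is modeling.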



================
Comment at: lib/Target/X86/X86.td:287
+// lowerings for string operations when calling the library function would have
+// too high of cost.
+def FeatureFastRepStrOps
----------------
echristo wrote:
> "too high of a cost"
Should this mention that the byte version is still terrible here?


================
Comment at: lib/Target/X86/X86.td:301
           "ermsb", "HasERMSB", "true",
-          "REP MOVS/STOS are fast">;
+          "REP MOVSB/STOSB are as fast S/D/Q variants", [FeatureFastRepStrOps]>;
 
----------------
Was that supposed to be W/D/Q?


================
Comment at: lib/Target/X86/X86.td:547
   FeatureRDRAND,
+  FeatureERMSB,
   FeatureF16C,
----------------
Have you collected the data to verify this? Should we go ahead and commit this independent of this patch?


https://reviews.llvm.org/D35750

More information about the llvm-commits mailing list