[PATCH] D35750: [x86] Teach the x86 backend about general fast rep+movs and rep+stos features of modern x86 CPUs, and use this feature to drastically reduce the number of places we actually emit memset and memcpy library calls.

Craig Topper via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sat Jul 22 15:18:22 PDT 2017


craig.topper added a comment.

I believe that prior to Ivy Bridge, the Intel optimization manual indicates that rep movsb/stosb was only optimized to handle 1-3 bytes. Specifically, it was meant to handle the remainder portion in conjunction with rep movsd, which handles the rest.
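For illustration, the pattern described above (bulk copy in 4-byte units via rep movsd, with rep movsb covering only the 0-3 byte tail) can be modeled in portable C. This is a hedged sketch of the idea, not code from the patch; the function name is made up:

```c
#include <stddef.h>
#include <string.h>

/* Sketch of the pre-Ivy-Bridge recommendation: do the bulk of the
 * copy in 4-byte units (what rep movsd would do) and leave only the
 * 0-3 byte remainder for the byte-granular loop (rep movsb).
 * Hypothetical helper, modeled in portable C for clarity. */
static void *copy_dword_then_byte(void *dst, const void *src, size_t n) {
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t dwords = n / 4;  /* count that would go in RCX for rep movsd */
    size_t tail   = n % 4;  /* 0-3 bytes left over for rep movsb */

    for (size_t i = 0; i < dwords; i++) {  /* models rep movsd */
        memcpy(d, s, 4);
        d += 4;
        s += 4;
    }
    for (size_t i = 0; i < tail; i++)      /* models rep movsb */
        *d++ = *s++;
    return dst;
}
```

With ERMSB (Ivy Bridge and later), the split becomes unnecessary: rep movsb alone is expected to be fast for the whole length, which is what the FeatureERMSB/FeatureFastRepStrOps distinction in the patch is modeling.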



================
Comment at: lib/Target/X86/X86.td:287
+// lowerings for string operations when calling the library function would have
+// too high of cost.
+def FeatureFastRepStrOps
----------------
echristo wrote:
> "too high of a cost"
Should this mention that the byte version is still terrible here?


================
Comment at: lib/Target/X86/X86.td:301
           "ermsb", "HasERMSB", "true",
-          "REP MOVS/STOS are fast">;
+          "REP MOVSB/STOSB are as fast S/D/Q variants", [FeatureFastRepStrOps]>;
 
----------------
Was that supposed to be W/D/Q?


================
Comment at: lib/Target/X86/X86.td:547
   FeatureRDRAND,
+  FeatureERMSB,
   FeatureF16C,
----------------
Have you collected the data to verify this? Should we go ahead and commit this independent of this patch?


https://reviews.llvm.org/D35750

More information about the llvm-commits mailing list