[llvm] [X86] For inline memset and memcpy with minsize, use size for alignment, rather than actual alignment (PR #87003)

Thu Jul 18 11:13:36 PDT 2024

================
@@ -62,18 +62,31 @@ SDValue X86SelectionDAGInfo::EmitTargetCodeForMemset(
   const X86Subtarget &Subtarget =
       DAG.getMachineFunction().getSubtarget<X86Subtarget>();
 
+  // If we have minsize, then don't care about the alignment.
+  // On x86, the CPU doesn't care and neither should you.
+  // As long as the count is aligned, we can use the minimum number of
+  // instructions without always having to resort to stosb.
+  //
+  // Because this is a feature specific to x86, we must handle it here.
----------------
goldsteinn wrote:

Yes.
```
       0: f3 aa                        	rep		stosb	%al, %es:(%rdi)
       2: f3 66 ab                     	rep		stosw	%ax, %es:(%rdi)
       5: f3 ab                        	rep		stosd	%eax, %es:(%rdi)
       7: f3 48 ab                     	rep		stosq	%rax, %es:(%rdi)
```
Further, with minsize, the byte variant also shrinks the cost of encoding the `imm` (especially problematic for `imm64`). Finally, ERMS only really requires that the byte variant be fast (although every processor I'm aware of with ERMS also has fast 2/4/8-byte variants).
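
To make the trade-off concrete, here is a minimal sketch of the selection rule described in the patch comment: choose the store width from the byte count alone and ignore the pointer alignment. `StosChoice` and `pickStosForMinSize` are hypothetical names for illustration, not code from the patch, and the encoding lengths are copied from the listing above.

```cpp
#include <cstdint>

// Hypothetical description of one `rep stos` variant (not an LLVM type).
struct StosChoice {
  const char *Mnemonic;  // instruction name as shown in the listing
  unsigned WidthBytes;   // bytes stored per iteration
  unsigned EncodingLen;  // encoded length in bytes, from the listing
  uint64_t Count;        // iteration count for the count register
};

// Pick the widest variant whose width evenly divides the byte count.
// Pointer alignment is deliberately not consulted.
StosChoice pickStosForMinSize(uint64_t SizeInBytes) {
  if (SizeInBytes % 8 == 0)
    return {"rep stosq", 8, 3, SizeInBytes / 8};
  if (SizeInBytes % 4 == 0)
    return {"rep stosd", 4, 2, SizeInBytes / 4};
  if (SizeInBytes % 2 == 0)
    return {"rep stosw", 2, 3, SizeInBytes / 2};
  // Odd sizes fall back to the byte variant.
  return {"rep stosb", 1, 2, SizeInBytes};
}
```

This only models the width choice. It does not account for the cost of materializing the stored value or the count, which is where the `imm` encoding concern comes in.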



https://github.com/llvm/llvm-project/pull/87003