[llvm] [X86] For inline memset and memcpy with minsize, use size for alignment, rather than actual alignment (PR #87003)
via llvm-commits
llvm-commits at lists.llvm.org
Thu Jul 18 11:13:36 PDT 2024
================
@@ -62,18 +62,31 @@ SDValue X86SelectionDAGInfo::EmitTargetCodeForMemset(
const X86Subtarget &Subtarget =
DAG.getMachineFunction().getSubtarget<X86Subtarget>();
+ // If we have minsize, then don't care about the alignment.
+ // On x86, the CPU doesn't care and neither should you.
+ // As long as the count is aligned, we can use the minimum number of
+ // instructions without always having to resort to stosb.
+ //
+ // Because this is a feature specific to x86, we must handle it here.
----------------
goldsteinn wrote:
Yes.
```
0: f3 aa rep stosb %al, %es:(%rdi)
2: f3 66 ab rep stosw %ax, %es:(%rdi)
5: f3 ab rep stosd %eax, %es:(%rdi)
7: f3 48 ab rep stosq %rax, %es:(%rdi)
```
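The idea in the diff comment can be sketched as follows. This is a hypothetical helper, not the code in this PR. Under minsize, it picks the widest `rep stos` element whose size evenly divides the byte count and ignores the destination's alignment. It also records the instruction length taken from the encodings listed above. `StosChoice` and `chooseRepStos` are illustrative names, and the sketch assumes C++14 (`constexpr` functions with multiple statements).

```cpp
#include <cstdint>

// Hypothetical description of one rep stos variant (not an LLVM type).
struct StosChoice {
  unsigned ElemBytes;    // element width: 1, 2, 4 or 8
  uint64_t RepCount;     // iteration count that would go into rcx
  unsigned EncodedBytes; // length of the rep stos instruction itself
};

// Pick the widest element size that evenly divides Size, ignoring the
// destination alignment. Instruction lengths come from the disassembly:
//   f3 aa (stosb), f3 66 ab (stosw), f3 ab (stosd), f3 48 ab (stosq).
constexpr StosChoice chooseRepStos(uint64_t Size) {
  if (Size % 8 == 0)
    return {8, Size / 8, 3}; // rep stosq
  if (Size % 4 == 0)
    return {4, Size / 4, 2}; // rep stosd
  if (Size % 2 == 0)
    return {2, Size / 2, 3}; // rep stosw
  return {1, Size, 2};       // rep stosb: the fallback for odd sizes
}
```

For example, a 36-byte memset becomes 9 iterations of `rep stosd`, while a 7-byte memset falls back to `rep stosb`.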
Furthermore, with minsize you also want to shrink the cost of encoding the `imm` (especially problematic for `imm64`). Also, ERMS only really requires that the byte variant be fast (although every processor I'm aware of with ERMS also has fast 2-, 4-, and 8-byte variants).
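The immediate-encoding cost can be illustrated with a small, hypothetical size model. This is not LLVM's cost model. It assumes the fill byte is splatted to the element width and then loaded with a plain `mov` immediate into al/ax/eax/rax, so no `xor` idiom for zero and no sign-extended imm32 form. It also ignores the count setup in rcx, whose `mov ecx, imm32` encoding does not depend on the element width. `splatByte`, `movImmBytes`, `repStosBytes`, and `fillSequenceBytes` are illustrative names.

```cpp
#include <cstdint>

// Replicate the memset byte across one element, e.g. 0xAB -> 0xABAB for stosw.
constexpr uint64_t splatByte(uint8_t Byte, unsigned ElemBytes) {
  uint64_t V = 0;
  for (unsigned I = 0; I < ElemBytes; ++I)
    V = (V << 8) | Byte;
  return V;
}

// Length of "mov reg, imm" for the register rep stos reads:
//   b0 ib (mov al), 66 b8 iw (mov ax), b8 id (mov eax), 48 b8 io (movabs rax).
constexpr unsigned movImmBytes(unsigned ElemBytes) {
  return ElemBytes == 1 ? 2 : ElemBytes == 2 ? 4 : ElemBytes == 4 ? 5 : 10;
}

// Length of rep stos{b,w,d,q}: the 16- and 64-bit forms need a prefix byte.
constexpr unsigned repStosBytes(unsigned ElemBytes) {
  return (ElemBytes == 2 || ElemBytes == 8) ? 3 : 2;
}

// Bytes for materializing the value plus the rep stos itself.
constexpr unsigned fillSequenceBytes(unsigned ElemBytes) {
  return movImmBytes(ElemBytes) + repStosBytes(ElemBytes);
}
```

Under these assumptions the sequence is 4 bytes for stosb, 7 for stosw and stosd, and 13 for stosq, where the 10-byte `movabs` dominates.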
https://github.com/llvm/llvm-project/pull/87003