[PATCH] D55365: [CodeGen] Allow memcpy/memset to generate small overlapping stores.
Sanjay Patel via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Sat Dec 8 07:46:27 PST 2018
spatel added a reviewer: RKSimon.
spatel added inline comments.
================
Comment at: test/CodeGen/X86/memset-zero.ll:314
; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
-; X86-NEXT: movb $0, 34(%eax)
-; X86-NEXT: movw $0, 32(%eax)
+; X86-NEXT: movl $0, 31(%eax)
; X86-NEXT: movl $0, 28(%eax)
----------------
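For readers skimming the hunk above: the 2-byte store at offset 32 and the 1-byte store at offset 34 are replaced by one 4-byte store at offset 31, which overlaps the existing 4-byte store at offset 28 by a single byte. The following is only a minimal C sketch of that idea, assuming a buffer of at least 35 bytes; the function names are hypothetical and are not part of the patch.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: zero offsets 28..34 with two 4-byte stores.
 * The second store starts at 31, so byte 31 is written twice. */
static void zero_tail_overlapping(unsigned char *p) {
    uint32_t zero = 0;
    memcpy(p + 28, &zero, sizeof zero); /* bytes 28..31 */
    memcpy(p + 31, &zero, sizeof zero); /* bytes 31..34 (overlaps byte 31) */
}

/* Hypothetical helper: zero the same range with non-overlapping
 * 4-byte, 2-byte and 1-byte stores, as in the removed test lines. */
static void zero_tail_exact(unsigned char *p) {
    uint32_t z32 = 0;
    uint16_t z16 = 0;
    memcpy(p + 28, &z32, sizeof z32);   /* bytes 28..31 */
    memcpy(p + 32, &z16, sizeof z16);   /* bytes 32..33 */
    p[34] = 0;                          /* byte 34 */
}
```

Both helpers leave the buffer in the same state; the overlapping version simply issues one store fewer.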
pcordes wrote:
> There's a code-size vs. uop count tradeoff here. Zeroing one register with a 2-byte `xor %edx,%edx` would save 4 bytes in each of the following `movl $imm32` instructions.
>
> Especially on CPUs without a uop-cache, it may well be a win to have one extra cheap uop go through the pipeline to avoid decode bottlenecks that might limit how far ahead the CPU can "see" in the instruction stream.
>
See PR24448: https://bugs.llvm.org/show_bug.cgi?id=24448
...and the related bugs.
I'm still not sure how to solve that, so the bug has been sitting for a long time.
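To make the code-size side of the quoted tradeoff concrete, here is a rough C sketch of the byte counts involved. It assumes 32-bit mode and a base register plus 8-bit displacement, as in the test lines above: `movl $0, disp8(%eax)` encodes in 7 bytes (opcode, ModRM, disp8, imm32), `movl %edx, disp8(%eax)` in 3 bytes, and `xor %edx,%edx` in 2 bytes. The function names are hypothetical, and the sketch counts code bytes only; it says nothing about uop counts or actual performance.

```c
/* Instruction sizes in bytes, assuming 32-bit mode and a disp8 address. */
enum {
    MOV_IMM32_TO_MEM = 7, /* movl $0, disp8(%eax)   */
    MOV_REG_TO_MEM   = 3, /* movl %edx, disp8(%eax) */
    XOR_REG_REG      = 2  /* xor %edx, %edx         */
};

/* Code bytes for n zeroing stores that each carry an imm32. */
static int bytes_with_immediates(int n) {
    return n * MOV_IMM32_TO_MEM;
}

/* Code bytes for n zeroing stores that share one zeroed register. */
static int bytes_with_zeroed_register(int n) {
    return n == 0 ? 0 : XOR_REG_REG + n * MOV_REG_TO_MEM;
}
```

Under these assumptions the two stores in the hunk above take 14 bytes with immediates and 8 bytes with a shared zeroed register, at the cost of one extra instruction.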
Repository:
rL LLVM
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D55365/new/
https://reviews.llvm.org/D55365