[PATCH] D55365: [CodeGen] Allow memcpy/memset to generate small overlapping stores.

Sat Dec 8 07:46:27 PST 2018

spatel added a reviewer: RKSimon.
spatel added inline comments.

================
Comment at: test/CodeGen/X86/memset-zero.ll:314
 ; X86-NEXT:    movl {{[0-9]+}}(%esp), %eax
-; X86-NEXT:    movb $0, 34(%eax)
-; X86-NEXT:    movw $0, 32(%eax)
+; X86-NEXT:    movl $0, 31(%eax)
 ; X86-NEXT:    movl $0, 28(%eax)
----------------
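In the diff above, the zeroed region ends at byte 34. The old lowering covered the 3-byte tail after `movl $0, 28(%eax)` with a `movw` at 32 plus a `movb` at 34. The new lowering issues one `movl` at 31 instead, which rewrites byte 31 (already zeroed by the store at 28). The C sketch below shows the same idea under stated assumptions: only 4-byte stores, a region of at least 4 bytes, and a hypothetical helper `zero_with_overlap` written for illustration. It is not LLVM's actual lowering code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (illustration only, not LLVM code).
 * Zeroes n bytes at dst using only 4-byte stores. Full words are stored
 * at 0, 4, 8, ...; if a 1-3 byte tail remains, one extra 4-byte store at
 * n - 4 covers it, overlapping bytes that are already zero.
 * If offsets is non-NULL, it receives each store's offset and must have
 * room for n / 4 + 1 entries. Returns the number of stores issued. */
static size_t zero_with_overlap(unsigned char *dst, size_t n, size_t *offsets)
{
    const uint32_t zero = 0;
    size_t count = 0;
    size_t off = 0;

    /* The overlap trick needs at least one full word to overlap with. */
    assert(n >= 4);

    for (; off + 4 <= n; off += 4) {
        memcpy(dst + off, &zero, 4); /* memcpy avoids alignment/aliasing UB */
        if (offsets)
            offsets[count] = off;
        count++;
    }

    if (off < n) {
        /* Tail of 1-3 bytes: one overlapping store instead of 2-byte and
         * 1-byte pieces. */
        memcpy(dst + n - 4, &zero, 4);
        if (offsets)
            offsets[count] = n - 4;
        count++;
    }
    return count;
}
```

Calling it with n = 35 produces store offsets 0, 4, ..., 28 and then 31: nine stores, matching the `movl $0, 28(%eax)` / `movl $0, 31(%eax)` pair in the new output. When n is a multiple of 4 (for example 32), no overlapping store is needed.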
pcordes wrote:
> There's a code-size vs. uop count tradeoff here.  Zeroing one register with a 2-byte `xor %edx,%edx` would save 4 bytes in each of the following `movl $imm32` instructions.
> 
> Especially on CPUs without a uop-cache, it may well be a win to have one extra cheap uop go through the pipeline to avoid decode bottlenecks that might limit how far ahead the CPU can "see" in the instruction stream.
> 
See PR24448: https://bugs.llvm.org/show_bug.cgi?id=24448
...and the related bugs.
I'm still not sure how to solve that, so the bug has been sitting for a long time.
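The code-size side of that tradeoff can be put in numbers. Assume every store uses an 8-bit displacement off `%eax`, as in the test output. The immediate form `movl $0, d(%eax)` (C7 40 d + imm32) is then 7 bytes, and the register form `movl %edx, d(%eax)` (89 50 d) is 3 bytes. After a one-time 2-byte `xor %edx,%edx` (31 D2), k stores therefore shrink by 4k - 2 bytes, at the cost of one extra uop. The helper names below are hypothetical and exist only to show the arithmetic.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed x86 encodings for a 32-bit store to disp8(%eax):
 *   movl $0, d(%eax)     C7 40 d 00 00 00 00   7 bytes (imm32 form)
 *   movl %edx, d(%eax)   89 50 d               3 bytes (register form)
 *   xorl %edx, %edx      31 D2                 2 bytes (zero the register once)
 */
enum { MOV_IMM32_DISP8 = 7, MOV_REG_DISP8 = 3, XOR_REG_REG = 2 };

/* Hypothetical: total bytes for k stores of an immediate zero. */
static size_t bytes_imm_stores(size_t k)
{
    return k * MOV_IMM32_DISP8;
}

/* Hypothetical: total bytes for one xor-zeroing plus k register stores. */
static size_t bytes_reg_stores(size_t k)
{
    return XOR_REG_REG + k * MOV_REG_DISP8;
}

/* Bytes saved by the register form (4k - 2). Only meaningful for k >= 1;
 * with no stores the xor would be pure overhead. */
static size_t bytes_saved(size_t k)
{
    assert(k >= 1);
    return bytes_imm_stores(k) - bytes_reg_stores(k);
}
```

Even a single store comes out 2 bytes smaller in the register form, and nine stores save 34 bytes. The 4-byte difference per store is exactly the imm32, so it holds for any addressing mode as long as both forms use the same one. Whether the smaller code actually runs faster depends on the front end, which is the unresolved question in the bug referenced above.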

Repository:
  rL LLVM


https://reviews.llvm.org/D55365