[PATCH] D21774: [X86] Transform setcc + movzbl into xorl + setcc

David Kreitzer via llvm-commits llvm-commits at lists.llvm.org
Thu Jun 30 15:58:37 PDT 2016


DavidKreitzer added a comment.

I have no objection to this solution, and I think it is robust in a functional sense. We can always change it later if we discover that the worst-case MOV32r0 sinking scenario is more common than we think. It would give me the warm fuzzies if we had some experimental evidence to confirm the suspicion that this is a rare case. Do you already have that? If not, maybe it would be a good idea to write a late pass that looks for this kind of pattern and counts the number of occurrences on, say, CPU2006. The worst-case pattern would be:

  SETcc r8
  xor r32, r32 (or mov $0, r32)
  movb r8, r32b
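
For reference, the transform named in the patch title rewrites the zero-extension of a setcc result so the zeroing idiom comes first (a sketch; the `%eax`/`setg` choices are illustrative, not taken from the patch):

  ; Before: materialize the boolean, then zero-extend it.
  setg   %al              ; %al = 1 if greater, else 0
  movzbl %al, %eax        ; zero-extend %al into %eax

  ; After: pre-zero the full register, then write only the low byte.
  ; The xorl must be scheduled where EFLAGS is not live (i.e. before
  ; the flags-producing instruction), since xorl clobbers EFLAGS.
  xorl   %eax, %eax       ; %eax = 0, breaks the dependence on old %eax
  setg   %al              ; upper 24 bits are already zero

The upside is that `xorl` is dependency-breaking and often free, whereas `movzbl` adds a real uop on the critical path; the worst case above arises when the xor cannot be hoisted past the flags definition.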

And thanks for the performance data! Once this gets committed, I'll have someone run testing on a broader set of workloads.


================
Comment at: test/CodeGen/X86/fp128-compare.ll:11
@@ -10,3 +10,3 @@
 ; CHECK:       callq __gttf2
-; CHECK:       setg  %al
-; CHECK:       movzbl %al, %eax
+; CHECK:       xorl  %ecx, %ecx
+; CHECK:       setg  %cl
----------------
mkuper wrote:
> DavidKreitzer wrote:
> > This looks like a problem ...
> It's the same issue as pcmpestr - we are constrained because both the input of the instruction defining eflags, and the eventual output of the setcc must be eax.
> 
> The full code is:
> 
> ```
> # BB#0:                                 # %entry
> 	pushq	%rax
> .Ltmp1:
> 	.cfi_def_cfa_offset 16
> 	callq	__getf2
> 	xorl	%ecx, %ecx
> 	testl	%eax, %eax
> 	setns	%cl
> 	movl	%ecx, %eax
> 	popq	%rcx
> 	retq
> 
> ```
> I should regenerate the test with the update script.
Ah yes, of course. You can ignore my comment.


http://reviews.llvm.org/D21774
