[PATCH] D21774: [X86] Transform setcc + movzbl into xorl + setcc

Thu Jun 30 11:09:18 PDT 2016

mkuper added a comment.

In http://reviews.llvm.org/D21774#471231, @ab wrote:

> Eh, if Dave has no objections, I'm also fine with the pass;  I was hoping we'd find a better solution, sorry for the noise!


It's not noise at all, I was - and still am - hoping we'd find a better solution as well.

> The pseudo does seem like the least brittle approach; but the pre-RA pass seems good enough?


I have two main concerns about pseudos:

1. Opaqueness: the pseudos are at least somewhat opaque to machine IR passes. I'm not sure how much of a concern this really is in practice, but it bothers me.
2. Cleanliness: this will mean introducing ~15 new pseudos (or a pseudo with a parametric CC code that only gets resolved in expand-time, which I think is even worse, since it'll then look very different from the normal x86 setcc.)

But if "public opinion" is strongly towards pseudos, I can go with that. Dave has a prototype patch, I'll see if I run into any gotchas with it.

> So... have you considered doing it post-RA?  Seems like that lets you avoid the PCMPESTR problem, and I can't think of obvious drawbacks.  It's a tad trickier but should be very close to this patch.


What bothers me about post-RA is cases when the flags-setting instruction and the setcc use the same register, e.g.

  testb %al, %al
  seta %al
  movzbl %al, %eax

It may be the case that this never has a read stall on the seta, because if there were, then we'll always stall on the the instruction that defs the flag (or on some instruction between the test and the setcc, in more complicated cases). I haven't been able to convince myself of this, though.

> And that's how we get the worst-case sequence.

>  So, we thought it was fine if it was more likely only on 32-bit, but now we have one 64-bit example in PR28146.  I'd be interested to investigate that too, but that's only for curiosity; let's forget about the ISel approach.


Yes, I'm curious about it too. I'll try to take a look at it separately and see if I can make sense of it. But, I'm, by a long shot, not as familiar with regalloc as I'd like...


================
Comment at: test/CodeGen/X86/avx-intrinsics-x86.ll:1946-1953
@@ -1945,5 +1945,10 @@
 
 define i32 @test_x86_sse42_pcmpestria128(<16 x i8> %a0, <16 x i8> %a2) {
 ; AVX-LABEL: test_x86_sse42_pcmpestria128:
 ; AVX:       ## BB#0:
+; AVX-NEXT:    pushl %ebx
+; AVX-NEXT:  Ltmp0:
+; AVX-NEXT:    .cfi_def_cfa_offset 8
+; AVX-NEXT:  Ltmp1:
+; AVX-NEXT:    .cfi_offset %ebx, -8
 ; AVX-NEXT:    movl $7, %eax
----------------
spatel wrote:
> Use the 'nounwind' attribute on this and other tests to eliminate some of the diff noise?
Sure, will do.


http://reviews.llvm.org/D21774