[PATCH] D55935: [X86][SSE] Canonicalize OR(AND(X, C), AND(Y, ~C)) -> OR(AND(X, C), ANDNP(C, Y))

Mon Jan 7 09:05:15 PST 2019

spatel added a comment.

In D55935#1348305 <https://reviews.llvm.org/D55935#1348305>, @RKSimon wrote:

> In D55935#1340459 <https://reviews.llvm.org/D55935#1340459>, @spatel wrote:
>
> > Would we be better off doing a generic DAGCombine into the optimal xor-and-xor (aka, masked merge) pattern?
> >  (x & C) | (y & ~C) --> ((x ^ y) & C) ^ y --> ((y ^ x) & ~C) ^ x
>
>
> So do we agree that adding support for this additional pattern only makes sense if one of X or Y is being loaded (once)? It doesn't seem to matter if C is being reused or not.

I'm not seeing where loading X/Y is the deciding factor. We have something like this currently:

  andps {{.*}}(%rip), %xmm1  // load constant
  andps {{.*}}(%rip), %xmm2  // load inverse constant
  orps  %xmm2, %xmm1

With this patch, it would become:

  movaps  {{.*}}(%rip), %xmm0 // load constant for 2 uses
  andps   %xmm0, %xmm1
  andnps  %xmm2, %xmm0        // xmm0 = ~C & Y (AT&T order: dest = ~dest & src)
  orps    %xmm0, %xmm1

With the masked-merge transform it would be:

  xorps  %xmm1, %xmm2
  andps  {{.*}}(%rip), %xmm2 // load fold constant for 1 use
  xorps  %xmm2, %xmm1

It's 4 uops for either of the last 2 alternatives. The trade-off (which I think is impossible to model statically) is whether we're better off with the potentially better throughput of the 'andnps' sequence, where the two mask operations are independent of each other, or with the potentially shorter (load-folded) sequence with the dependent logic ops. Either way is a uop improvement over the current codegen, so I'm not opposed to this patch, but the generic masked-merge transform would probably be a smaller patch?
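
For reference, here is a minimal scalar sketch of why the three forms discussed in this thread are interchangeable. It is not code from the patch: the function names are made up for illustration, and uint32_t stands in for a vector lane. Because every operation is bitwise, checking all 8 single-bit combinations of (x, y, c) is enough to cover any width.

```cpp
#include <cstdint>

// Hypothetical helper names; these model the patterns, not LLVM's code.

// Current pattern: (x & C) | (y & ~C)
constexpr uint32_t select_and_or(uint32_t x, uint32_t y, uint32_t c) {
  return (x & c) | (y & ~c);
}

// Masked merge using the mask as-is: ((x ^ y) & C) ^ y
constexpr uint32_t merge_xor_c(uint32_t x, uint32_t y, uint32_t c) {
  return ((x ^ y) & c) ^ y;
}

// Masked merge using the inverted mask: ((y ^ x) & ~C) ^ x
constexpr uint32_t merge_xor_notc(uint32_t x, uint32_t y, uint32_t c) {
  return ((y ^ x) & ~c) ^ x;
}

// Try every single-bit combination of x, y and c. The operations are
// purely bitwise, so agreement here implies agreement for all inputs.
constexpr bool forms_agree_on_all_bit_inputs() {
  for (uint32_t x = 0; x < 2; ++x)
    for (uint32_t y = 0; y < 2; ++y)
      for (uint32_t c = 0; c < 2; ++c) {
        const uint32_t ref = select_and_or(x, y, c);
        if (merge_xor_c(x, y, c) != ref || merge_xor_notc(x, y, c) != ref)
          return false;
      }
  return true;
}

// Checked at compile time (needs C++14 for the constexpr loops).
static_assert(forms_agree_on_all_bit_inputs(),
              "masked-merge forms must match the and/or select");
```

This only shows that the rewrites are equivalent; it says nothing about which sequence is cheaper, which is the open question above.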

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D55935/new/

https://reviews.llvm.org/D55935