[PATCH] D46494: [DAGCombiner] Masked merge: enhance handling of 'andn' with immediates

Mon May 7 07:38:35 PDT 2018

spatel added a subscriber: andreadb.
spatel added a comment.

In https://reviews.llvm.org/D46494#1089380, @lebedev.ri wrote:

> Fixed with the correct fold, updated mca diffs in the differential's description:
>  F6120274: trunk-vs-patch.txt <https://reviews.llvm.org/F6120274> F6120277: prevpatch-vs-patch.txt <https://reviews.llvm.org/F6120277>

We need to be careful here (and maybe there's a way for mca to show/warn about this, cc @andreadb).

When you simulate these instructions:

  andnl %edx, %edi, %eax
  orl $42, %edx
  andnl %edx, %eax, %eax

Notice that the output of the sequence (%eax) is not used again by any instruction in the sequence. So this is measuring ideal throughput in a vacuum - each simulated iteration proceeds independently. Maybe that's what you intended, but the original sequence that you're comparing does not have that property:

  andl	%edx, %edi
  notl	%edx
  andl	$42, %edx
  orl	%edi, %edx    <--- output fed back as the input to first instruction 

Each iteration depends on the previous one, so it's not fair to compare the stats for the 2 sequences as they're shown in the attached diff.

================
Comment at: test/CodeGen/X86/unfold-masked-merge-scalar-variablemask.ll:707-710
   %notmask = xor i32 %mask, -1
   %n0 = xor i32 %x, 42 ; %x
   %n1 = and i32 %n0, %notmask
   %r = xor i32 %n1, 42
----------------
I'd still prefer to put a comment on tests like this where we're testing a non-canonical form, so we have a reminder that we're testing this for completeness, but it's not expected. Ie, instcombine would improve this sequence to remove the inverted mask.

Repository:
  rL LLVM

https://reviews.llvm.org/D46494