[PATCH] D112754: X86: Fold masked merge pattern when and-not is not available

Wed Nov 3 16:28:49 PDT 2021

MatzeB marked an inline comment as done.
MatzeB added inline comments.

================
Comment at: llvm/include/llvm/CodeGen/TargetLowering.h:4621
+  /// `TargetLowering::PerformDAGCombing` callback on `ISD::OR` nodes.
+  SDValue foldMaskedMerge(SDNode *Node, SelectionDAG &DAG) const;
+
----------------
RKSimon wrote:
> Based on the regressions you saw on other targets, do you have any thoughts on whether any other targets will be able to use this? Otherwise it might make sense to move this into X86ISelLowering.cpp until there's a need to make this generic.
It mostly means other targets have to implement `TargetLowering::hasAndNot` properly and then adapt their ISel patterns so they still trigger where necessary.

I have no immediate plans to enable the code for other targets. ARM, AArch64, and PowerPC feature an and-not instruction anyway and don't need this. At first glance it may help targets like RISCV, WebAssembly, BCC, but I guess we leave that for the respective target authors to discover.

Ok, I'll move the code to X86ISelLowering.cpp; we can always move it back to a shared space when a second target starts using it.

================
Comment at: llvm/test/CodeGen/X86/fold-masked-merge.ll:33-39
 ; CHECK-NEXT:    movl %edi, %eax
 ; CHECK-NEXT:    andl %edi, %esi
 ; CHECK-NEXT:    notl %eax
 ; CHECK-NEXT:    andl %edx, %eax
 ; CHECK-NEXT:    orl %esi, %eax
 ; CHECK-NEXT:    # kill: def $ax killed $ax killed $eax
 ; CHECK-NEXT:    retq
----------------
Note that this version currently fails because SelectionDAG does not seem to consistently move all operations to i16:

```
  t0: ch = EntryToken
  t2: i32,ch = CopyFromReg t0, Register:i32 %0
          t5: i32,ch = CopyFromReg t0, Register:i32 %1
        t19: i32 = and t2, t5
          t8: i32,ch = CopyFromReg t0, Register:i32 %2
              t3: i16 = truncate t2
            t12: i16 = xor t3, Constant:i16<-1>
          t24: i32 = any_extend t12
        t25: i32 = and t8, t24
      t22: i32 = or t19, t25
    t23: i16 = truncate t22
  t17: ch,glue = CopyToReg t0, Register:i16 $ax, t23
  t18: ch = X86ISD::RET_FLAG t17, TargetConstant:i32<0>, Register:i16 $ax, t17:1
```
(note the stray `t24: i32 = any_extend t12`...)

I'll leave the test here, but fixing this is outside the scope of this diff. 
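
For reference, here is a rough standalone sketch of the i16 computation the CHECK lines above perform, next to an xor-based form that needs no and-not. This is only an illustration: the function names are made up, and I'm assuming the fold targets the usual `((x ^ y) & m) ^ y` rewrite, which this comment thread does not spell out.

```cpp
#include <cassert>
#include <cstdint>

// Masked merge as the test computes it: bits of x where m is set,
// bits of y elsewhere. (Hypothetical helper name, not from the patch.)
constexpr uint16_t maskedMerge(uint16_t m, uint16_t x, uint16_t y) {
  return static_cast<uint16_t>((m & x) | (~m & y));
}

// Assumed xor-based equivalent: no `not`/and-not required.
constexpr uint16_t maskedMergeXor(uint16_t m, uint16_t x, uint16_t y) {
  return static_cast<uint16_t>(((x ^ y) & m) ^ y);
}

// Both forms agree on a sample input at compile time.
static_assert(maskedMerge(0xFF00, 0x1234, 0xABCD) ==
                  maskedMergeXor(0xFF00, 0x1234, 0xABCD),
              "the two forms should agree");

// Checks every 16-bit mask for a given pair of inputs.
inline bool formsAgree(uint16_t x, uint16_t y) {
  for (uint32_t m = 0; m <= 0xFFFF; ++m) {
    const uint16_t mask = static_cast<uint16_t>(m);
    if (maskedMerge(mask, x, y) != maskedMergeXor(mask, x, y))
      return false;
  }
  return true;
}
```

The xor form trades `not` + `and` + `and` + `or` for `xor` + `and` + `xor`, which is why it looks attractive when the target has no and-not instruction.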

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D112754/new/

https://reviews.llvm.org/D112754