[PATCH] D46528: [DAGCombine][X86][AArch64] Masked merge unfolding: vector edition.

Wed May 9 15:53:19 PDT 2018

spatel added inline comments.

================
Comment at: lib/Target/AArch64/AArch64ISelLowering.h:444

-  bool hasAndNotCompare(SDValue) const override {
-    // 'bics'
-    return true;
+  bool hasAndNotCompare(SDValue Y) const override {
+    EVT VT = Y.getValueType();
----------------
lebedev.ri wrote:
> spatel wrote:
> > The name of the function is overly specific if we're including bsl and instructions too. I think we should either generalize that name or add a different hook for "hasBitwiseSelect".
> Currently for AArch64, `hasAndNot()` is not overriden.
> We could rename it to that, and make `hasAndNotCompare()` call `hasAndNot()`.
> 
> Or is this about these hooks in general?
It's a combination:
1. It's not accurate to say here (or in x86) that we have AndNotCompare for vector types. That hook is supposed to indicate that the logic instruction also sets flags (bics vs. bic). It's not causing any problems right now AFAICT, but stretching the truth of that logic could cause trouble some day.

2. bsl (or vpcmov for x86) are not and-not operations either, so some transform may be misled by us claiming those are and-not ops. 

For AArch, we have vector bic. So here, I think we should override hasAndNot() and indicate that we have bic for scalar and bic for vector. It's really just a bonus for the masked-merge transform that we can shrink it down to bsl.

Similarly for x86, what we care about is that we can use andnps, and producing vpcmov (who knew) is a bonus. So we want to fix the override of hasAndNot() to indicate that legal vector types are supported with the right dose of SSE.

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:4773-4776
+  if (VT == MVT::v4i32)
+    return true;
+
+  return Subtarget.hasSSE2();
----------------
lebedev.ri wrote:
> spatel wrote:
> > That seems odd. Does something bad happen with other types of 128-bit vectors (eg v8i16) with SSE1 only? I would've thought all element types would get mapped to andnps.
> I'm pretty sure that with SSE1 we only get this vector size.
> https://godbolt.org/g/o5CGQt
> 
> https://software.intel.com/sites/landingpage/IntrinsicsGuide/#techs=SSE,SSE2&text=andn
> http://www.felixcloutier.com/x86/ANDNPS.html
> http://www.felixcloutier.com/x86/ANDNPD.html
Hmm...the other types seem to get legalized via scalarization before we have a chance to transform them into the x86-specific logic nodes. 

Probably nobody cares about pre-SSE2 codegen at this point, but Craig or Simon may have suggestions about making the tests less noisy in that respect.

Repository:
  rL LLVM

https://reviews.llvm.org/D46528