[PATCH] D80485: [DAGCombiner][PowerPC][AArch64] Remove isMulhCheaperThanMulShift TLI hook. Use isOperationLegal directly instead.

Wed Sep 30 14:52:45 PDT 2020

amyk added a comment.

In D80485#2293955 <https://reviews.llvm.org/D80485#2293955>, @craig.topper wrote:

> In D80485#2293942 <https://reviews.llvm.org/D80485#2293942>, @amyk wrote:
>
>> So, my goal was to introduce a DAG combine to combine multiply+shifts into mulh. This is done in a function I introduced within `DAGCombiner.cpp` (called `combineShiftToMULH`).
>>
>> Since I was implementing something that is in target independent code, I thought it may be better to enable it on PowerPC only first since I was not quite sure if other targets were interested it this and I was seeing many other LIT failures at the time. 
>> Thus, I introduced the hook (`isMulhCheaperThanMulShift`) and targets who wish to combine multiply+shifts into mulh could implement that hook. Within `combineShiftToMULH`, there is a check to `isMulhCheaperThanMulShift`.
>>
>> If we wanted to enable this for everyone, I think the check `isOperationLegalOrCustom` would probably be sufficient. I just tried this patch again, and made the following changes for `combineShiftToMULH`:
>>
>>   @@ -8099,12 +8101,6 @@ static SDValue combineShiftToMULH(SDNode *N, SelectionDAG &DAG,
>>      if (NarrowVT !=  RightOp.getOperand(0).getValueType())
>>        return SDValue();
>>   
>>   -  // Only transform into mulh if mulh for the narrow type is cheaper than
>>   -  // a multiply followed by a shift. This should also check if mulh is
>>   -  // legal for NarrowVT on the target.
>>   -  if (!TLI.isMulhCheaperThanMulShift(NarrowVT))
>>   -      return SDValue();
>>   -
>>      // Proceed with the transformation if the wide type is twice as large
>>      // as the narrow type.
>>      unsigned NarrowVTSize = NarrowVT.getScalarSizeInBits();
>>   @@ -8122,6 +8118,12 @@ static SDValue combineShiftToMULH(SDNode *N, SelectionDAG &DAG,
>>      // we use mulhs. Othewise, zero extends (zext) use mulhu.
>>      unsigned MulhOpcode = IsSignExt ? ISD::MULHS : ISD::MULHU;
>>   
>>   +  // Only transform into mulh if mulh for the narrow type is cheaper than
>>   +  // a multiply followed by a shift. This should also check if mulh is
>>   +  // legal for NarrowVT on the target.
>>   +  if (!TLI.isOperationLegalOrCustom(MulhOpcode, NarrowVT))
>>   +      return SDValue();
>>   +
>>      SDValue Result = DAG.getNode(MulhOpcode, DL, NarrowVT, LeftOp.getOperand(0),
>>                                   RightOp.getOperand(0));
>>      return (N->getOpcode() == ISD::SRA ? DAG.getSExtOrTrunc(Result, DL, WideVT1)
>>
>> I only see one LIT failure now:
>>
>>   Failed Tests (1):
>>     LLVM :: CodeGen/X86/pmulh.ll
>
> Would this now be enabled on PPC32 or whatever !PPC64 is called? Are there tests for that?

I've looked into this and on 32-bit mode, the mulh nodes are actually only legalized for `MVT::i32` and was being produced previous to introducing my DAGCombine. 
We actually already have a test in our backend (`llvm/test/CodeGen/PowerPC/mulhs.ll`) that shows i32 `mulhs` is being produced. And since the mulh is legal, my combine will run and produce `mulhs` in this test, as well. 
Other types on 32-bit mode aren't legal, so the combine won't run on other types. Thus in terms of PPC, I think it's probably fine to remove `isMulhCheaperThanMulShift`.

The only other failure I see is for `CodeGen/X86/pmulh.ll`. I have the following changes for the test case, however I think it would be great to have your input on it.

  diff --git a/llvm/test/CodeGen/X86/pmulh.ll b/llvm/test/CodeGen/X86/pmulh.ll
  index c03d0190714e..36d6137c9251 100644
  --- a/llvm/test/CodeGen/X86/pmulh.ll
  +++ b/llvm/test/CodeGen/X86/pmulh.ll
  @@ -489,10 +489,11 @@ define <8 x i32> @mulhsw_v8i16_ashr(<8 x i16> %a, <8 x i16> %b) {
   ; SSE2-LABEL: mulhsw_v8i16_ashr:
   ; SSE2:       # %bb.0:
   ; SSE2-NEXT:    pmulhw %xmm1, %xmm0
  +; SSE2-NEXT:    punpcklwd {{.*#+}} xmm2 = xmm2[0],xmm0[0],xmm2[1],xmm0[1],xmm2[2],xmm0[2],xmm2[3],xmm0[3]
  +; SSE2-NEXT:    psrad $16, %xmm2
   ; SSE2-NEXT:    punpckhwd {{.*#+}} xmm1 = xmm1[4],xmm0[4],xmm1[5],xmm0[5],xmm1[6],xmm0[6],xmm1[7],xmm0[7]
  -; SSE2-NEXT:    punpcklwd {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3]
  -; SSE2-NEXT:    psrad $16, %xmm0
   ; SSE2-NEXT:    psrad $16, %xmm1
  +; SSE2-NEXT:    movdqa %xmm2, %xmm0
   ; SSE2-NEXT:    retq
   ;
   ; SSE41-LABEL: mulhsw_v8i16_ashr:

Also, would you like me to take over this revision, update it to get it reviewed/committed?

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D80485/new/

https://reviews.llvm.org/D80485