[PATCH] D80485: [DAGCombiner][PowerPC][AArch64] Remove isMulhCheaperThanMulShift TLI hook. Use isOperationLegal directly instead.

Fri Oct 9 00:52:40 PDT 2020

craig.topper added a comment.

In D80485#2304761 <https://reviews.llvm.org/D80485#2304761>, @amyk wrote:

> In D80485#2293955 <https://reviews.llvm.org/D80485#2293955>, @craig.topper wrote:
>
>> In D80485#2293942 <https://reviews.llvm.org/D80485#2293942>, @amyk wrote:
>>
>>> So, my goal was to introduce a DAG combine to combine multiply+shifts into mulh. This is done in a function I introduced within `DAGCombiner.cpp` (called `combineShiftToMULH`).
>>>
>>> Since I was implementing something that is in target independent code, I thought it may be better to enable it on PowerPC only first since I was not quite sure if other targets were interested it this and I was seeing many other LIT failures at the time. 
>>> Thus, I introduced the hook (`isMulhCheaperThanMulShift`) and targets who wish to combine multiply+shifts into mulh could implement that hook. Within `combineShiftToMULH`, there is a check to `isMulhCheaperThanMulShift`.
>>>
>>> If we wanted to enable this for everyone, I think the check `isOperationLegalOrCustom` would probably be sufficient. I just tried this patch again, and made the following changes for `combineShiftToMULH`:
>>>
>>>   @@ -8099,12 +8101,6 @@ static SDValue combineShiftToMULH(SDNode *N, SelectionDAG &DAG,
>>>      if (NarrowVT !=  RightOp.getOperand(0).getValueType())
>>>        return SDValue();
>>>   
>>>   -  // Only transform into mulh if mulh for the narrow type is cheaper than
>>>   -  // a multiply followed by a shift. This should also check if mulh is
>>>   -  // legal for NarrowVT on the target.
>>>   -  if (!TLI.isMulhCheaperThanMulShift(NarrowVT))
>>>   -      return SDValue();
>>>   -
>>>      // Proceed with the transformation if the wide type is twice as large
>>>      // as the narrow type.
>>>      unsigned NarrowVTSize = NarrowVT.getScalarSizeInBits();
>>>   @@ -8122,6 +8118,12 @@ static SDValue combineShiftToMULH(SDNode *N, SelectionDAG &DAG,
>>>      // we use mulhs. Othewise, zero extends (zext) use mulhu.
>>>      unsigned MulhOpcode = IsSignExt ? ISD::MULHS : ISD::MULHU;
>>>   
>>>   +  // Only transform into mulh if mulh for the narrow type is cheaper than
>>>   +  // a multiply followed by a shift. This should also check if mulh is
>>>   +  // legal for NarrowVT on the target.
>>>   +  if (!TLI.isOperationLegalOrCustom(MulhOpcode, NarrowVT))
>>>   +      return SDValue();
>>>   +
>>>      SDValue Result = DAG.getNode(MulhOpcode, DL, NarrowVT, LeftOp.getOperand(0),
>>>                                   RightOp.getOperand(0));
>>>      return (N->getOpcode() == ISD::SRA ? DAG.getSExtOrTrunc(Result, DL, WideVT1)
>>>
>>> I only see one LIT failure now:
>>>
>>>   Failed Tests (1):
>>>     LLVM :: CodeGen/X86/pmulh.ll
>>
>> Would this now be enabled on PPC32 or whatever !PPC64 is called? Are there tests for that?
>
> I've looked into this and on 32-bit mode, the mulh nodes are actually only legalized for `MVT::i32` and was being produced previous to introducing my DAGCombine. 
> We actually already have a test in our backend (`llvm/test/CodeGen/PowerPC/mulhs.ll`) that shows i32 `mulhs` is being produced. And since the mulh is legal, my combine will run and produce `mulhs` in this test, as well. 
> Other types on 32-bit mode aren't legal, so the combine won't run on other types. Thus in terms of PPC, I think it's probably fine to remove `isMulhCheaperThanMulShift`.
>
> The only other failure I see is for `CodeGen/X86/pmulh.ll`. I have the following changes for the test case, however I think it would be great to have your input on it.
>
>   diff --git a/llvm/test/CodeGen/X86/pmulh.ll b/llvm/test/CodeGen/X86/pmulh.ll
>   index c03d0190714e..36d6137c9251 100644
>   --- a/llvm/test/CodeGen/X86/pmulh.ll
>   +++ b/llvm/test/CodeGen/X86/pmulh.ll
>   @@ -489,10 +489,11 @@ define <8 x i32> @mulhsw_v8i16_ashr(<8 x i16> %a, <8 x i16> %b) {
>    ; SSE2-LABEL: mulhsw_v8i16_ashr:
>    ; SSE2:       # %bb.0:
>    ; SSE2-NEXT:    pmulhw %xmm1, %xmm0
>   +; SSE2-NEXT:    punpcklwd {{.*#+}} xmm2 = xmm2[0],xmm0[0],xmm2[1],xmm0[1],xmm2[2],xmm0[2],xmm2[3],xmm0[3]
>   +; SSE2-NEXT:    psrad $16, %xmm2
>    ; SSE2-NEXT:    punpckhwd {{.*#+}} xmm1 = xmm1[4],xmm0[4],xmm1[5],xmm0[5],xmm1[6],xmm0[6],xmm1[7],xmm0[7]
>   -; SSE2-NEXT:    punpcklwd {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3]
>   -; SSE2-NEXT:    psrad $16, %xmm0
>    ; SSE2-NEXT:    psrad $16, %xmm1
>   +; SSE2-NEXT:    movdqa %xmm2, %xmm0
>    ; SSE2-NEXT:    retq
>    ;
>    ; SSE41-LABEL: mulhsw_v8i16_ashr:
>
> Also, would you like me to take over this revision, update it to get it reviewed/committed?

You can take over the review. The mulhsw_v8i16_ashr change is a bit unfortunate. It looks like the punpck is ending up with an undef input that is getting assigned to an abitrary register during register allocation and introducing a false dependency. I think we're just getting lucky with how it was being scheduled before. So I think its fine.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D80485/new/

https://reviews.llvm.org/D80485