[PATCH] D80485: [DAGCombiner][PowerPC][AArch64] Remove isMulhCheaperThanMulShift TLI hook. Use isOperationLegal directly instead.
Craig Topper via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Oct 9 00:52:40 PDT 2020
craig.topper added a comment.
In D80485#2304761 <https://reviews.llvm.org/D80485#2304761>, @amyk wrote:
> In D80485#2293955 <https://reviews.llvm.org/D80485#2293955>, @craig.topper wrote:
>
>> In D80485#2293942 <https://reviews.llvm.org/D80485#2293942>, @amyk wrote:
>>
>>> So, my goal was to introduce a DAG combine to combine multiply+shifts into mulh. This is done in a function I introduced within `DAGCombiner.cpp` (called `combineShiftToMULH`).
>>>
>>> Since I was implementing something that is in target independent code, I thought it may be better to enable it on PowerPC only first since I was not quite sure if other targets were interested it this and I was seeing many other LIT failures at the time.
>>> Thus, I introduced the hook (`isMulhCheaperThanMulShift`) and targets who wish to combine multiply+shifts into mulh could implement that hook. Within `combineShiftToMULH`, there is a check to `isMulhCheaperThanMulShift`.
>>>
>>> If we wanted to enable this for everyone, I think the check `isOperationLegalOrCustom` would probably be sufficient. I just tried this patch again, and made the following changes for `combineShiftToMULH`:
>>>
>>> @@ -8099,12 +8101,6 @@ static SDValue combineShiftToMULH(SDNode *N, SelectionDAG &DAG,
>>> if (NarrowVT != RightOp.getOperand(0).getValueType())
>>> return SDValue();
>>>
>>> - // Only transform into mulh if mulh for the narrow type is cheaper than
>>> - // a multiply followed by a shift. This should also check if mulh is
>>> - // legal for NarrowVT on the target.
>>> - if (!TLI.isMulhCheaperThanMulShift(NarrowVT))
>>> - return SDValue();
>>> -
>>> // Proceed with the transformation if the wide type is twice as large
>>> // as the narrow type.
>>> unsigned NarrowVTSize = NarrowVT.getScalarSizeInBits();
>>> @@ -8122,6 +8118,12 @@ static SDValue combineShiftToMULH(SDNode *N, SelectionDAG &DAG,
>>> // we use mulhs. Othewise, zero extends (zext) use mulhu.
>>> unsigned MulhOpcode = IsSignExt ? ISD::MULHS : ISD::MULHU;
>>>
>>> + // Only transform into mulh if mulh for the narrow type is cheaper than
>>> + // a multiply followed by a shift. This should also check if mulh is
>>> + // legal for NarrowVT on the target.
>>> + if (!TLI.isOperationLegalOrCustom(MulhOpcode, NarrowVT))
>>> + return SDValue();
>>> +
>>> SDValue Result = DAG.getNode(MulhOpcode, DL, NarrowVT, LeftOp.getOperand(0),
>>> RightOp.getOperand(0));
>>> return (N->getOpcode() == ISD::SRA ? DAG.getSExtOrTrunc(Result, DL, WideVT1)
>>>
>>> I only see one LIT failure now:
>>>
>>> Failed Tests (1):
>>> LLVM :: CodeGen/X86/pmulh.ll
>>
>> Would this now be enabled on PPC32 or whatever !PPC64 is called? Are there tests for that?
>
> I've looked into this and on 32-bit mode, the mulh nodes are actually only legalized for `MVT::i32` and was being produced previous to introducing my DAGCombine.
> We actually already have a test in our backend (`llvm/test/CodeGen/PowerPC/mulhs.ll`) that shows i32 `mulhs` is being produced. And since the mulh is legal, my combine will run and produce `mulhs` in this test, as well.
> Other types on 32-bit mode aren't legal, so the combine won't run on other types. Thus in terms of PPC, I think it's probably fine to remove `isMulhCheaperThanMulShift`.
>
> The only other failure I see is for `CodeGen/X86/pmulh.ll`. I have the following changes for the test case, however I think it would be great to have your input on it.
>
> diff --git a/llvm/test/CodeGen/X86/pmulh.ll b/llvm/test/CodeGen/X86/pmulh.ll
> index c03d0190714e..36d6137c9251 100644
> --- a/llvm/test/CodeGen/X86/pmulh.ll
> +++ b/llvm/test/CodeGen/X86/pmulh.ll
> @@ -489,10 +489,11 @@ define <8 x i32> @mulhsw_v8i16_ashr(<8 x i16> %a, <8 x i16> %b) {
> ; SSE2-LABEL: mulhsw_v8i16_ashr:
> ; SSE2: # %bb.0:
> ; SSE2-NEXT: pmulhw %xmm1, %xmm0
> +; SSE2-NEXT: punpcklwd {{.*#+}} xmm2 = xmm2[0],xmm0[0],xmm2[1],xmm0[1],xmm2[2],xmm0[2],xmm2[3],xmm0[3]
> +; SSE2-NEXT: psrad $16, %xmm2
> ; SSE2-NEXT: punpckhwd {{.*#+}} xmm1 = xmm1[4],xmm0[4],xmm1[5],xmm0[5],xmm1[6],xmm0[6],xmm1[7],xmm0[7]
> -; SSE2-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3]
> -; SSE2-NEXT: psrad $16, %xmm0
> ; SSE2-NEXT: psrad $16, %xmm1
> +; SSE2-NEXT: movdqa %xmm2, %xmm0
> ; SSE2-NEXT: retq
> ;
> ; SSE41-LABEL: mulhsw_v8i16_ashr:
>
> Also, would you like me to take over this revision, update it to get it reviewed/committed?
You can take over the review. The mulhsw_v8i16_ashr change is a bit unfortunate. It looks like the punpck is ending up with an undef input that is getting assigned to an abitrary register during register allocation and introducing a false dependency. I think we're just getting lucky with how it was being scheduled before. So I think its fine.
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D80485/new/
https://reviews.llvm.org/D80485
More information about the llvm-commits
mailing list