[llvm] r184575 - Fix PR16360

Sun Jun 23 22:39:17 PDT 2013

Sorry, I don't why the system put that mail into junk box. I just check that about. In case you miss my reply. I put it here again.

Good point. I thought the original transformation is targeted to reduce overhead of shift in some scenarios when data types must be extended, e.g. 64-bit integer on 32-bit targets. It's painful to do that if all significant bits are required.

For 64-bit targets, reducing a 64-bit shift to a 32-bit one might have benefit as well. E.g., you may save one byte on x86-64 without encoding the W bit.

We could go even aggressively by folding (srl (anyextend x), c) to (zext (srl x, c)) to help generate efficient code on all targets as 'zext'
has better support. I verified it on x86-64 and ppc64 where the same code is generated before the bug is fixed. I checked aarch64 and mips64 as well but it seems those targets doesn't take the advantage that high bits are set to ZERO (I learned from public source that, with 32-bit instruction, on aarch64, the upper 32 bits of the destination register are set to ZERO and, on mips64, the upper 32 bits of the destination register will be sign extended and will be cleared for this case.) Please check the patch attached.

- michael

-----Original Message-----
From: llvm-commits-bounces at cs.uiuc.edu [mailto:llvm-commits-bounces at cs.uiuc.edu] On Behalf Of Duncan Sands
Sent: Sunday, June 23, 2013 5:49 AM
To: llvm-commits at cs.uiuc.edu
Subject: Re: [llvm] r184575 - Fix PR16360

On 21/06/13 21:22, Nadav Rotem wrote:
> It is not clear to me why this transformation is profitable for all platforms.
> You are adding an AND operation here. Should this optimization be 
> target specific maybe ?

I made the same point and also mentioned a better way of generating the mask, but Michael seems to have ignored my email.

Ciao, Duncan.

>
> On Jun 21, 2013, at 11:45 AM, Michael Liao <michael.liao at intel.com 
> <mailto:michael.liao at intel.com>> wrote:
>
>> Author: hliao
>> Date: Fri Jun 21 13:45:27 2013
>> New Revision: 184575
>>
>> URL:http://llvm.org/viewvc/llvm-project?rev=184575&view=rev
>> Log:
>> Fix PR16360
>>
>> When (srl (anyextend x), c) is folded into (anyextend (srl x, c)), 
>> the high bits are not cleared. Add 'and' to clear off them.
>>
>>
>> Added:
>>    llvm/trunk/test/CodeGen/X86/pr16360.ll
>> Modified:
>>    llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
>>
>> Modified: llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
>> URL:http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/Select
>> ionDAG/DAGCombiner.cpp?rev=184575&r1=184574&r2=184575&view=diff
>> =====================================================================
>> =========
>> --- llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (original)
>> +++ llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp Fri Jun 21 
>> +++ 13:45:27 2013
>> @@ -3915,8 +3915,7 @@ SDValue DAGCombiner::visitSRL(SDNode *N)
>>                        DAG.getConstant(~0ULL >> ShAmt, VT));
>>   }
>>
>> -
>> -  // fold (srl (anyextend x), c) -> (anyextend (srl x, c))
>> +  // fold (srl (anyextend x), c) -> (and (anyextend (srl x, c)), 
>> + mask)
>>   if (N1C && N0.getOpcode() == ISD::ANY_EXTEND) {
>>     // Shifting in all undef bits?
>>     EVT SmallVT = N0.getOperand(0).getValueType(); @@ -3929,7 
>> +3928,10 @@ SDValue DAGCombiner::visitSRL(SDNode *N)
>>                                        N0.getOperand(0),
>>                           DAG.getConstant(ShiftAmt, getShiftAmountTy(SmallVT)));
>>       AddToWorkList(SmallShift.getNode());
>> -      return DAG.getNode(ISD::ANY_EXTEND, SDLoc(N), VT, SmallShift);
>> +      APInt Mask = APInt::getAllOnesValue(VT.getSizeInBits()).lshr(ShiftAmt);
>> +      return DAG.getNode(ISD::AND, SDLoc(N), VT,
>> +                         DAG.getNode(ISD::ANY_EXTEND, SDLoc(N), VT, SmallShift),
>> +                         DAG.getConstant(Mask, VT));
>>     }
>>   }
>>
>>
>> Added: llvm/trunk/test/CodeGen/X86/pr16360.ll
>> URL:http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/p
>> r16360.ll?rev=184575&view=auto 
>> =====================================================================
>> =========
>> --- llvm/trunk/test/CodeGen/X86/pr16360.ll (added)
>> +++ llvm/trunk/test/CodeGen/X86/pr16360.ll Fri Jun 21 13:45:27 2013
>> @@ -0,0 +1,16 @@
>> +; RUN: llc < %s -mtriple=i686-pc-linux | FileCheck %s
>> +
>> +define i64 @foo(i32 %sum) {
>> +entry:
>> +  %conv = sext i32 %sum to i64
>> +  %shr = lshr i64 %conv, 2
>> +  %or = or i64 4611686018360279040, %shr
>> +  ret i64 %or
>> +}
>> +
>> +; CHECK: foo
>> +; CHECK: shrl $2
>> +; CHECK: orl $-67108864
>> +; CHECK-NOT: movl $-1
>> +; CHECK: movl $1073741823
>> +; CHECK: ret
>>
>>
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu <mailto:llvm-commits at cs.uiuc.edu> 
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>

_______________________________________________
llvm-commits mailing list
llvm-commits at cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Revise-fix-to-PR16360-for-efficient-code-on-64-bit-t.patch
Type: application/octet-stream
Size: 2250 bytes
Desc: 0001-Revise-fix-to-PR16360-for-efficient-code-on-64-bit-t.patch
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20130624/62097762/attachment.obj>