[llvm-dev] rotl: undocumented LLVM instruction?

Ryan Taylor via llvm-dev llvm-dev at lists.llvm.org
Thu Nov 3 14:20:12 PDT 2016


Does setting ISD::ROTL to Expand (via setOperationAction) not work?

You could also do a Custom hook if that's what you're looking for.
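
On the earlier question of what rotl does: it is a rotate-left, and the combine in the quoted log relies on ~(1 << n) being equal to rotl(~1, n), where ~1 is the Constant:i64<-2> in the dump. Here is a minimal standalone sketch that checks that identity for 64-bit values. It is not LLVM code; the helper names (rotl64, bclr_shift_form, bclr_rotl_form, forms_agree) are made up for illustration, and only shift amounts 0..63 are checked, since an IR shl by 64 or more is undefined anyway.

```cpp
#include <cassert>
#include <cstdint>

// Rotate left by n bits, with n reduced modulo 64. This is the usual
// shift/or formulation of a rotate (hypothetical helper, not an LLVM API).
constexpr std::uint64_t rotl64(std::uint64_t x, unsigned n) {
  return (n & 63u) == 0 ? x : (x << (n & 63u)) | (x >> (64u - (n & 63u)));
}

// The pattern from the test case: a & ~(1 << (b - 1)).
constexpr std::uint64_t bclr_shift_form(std::uint64_t a, std::uint64_t b) {
  return a & ~(std::uint64_t{1} << (b - 1));
}

// What the combined DAG computes: a & rotl(-2, b + (-1)).
constexpr std::uint64_t bclr_rotl_form(std::uint64_t a, std::uint64_t b) {
  return a & rotl64(~std::uint64_t{1}, static_cast<unsigned>(b - 1));
}

// Returns true if both forms agree for every in-range bit position
// (b = 1..64, i.e. shift amounts 0..63).
inline bool forms_agree(std::uint64_t a) {
  for (std::uint64_t b = 1; b <= 64; ++b)
    if (bclr_shift_form(a, b) != bclr_rotl_form(a, b))
      return false;
  return true;
}
```

Both forms clear the same bit, so the combine itself is sound; the question in this thread is only whether the target's patterns still recognize the rotl form as a bit clear.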

On Thu, Nov 3, 2016 at 5:12 PM, Phil Tomson <phil.a.tomson at gmail.com> wrote:

> ... or perhaps to rephrase:
>
> In 3.9 it seems to be doing a smaller combine much sooner, whereas in 3.6
> it deferred that till later in the instruction selection pattern matching -
> the latter was giving us better results because it seems to match a larger
> pattern than the former did in the earlier stage.
>
> Phil
>
> On Thu, Nov 3, 2016 at 2:07 PM, Phil Tomson <phil.a.tomson at gmail.com>
> wrote:
>
>> Is there any way to get it to delay this optimization where it goes from
>> this:
>>
>> Initial selection DAG: BB#0 'bclr64:entry'
>> SelectionDAG has 14 nodes:
>>   t0: ch = EntryToken
>>       t2: i64,ch = CopyFromReg t0, Register:i64 %vreg0
>>             t4: i64,ch = CopyFromReg t0, Register:i64 %vreg1
>>           t6: i64 = sub t4, Constant:i64<1>
>>         t7: i64 = shl Constant:i64<1>, t6
>>       t9: i64 = xor t7, Constant:i64<-1>
>>     t10: i64 = and t2, t9
>>   t12: ch,glue = CopyToReg t0, Register:i64 %R1, t10
>>   t13: ch = XSTGISD::Ret t12, Register:i64 %R1, t12:1
>>
>>
>>
>> Combining: t13: ch = XSTGISD::Ret t12, Register:i64 %R1, t12:1
>>
>> Combining: t12: ch,glue = CopyToReg t0, Register:i64 %R1, t10
>>
>> Combining: t11: i64 = Register %R1
>>
>> Combining: t10: i64 = and t2, t9
>>
>> Combining: t9: i64 = xor t7, Constant:i64<-1>
>>  ... into: t15: i64 = rotl Constant:i64<-2>, t6
>>
>> ...to this:
>>
>> Optimized lowered selection DAG: BB#0 'bclr64:entry'
>> SelectionDAG has 13 nodes:
>>   t0: ch = EntryToken
>>       t2: i64,ch = CopyFromReg t0, Register:i64 %vreg0
>>           t4: i64,ch = CopyFromReg t0, Register:i64 %vreg1
>>         t17: i64 = add t4, Constant:i64<-1>
>>       t15: i64 = rotl Constant:i64<-2>, t17
>>     t10: i64 = and t2, t15
>>   t12: ch,glue = CopyToReg t0, Register:i64 %R1, t10
>>   t13: ch = XSTGISD::Ret t12, Register:i64 %R1, t12:1
>>
>>
>> That combining of the shl and xor into a rotl ends up giving us
>> suboptimal results compared with 3.6.
>>
>> For example, in 3.6 the generated code is simply:
>>
>> bclr64:                                 # @bclr64
>> # BB#0:                                 # %entry
>>     addI    r1, r1, -1, 64
>>     bclr        r1, r0, r1, 64
>>     jabs        r511
>>
>> Whereas with 3.9 the generated code is:
>>
>> bclr64:                                 # @bclr64
>> # BB#0:                                 # %entry
>>     addI    r1, r1, -1, 64
>>     movimm        r2, -2, 64
>>     rol        r1, r2, r1, 64
>>     bitop1        r1, r0, r1, AND, 64
>>     jabs        r511
>>
>>
>> ... it also seems to be negatively impacting some of our larger
>> benchmarks, which used to contain several bclr (bit clear) instructions
>> but now contain far fewer.
>>
>> Phil
>>
>>
>>
>>
>> On Wed, Nov 2, 2016 at 4:10 PM, Ryan Taylor <ryta1203 at gmail.com> wrote:
>>
>>> I believe some of the ISDs were introduced to allow for DAG
>>> optimizations under the assumption that some of the major architectures
>>> directly support these types of instructions.
>>>
>>> -Ryan
>>>
>>> On Wed, Nov 2, 2016 at 6:24 PM, Phil Tomson via llvm-dev <
>>> llvm-dev at lists.llvm.org> wrote:
>>>
>>>> We've recently moved our project from LLVM 3.6 to LLVM 3.9.  I noticed
>>>> one of our code generation tests is breaking in 3.9.
>>>>
>>>> The test is:
>>>>
>>>>  ; RUN: llc < %s -march=xstg | FileCheck %s
>>>>
>>>> define i64 @bclr64(i64 %a, i64 %b) nounwind readnone {
>>>> entry:
>>>> ; CHECK: bclr     r1, r0, r1, 64
>>>>   %sub = sub i64 %b, 1
>>>>   %shl = shl i64 1, %sub
>>>>   %xor = xor i64 %shl, -1
>>>>   %and = and i64 %a, %xor
>>>>   ret i64 %and
>>>> }
>>>>
>>>> I ran llc with -debug to get a better idea of what's going on and found:
>>>>
>>>> Initial selection DAG: BB#0 'bclr64:entry'
>>>> SelectionDAG has 14 nodes:
>>>>   t0: ch = EntryToken
>>>>       t2: i64,ch = CopyFromReg t0, Register:i64 %vreg0
>>>>             t4: i64,ch = CopyFromReg t0, Register:i64 %vreg1
>>>>           t6: i64 = sub t4, Constant:i64<1>
>>>>         t7: i64 = shl Constant:i64<1>, t6
>>>>       t9: i64 = xor t7, Constant:i64<-1>
>>>>     t10: i64 = and t2, t9
>>>>   t12: ch,glue = CopyToReg t0, Register:i64 %R1, t10
>>>>   t13: ch = XSTGISD::Ret t12, Register:i64 %R1, t12:1
>>>>
>>>>
>>>>
>>>> Combining: t13: ch = XSTGISD::Ret t12, Register:i64 %R1, t12:1
>>>>
>>>> Combining: t12: ch,glue = CopyToReg t0, Register:i64 %R1, t10
>>>>
>>>> Combining: t11: i64 = Register %R1
>>>>
>>>> Combining: t10: i64 = and t2, t9
>>>>
>>>> Combining: t9: i64 = xor t7, Constant:i64<-1>
>>>>  ... into: t15: i64 = rotl Constant:i64<-2>, t6
>>>>
>>>> Combining: t10: i64 = and t2, t15
>>>>
>>>> Combining: t15: i64 = rotl Constant:i64<-2>, t6
>>>>
>>>> Combining: t14: i64 = Constant<-2>
>>>>
>>>> Combining: t6: i64 = sub t4, Constant:i64<1>
>>>>  ... into: t17: i64 = add t4, Constant:i64<-1>
>>>>
>>>> Combining: t15: i64 = rotl Constant:i64<-2>, t17
>>>>
>>>>
>>>>
>>>> These rotl instructions weren't showing up when I ran llc 3.6 and
>>>> that's completely changing the generated code at the end which means the
>>>> test fails (and it's less optimal than it was in 3.6).
>>>>
>>>> I've been looking in the LLVM language docs (3.9 version) and I don't
>>>> see any documentation on 'rotl'. What does it do? Why isn't it in the docs?
>>>>
>>>> Phil
>>>>
>>>> _______________________________________________
>>>> LLVM Developers mailing list
>>>> llvm-dev at lists.llvm.org
>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>
>>>>
>>>
>>
>