[llvm-dev] rotl: undocumented LLVM instruction?

Thu Nov 3 14:27:18 PDT 2016

Change the DAGCombine.

On Nov 3, 2016 17:24, "Phil Tomson" <phil.a.tomson at gmail.com> wrote:

> I could try setting ISD::ROTL to Expand... however, we do have a rol op
> and we'd like the ISD::ROTL to map to it.  If I set it to Expand it's not
> going to do that, right?
>
> I think in this case we're just getting the ISD::ROTL a bit too soon in
> the process and that's causing us to miss other optimization opportunities
> later on.
>
> Phil
>
> On Thu, Nov 3, 2016 at 2:20 PM, Ryan Taylor <ryta1203 at gmail.com> wrote:
>
>> Does setting ISD::ROTL to Expand not work (via setOperationAction)?
>>
>> You could also do a Custom hook if that's what you're looking for.
>>
>> On Thu, Nov 3, 2016 at 5:12 PM, Phil Tomson <phil.a.tomson at gmail.com>
>> wrote:
>>
>>> ... or perhaps to rephrase:
>>>
>>> In 3.9 it seems to be doing a smaller combine much sooner, whereas in
>>> 3.6 it deferred that till later in the instruction selection pattern
>>> matching - the latter was giving us better results because it seems to
>>> match a larger pattern than the former did in the earlier stage.
>>>
>>> Phil
>>>
>>> On Thu, Nov 3, 2016 at 2:07 PM, Phil Tomson <phil.a.tomson at gmail.com>
>>> wrote:
>>>
>>>> Is there any way to get it to delay this optimization where it goes
>>>> from this:
>>>>
>>>> Initial selection DAG: BB#0 'bclr64:entry'
>>>> SelectionDAG has 14 nodes:
>>>>   t0: ch = EntryToken
>>>>       t2: i64,ch = CopyFromReg t0, Register:i64 %vreg0
>>>>             t4: i64,ch = CopyFromReg t0, Register:i64 %vreg1
>>>>           t6: i64 = sub t4, Constant:i64<1>
>>>>         t7: i64 = shl Constant:i64<1>, t6
>>>>       t9: i64 = xor t7, Constant:i64<-1>
>>>>     t10: i64 = and t2, t9
>>>>   t12: ch,glue = CopyToReg t0, Register:i64 %R1, t10
>>>>   t13: ch = XSTGISD::Ret t12, Register:i64 %R1, t12:1
>>>>
>>>>
>>>>
>>>> Combining: t13: ch = XSTGISD::Ret t12, Register:i64 %R1, t12:1
>>>>
>>>> Combining: t12: ch,glue = CopyToReg t0, Register:i64 %R1, t10
>>>>
>>>> Combining: t11: i64 = Register %R1
>>>>
>>>> Combining: t10: i64 = and t2, t9
>>>>
>>>> Combining: t9: i64 = xor t7, Constant:i64<-1>
>>>>  ... into: t15: i64 = rotl Constant:i64<-2>, t6
>>>>
>>>> ...to this:
>>>>
>>>> Optimized lowered selection DAG: BB#0 'bclr64:entry'
>>>> SelectionDAG has 13 nodes:
>>>>   t0: ch = EntryToken
>>>>       t2: i64,ch = CopyFromReg t0, Register:i64 %vreg0
>>>>           t4: i64,ch = CopyFromReg t0, Register:i64 %vreg1
>>>>         t17: i64 = add t4, Constant:i64<-1>
>>>>       t15: i64 = rotl Constant:i64<-2>, t17
>>>>     t10: i64 = and t2, t15
>>>>   t12: ch,glue = CopyToReg t0, Register:i64 %R1, t10
>>>>   t13: ch = XSTGISD::Ret t12, Register:i64 %R1, t12:1
>>>>
>>>>
>>>> That combine of the shl and xor into a rotl ends up giving us suboptimal
>>>> results compared with 3.6.
>>>>
>>>> For example, in 3.6 the generated code is simply:
>>>>
>>>> bclr64:                                 # @bclr64
>>>> # BB#0:                                 # %entry
>>>>     addI    r1, r1, -1, 64
>>>>     bclr        r1, r0, r1, 64
>>>>     jabs        r511
>>>>
>>>> Whereas with 3.9 the generated code is:
>>>>
>>>> bclr64:                                 # @bclr64
>>>> # BB#0:                                 # %entry
>>>>     addI    r1, r1, -1, 64
>>>>     movimm        r2, -2, 64
>>>>     rol        r1, r2, r1, 64
>>>>     bitop1        r1, r0, r1, AND, 64
>>>>     jabs        r511
>>>>
>>>>
>>>> ... it seems to be negatively impacting some of our larger benchmarks
>>>> as well, which used to contain several bclr (bit clear) instructions but
>>>> now contain far fewer.
>>>>
>>>> Phil
>>>>
>>>>
>>>>
>>>>
>>>> On Wed, Nov 2, 2016 at 4:10 PM, Ryan Taylor <ryta1203 at gmail.com> wrote:
>>>>
>>>>> I believe some of the ISDs were introduced to allow for DAG
>>>>> optimizations under the assumption that some of the major architectures
>>>>> directly support these types of instructions.
>>>>>
>>>>> -Ryan
>>>>>
>>>>> On Wed, Nov 2, 2016 at 6:24 PM, Phil Tomson via llvm-dev <
>>>>> llvm-dev at lists.llvm.org> wrote:
>>>>>
>>>>>> We've recently moved our project from LLVM 3.6 to LLVM 3.9. I
>>>>>> noticed that one of our code generation tests is breaking in 3.9.
>>>>>>
>>>>>> The test is:
>>>>>>
>>>>>>  ; RUN: llc < %s -march=xstg | FileCheck %s
>>>>>>
>>>>>> define i64 @bclr64(i64 %a, i64 %b) nounwind readnone {
>>>>>> entry:
>>>>>> ; CHECK: bclr     r1, r0, r1, 64
>>>>>>   %sub = sub i64 %b, 1
>>>>>>   %shl = shl i64 1, %sub
>>>>>>   %xor = xor i64 %shl, -1
>>>>>>   %and = and i64 %a, %xor
>>>>>>   ret i64 %and
>>>>>> }
>>>>>>
>>>>>> I ran llc with -debug to get a better idea of what's going on and
>>>>>> found:
>>>>>>
>>>>>> Initial selection DAG: BB#0 'bclr64:entry'
>>>>>> SelectionDAG has 14 nodes:
>>>>>>   t0: ch = EntryToken
>>>>>>       t2: i64,ch = CopyFromReg t0, Register:i64 %vreg0
>>>>>>             t4: i64,ch = CopyFromReg t0, Register:i64 %vreg1
>>>>>>           t6: i64 = sub t4, Constant:i64<1>
>>>>>>         t7: i64 = shl Constant:i64<1>, t6
>>>>>>       t9: i64 = xor t7, Constant:i64<-1>
>>>>>>     t10: i64 = and t2, t9
>>>>>>   t12: ch,glue = CopyToReg t0, Register:i64 %R1, t10
>>>>>>   t13: ch = XSTGISD::Ret t12, Register:i64 %R1, t12:1
>>>>>>
>>>>>>
>>>>>>
>>>>>> Combining: t13: ch = XSTGISD::Ret t12, Register:i64 %R1, t12:1
>>>>>>
>>>>>> Combining: t12: ch,glue = CopyToReg t0, Register:i64 %R1, t10
>>>>>>
>>>>>> Combining: t11: i64 = Register %R1
>>>>>>
>>>>>> Combining: t10: i64 = and t2, t9
>>>>>>
>>>>>> Combining: t9: i64 = xor t7, Constant:i64<-1>
>>>>>>  ... into: t15: i64 = rotl Constant:i64<-2>, t6
>>>>>>
>>>>>> Combining: t10: i64 = and t2, t15
>>>>>>
>>>>>> Combining: t15: i64 = rotl Constant:i64<-2>, t6
>>>>>>
>>>>>> Combining: t14: i64 = Constant<-2>
>>>>>>
>>>>>> Combining: t6: i64 = sub t4, Constant:i64<1>
>>>>>>  ... into: t17: i64 = add t4, Constant:i64<-1>
>>>>>>
>>>>>> Combining: t15: i64 = rotl Constant:i64<-2>, t17
>>>>>>
>>>>>>
>>>>>>
>>>>>> These rotl instructions weren't showing up when I ran llc 3.6, and
>>>>>> they completely change the final generated code, so the test fails
>>>>>> (and the result is less optimal than it was in 3.6).
>>>>>>
>>>>>> I've been looking in the LLVM language docs (3.9 version) and I don't
>>>>>> see any documentation on 'rotl'. What does it do? Why isn't it in the docs?
>>>>>>
>>>>>> Phil
>>>>>>
>>>>>> _______________________________________________
>>>>>> LLVM Developers mailing list
>>>>>> llvm-dev at lists.llvm.org
>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>