[llvm-dev] rotl: undocumented LLVM instruction?
Phil Tomson via llvm-dev
llvm-dev at lists.llvm.org
Thu Nov 3 14:12:10 PDT 2016
... or perhaps to rephrase:
In 3.9 it seems to do a smaller combine much sooner, whereas 3.6 deferred
that until the pattern matching in instruction selection. The latter gave
us better results because it matched a larger pattern than the early
combine does.
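
For anyone following along: the rotl in the quoted log is the SelectionDAG rotate-left node (ISD::ROTL), and the combine rewrites xor (shl 1, x), -1 as rotl -2, x. Here is a minimal Python sketch, assuming i64 values are modelled as unsigned 64-bit integers and using helper names that are mine rather than anything from LLVM. It checks that the two forms give the same mask for shift amounts 0 through 63:

```python
MASK64 = (1 << 64) - 1  # model i64 values as unsigned 64-bit integers


def rotl64(value, amount):
    """Rotate a 64-bit value left by `amount` bits (taken modulo 64)."""
    amount %= 64
    value &= MASK64
    return ((value << amount) | (value >> (64 - amount))) & MASK64


def mask_via_shl_xor(n):
    """xor (shl 1, n), -1: the form in the initial selection DAG."""
    return ((1 << n) & MASK64) ^ MASK64


def mask_via_rotl(n):
    """rotl -2, n: the form the combine produces."""
    return rotl64(-2 & MASK64, n)


def bclr64(a, b):
    """Model of the IR test function: and a, (xor (shl 1, b - 1), -1).

    Only meaningful for 1 <= b <= 64, so the shift amount stays in 0..63.
    """
    return a & mask_via_shl_xor(b - 1) & MASK64


if __name__ == "__main__":
    # Both forms agree for every in-range shift amount.
    for n in range(64):
        assert mask_via_shl_xor(n) == mask_via_rotl(n)
    print(hex(bclr64(0xFF, 1)))
```

This only shows that the rewrite preserves the value. It says nothing about which form a given backend can match to a better instruction.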
Phil
On Thu, Nov 3, 2016 at 2:07 PM, Phil Tomson <phil.a.tomson at gmail.com> wrote:
> Is there any way to get it to delay this optimization where it goes from
> this:
>
> Initial selection DAG: BB#0 'bclr64:entry'
> SelectionDAG has 14 nodes:
> t0: ch = EntryToken
> t2: i64,ch = CopyFromReg t0, Register:i64 %vreg0
> t4: i64,ch = CopyFromReg t0, Register:i64 %vreg1
> t6: i64 = sub t4, Constant:i64<1>
> t7: i64 = shl Constant:i64<1>, t6
> t9: i64 = xor t7, Constant:i64<-1>
> t10: i64 = and t2, t9
> t12: ch,glue = CopyToReg t0, Register:i64 %R1, t10
> t13: ch = XSTGISD::Ret t12, Register:i64 %R1, t12:1
>
>
>
> Combining: t13: ch = XSTGISD::Ret t12, Register:i64 %R1, t12:1
>
> Combining: t12: ch,glue = CopyToReg t0, Register:i64 %R1, t10
>
> Combining: t11: i64 = Register %R1
>
> Combining: t10: i64 = and t2, t9
>
> Combining: t9: i64 = xor t7, Constant:i64<-1>
> ... into: t15: i64 = rotl Constant:i64<-2>, t6
>
> ...to this:
>
> Optimized lowered selection DAG: BB#0 'bclr64:entry'
> SelectionDAG has 13 nodes:
> t0: ch = EntryToken
> t2: i64,ch = CopyFromReg t0, Register:i64 %vreg0
> t4: i64,ch = CopyFromReg t0, Register:i64 %vreg1
> t17: i64 = add t4, Constant:i64<-1>
> t15: i64 = rotl Constant:i64<-2>, t17
> t10: i64 = and t2, t15
> t12: ch,glue = CopyToReg t0, Register:i64 %R1, t10
> t13: ch = XSTGISD::Ret t12, Register:i64 %R1, t12:1
>
>
> That early combine of the xor (into a rotl that feeds the and) ends up
> giving us worse results than 3.6 did.
>
> For example, in 3.6 the generated code is simply:
>
> bclr64: # @bclr64
> # BB#0: # %entry
> addI r1, r1, -1, 64
> bclr r1, r0, r1, 64
> jabs r511
>
> Whereas with 3.9 the generated code is:
>
> bclr64: # @bclr64
> # BB#0: # %entry
> addI r1, r1, -1, 64
> movimm r2, -2, 64
> rol r1, r2, r1, 64
> bitop1 r1, r0, r1, AND, 64
> jabs r511
>
>
> ... it also seems to be hurting some of our larger benchmarks, which
> used to contain several bclr (bit clear) instructions but now contain
> far fewer.
>
> Phil
>
>
>
>
> On Wed, Nov 2, 2016 at 4:10 PM, Ryan Taylor <ryta1203 at gmail.com> wrote:
>
>> I believe some of the ISD opcodes were introduced to allow for DAG
>> optimizations, under the assumption that some of the major architectures
>> directly support these kinds of instructions.
>>
>> -Ryan
>>
>> On Wed, Nov 2, 2016 at 6:24 PM, Phil Tomson via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>> We've recently moved our project from LLVM 3.6 to LLVM 3.9. I noticed
>>> one of our code generation tests is breaking in 3.9.
>>>
>>> The test is:
>>>
>>> ; RUN: llc < %s -march=xstg | FileCheck %s
>>>
>>> define i64 @bclr64(i64 %a, i64 %b) nounwind readnone {
>>> entry:
>>> ; CHECK: bclr r1, r0, r1, 64
>>> %sub = sub i64 %b, 1
>>> %shl = shl i64 1, %sub
>>> %xor = xor i64 %shl, -1
>>> %and = and i64 %a, %xor
>>> ret i64 %and
>>> }
>>>
>>> I ran llc with -debug to get a better idea of what's going on and found:
>>>
>>> Initial selection DAG: BB#0 'bclr64:entry'
>>> SelectionDAG has 14 nodes:
>>> t0: ch = EntryToken
>>> t2: i64,ch = CopyFromReg t0, Register:i64 %vreg0
>>> t4: i64,ch = CopyFromReg t0, Register:i64 %vreg1
>>> t6: i64 = sub t4, Constant:i64<1>
>>> t7: i64 = shl Constant:i64<1>, t6
>>> t9: i64 = xor t7, Constant:i64<-1>
>>> t10: i64 = and t2, t9
>>> t12: ch,glue = CopyToReg t0, Register:i64 %R1, t10
>>> t13: ch = XSTGISD::Ret t12, Register:i64 %R1, t12:1
>>>
>>>
>>>
>>> Combining: t13: ch = XSTGISD::Ret t12, Register:i64 %R1, t12:1
>>>
>>> Combining: t12: ch,glue = CopyToReg t0, Register:i64 %R1, t10
>>>
>>> Combining: t11: i64 = Register %R1
>>>
>>> Combining: t10: i64 = and t2, t9
>>>
>>> Combining: t9: i64 = xor t7, Constant:i64<-1>
>>> ... into: t15: i64 = rotl Constant:i64<-2>, t6
>>>
>>> Combining: t10: i64 = and t2, t15
>>>
>>> Combining: t15: i64 = rotl Constant:i64<-2>, t6
>>>
>>> Combining: t14: i64 = Constant<-2>
>>>
>>> Combining: t6: i64 = sub t4, Constant:i64<1>
>>> ... into: t17: i64 = add t4, Constant:i64<-1>
>>>
>>> Combining: t15: i64 = rotl Constant:i64<-2>, t17
>>>
>>>
>>>
>>> These rotl instructions weren't showing up when I ran llc 3.6, and they
>>> completely change the final generated code, so the test fails (and the
>>> result is less optimal than it was in 3.6).
>>>
>>> I've been looking in the LLVM language docs (3.9 version) and I don't
>>> see any documentation on 'rotl'. What does it do? Why isn't it in the docs?
>>>
>>> Phil
>>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
>>>
>>
>