[llvm-dev] rotl: undocumented LLVM instruction?

Sat Nov 5 15:27:32 PDT 2016

What is the advantage of preventing combining the rotl instead of teach the selection to match this extended pattern that includes the rotl and generates the bclr code?

— 
Mehdi

> On Nov 3, 2016, at 4:41 PM, Krzysztof Parzyszek via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> One option may be to prevent the formation of ROTL, if possible, and then generating rol by hand.
> Marking it as "expand" would likely stop the DAG combiner from creating it. Then you could "preprocess" the selection DAG before the instruction selection and do the pattern matching yourself.
> 
> -Krzysztof
> 
> 
> On 11/3/2016 4:24 PM, Phil Tomson via llvm-dev wrote:
>> I could try setting ISD::ROTL to Expand... however, we do have a rol op
>> and we'd like the ISD::ROTL to map to it.  If I set it to Expand it's
>> not going to do that, right?
>> 
>> I think in this case we're just getting the ISD::ROTL a bit too soon in
>> the process and that's causing us to miss other optimization
>> opportunities later on.
>> 
>> Phil
>> 
>> On Thu, Nov 3, 2016 at 2:20 PM, Ryan Taylor <ryta1203 at gmail.com <mailto:ryta1203 at gmail.com>
>> <mailto:ryta1203 at gmail.com <mailto:ryta1203 at gmail.com>>> wrote:
>> 
>>    Setting the ISD::ROTL to Expand doesn't work? (via SetOperation)
>> 
>>    You could also do a Custom hook if that's what you're looking for.
>> 
>>    On Thu, Nov 3, 2016 at 5:12 PM, Phil Tomson <phil.a.tomson at gmail.com <mailto:phil.a.tomson at gmail.com>
>>    <mailto:phil.a.tomson at gmail.com <mailto:phil.a.tomson at gmail.com>>> wrote:
>> 
>>        ... or perhaps to rephrase:
>> 
>>        In 3.9 it seems to be doing a smaller combine much sooner,
>>        whereas in 3.6 it deferred that till later in the instruction
>>        selection pattern matching - the latter was giving us better
>>        results because it seems to match a larger pattern than the
>>        former did in the earlier stage.
>> 
>>        Phil
>> 
>>        On Thu, Nov 3, 2016 at 2:07 PM, Phil Tomson
>>        <phil.a.tomson at gmail.com <mailto:phil.a.tomson at gmail.com> <mailto:phil.a.tomson at gmail.com <mailto:phil.a.tomson at gmail.com>>> wrote:
>> 
>>            Is there any way to get it to delay this optimization where
>>            it goes from this:
>> 
>>            Initial selection DAG: BB#0 'bclr64:entry'
>>            SelectionDAG has 14 nodes:
>>              t0: ch = EntryToken
>>                  t2: i64,ch = CopyFromReg t0, Register:i64 %vreg0
>>                        t4: i64,ch = CopyFromReg t0, Register:i64 %vreg1
>>                      t6: i64 = sub t4, Constant:i64<1>
>>                    t7: i64 = shl Constant:i64<1>, t6
>>                  t9: i64 = xor t7, Constant:i64<-1>
>>                t10: i64 = and t2, t9
>>              t12: ch,glue = CopyToReg t0, Register:i64 %R1, t10
>>              t13: ch = XSTGISD::Ret t12, Register:i64 %R1, t12:1
>> 
>> 
>> 
>>            Combining: t13: ch = XSTGISD::Ret t12, Register:i64 %R1, t12:1
>> 
>>            Combining: t12: ch,glue = CopyToReg t0, Register:i64 %R1, t10
>> 
>>            Combining: t11: i64 = Register %R1
>> 
>>            Combining: t10: i64 = and t2, t9
>> 
>>            Combining: t9: i64 = xor t7, Constant:i64<-1>
>>             ... into: t15: i64 = rotl Constant:i64<-2>, t6
>> 
>>            ...to this:
>> 
>>            Optimized lowered selection DAG: BB#0 'bclr64:entry'
>>            SelectionDAG has 13 nodes:
>>              t0: ch = EntryToken
>>                  t2: i64,ch = CopyFromReg t0, Register:i64 %vreg0
>>                      t4: i64,ch = CopyFromReg t0, Register:i64 %vreg1
>>                    t17: i64 = add t4, Constant:i64<-1>
>>                  t15: i64 = rotl Constant:i64<-2>, t17
>>                t10: i64 = and t2, t15
>>              t12: ch,glue = CopyToReg t0, Register:i64 %R1, t10
>>              t13: ch = XSTGISD::Ret t12, Register:i64 %R1, t12:1
>> 
>> 
>>            That combining of the xor & and there ends up giving us
>>            suboptimal results as compared with 3.6.
>> 
>>            For example, in 3.6 the generated code is simply:
>> 
>>            bclr64:                                 # @bclr64
>>            # BB#0:                                 # %entry
>>                addI    r1, r1, -1, 64
>>                bclr        r1, r0, r1, 64
>>                jabs        r511
>> 
>>            Whereas with 3.9 the generated code is:
>> 
>>            bclr64:                                 # @bclr64
>>            # BB#0:                                 # %entry
>>                addI    r1, r1, -1, 64
>>                movimm        r2, -2, 64
>>                rol        r1, r2, r1, 64
>>                bitop1        r1, r0, r1, AND, 64
>>                jabs        r511
>> 
>> 
>>            ... it seems to be negatively impacting some of our larger
>>            benchmarks as well that used to contains several bclr (bit
>>            clear) commands but now contain much less.
>> 
>>            Phil
>> 
>> 
>> 
>> 
>>            On Wed, Nov 2, 2016 at 4:10 PM, Ryan Taylor
>>            <ryta1203 at gmail.com <mailto:ryta1203 at gmail.com> <mailto:ryta1203 at gmail.com <mailto:ryta1203 at gmail.com>>> wrote:
>> 
>>                I believe some of the ISDs were introduced to allow for
>>                DAG optimizations under the assumption that some of the
>>                major architectures directly support these types of
>>                instructions.
>> 
>>                -Ryan
>> 
>>                On Wed, Nov 2, 2016 at 6:24 PM, Phil Tomson via llvm-dev
>>                <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
>>                <mailto:llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>>> wrote:
>> 
>>                    We've recently moved our project from LLVM 3.6 to
>>                    LLVM 3.9.  I noticed  one of our code generation
>>                    tests is breaking in 3.9.
>> 
>>                    The test is:
>> 
>>                     ; RUN: llc < %s -march=xstg | FileCheck %s
>> 
>>                    define i64 @bclr64(i64 %a, i64 %b) nounwind readnone {
>>                    entry:
>>                    ; CHECK: bclr     r1, r0, r1, 64
>>                      %sub = sub i64 %b, 1
>>                      %shl = shl i64 1, %sub
>>                      %xor = xor i64 %shl, -1
>>                      %and = and i64 %a, %xor
>>                      ret i64 %and
>>                    }
>> 
>>                    I ran llc with -debug to get a better idea of what's
>>                    going on and found:
>> 
>>                    Initial selection DAG: BB#0 'bclr64:entry'
>>                    SelectionDAG has 14 nodes:
>>                      t0: ch = EntryToken
>>                          t2: i64,ch = CopyFromReg t0, Register:i64 %vreg0
>>                                t4: i64,ch = CopyFromReg t0,
>>                    Register:i64 %vreg1
>>                              t6: i64 = sub t4, Constant:i64<1>
>>                            t7: i64 = shl Constant:i64<1>, t6
>>                          t9: i64 = xor t7, Constant:i64<-1>
>>                        t10: i64 = and t2, t9
>>                      t12: ch,glue = CopyToReg t0, Register:i64 %R1, t10
>>                      t13: ch = XSTGISD::Ret t12, Register:i64 %R1, t12:1
>> 
>> 
>> 
>>                    Combining: t13: ch = XSTGISD::Ret t12, Register:i64
>>                    %R1, t12:1
>> 
>>                    Combining: t12: ch,glue = CopyToReg t0, Register:i64
>>                    %R1, t10
>> 
>>                    Combining: t11: i64 = Register %R1
>> 
>>                    Combining: t10: i64 = and t2, t9
>> 
>>                    Combining: t9: i64 = xor t7, Constant:i64<-1>
>>                     ... into: t15: i64 = rotl Constant:i64<-2>, t6
>> 
>>                    Combining: t10: i64 = and t2, t15
>> 
>>                    Combining: t15: i64 = rotl Constant:i64<-2>, t6
>> 
>>                    Combining: t14: i64 = Constant<-2>
>> 
>>                    Combining: t6: i64 = sub t4, Constant:i64<1>
>>                     ... into: t17: i64 = add t4, Constant:i64<-1>
>> 
>>                    Combining: t15: i64 = rotl Constant:i64<-2>, t17
>> 
>> 
>> 
>>                    These rotl instructions weren't showing up when I
>>                    ran llc 3.6 and that's completely changing the
>>                    generated code at the end which means the test fails
>>                    (and it's less optimal than it was in 3.6).
>> 
>>                    I've been looking in the LLVM language docs (3.9
>>                    version) and I don't see any documentation on
>>                    'rotl'. What does it do? Why isn't it in the docs?
>> 
>>                    Phil
>> 
>>                    _______________________________________________
>>                    LLVM Developers mailing list
>>                    llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org> <mailto:llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>>
>>                    http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>
>>                    <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>>
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>
>> 
> 
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20161105/21dae1b4/attachment.html>