Disable arm-gepopt by default

James Molloy james at jamesmolloy.co.uk
Tue Apr 21 11:38:42 PDT 2015


Hi Michael,

Thanks. I'll take a look tomorrow. At least this is nothing to do with the
pass itself - if reassociation regresses instruction selection, something
is fragile!

Will you have time to look at this one today? It's now 19:30 here and I'm
away for the evening.

Cheers,

James
On Tue, 21 Apr 2015 at 19:36, Michael Zolotukhin <mzolotukhin at apple.com>
wrote:

> Hi James,
>
> It looks like the first patch gives nearly the same results as both
> patches together (at least, in terms of gains and regressions on
> geekbench). E.g. the code generated for mandelbrot is the same.
>
> Thanks,
> Michael
>
> On Apr 21, 2015, at 11:29 AM, James Molloy <james at jamesmolloy.co.uk>
> wrote:
>
> Hi Michael,
>
> Jingyue pointed out that my second patch did pessimize some code, so do you
> still see the regressions with my first patch (the reassociation one)?
>
> Cheers,
>
> James
> On Tue, 21 Apr 2015 at 19:26, Michael Zolotukhin <mzolotukhin at apple.com>
> wrote:
>
>> Hi James,
>>
>> Thank you for taking the time to work on this! I tried the patches you
>> posted and they did fix the problem on twofish. However, with them there
>> are other regressions - and while you are at it, could you please take a
>> look at the generated code? Perhaps you can come up with another one-line
>> fix.
>>
>> So, one regression is on Geekbench/Mandelbrot; here are the ‘fast’ and
>> ‘slow’ disassembly files (fast - from the reference compiler, slow -
>> reference compiler+D9148+D9149).
>>
>>
>> I can see that we generate fewer ‘madd’ instructions, but in some cases we
>> end up with less-optimal sequences, including ‘mul', ‘add' and ‘or'
>> instead. Also, in some cases we now generate two ‘str' instead of one ‘stp'.
>>
>> Do you have any ideas what might go wrong this time?
>>
>> Thanks,
>> Michael
>>
>> On Apr 21, 2015, at 6:55 AM, James Molloy <james at jamesmolloy.co.uk>
>> wrote:
>>
>> See:
>>
>> http://reviews.llvm.org/D9148
>> http://reviews.llvm.org/D9149
>>
>> Cheers,
>>
>> James
>>
>> On Tue, 21 Apr 2015 at 12:06 James Molloy <james at jamesmolloy.co.uk>
>> wrote:
>>
>>> Hi Michael, Gerolf,
>>>
>>> The main goal of this optimization was to reduce the number of
>>> multiply-adds emitted when complex GEPs are used. With regards to SPEC,
>>> this is particularly prevalent in sjeng and gobmk (although we don't just
>>> care about SPEC and there were plenty of other places we caught this idiom).
>>>
>>> The pass is actively reducing the number of multiply-adds in gobmk:
>>>
>>> $ llc engine.ll -O3 -mllvm -aarch64-gep-opt=false -o - |& grep madd | wc -l
>>> 297
>>>
>>> $ llc engine.ll -O3 -mllvm -aarch64-gep-opt=true -o - |& grep madd | wc -l
>>> 158
>>>
>>> So it demonstrably is making at least some code better.
>>>
>>> I was able to reproduce your twofish example. Twofish, like most crypto
>>> algorithms, has one pattern repeated many times, so the problem could just
>>> be down to a single pattern producing one more instruction - which it
>>> turns out is exactly the case.
>>>
>>> The problem is that with gep-opt enabled, we're associating shifts and
>>> constants the other way around.
>>>
>>> With gep-opt disabled (synthetic code example):
>>>    add x0, x0, #72
>>>    ldr x0, [x1, x0, lsl #2]
>>>
>>> With gep-opt enabled:
>>>    add x0, x1, x0, lsl #2
>>>    ldr x0, [x0, #72]
>>>
>>> This is actually fine, until another expression reuses a different shift
>>> with the same offset:
>>>
>>> With gep-opt disabled:
>>>    add x0, x0, #72
>>>    ldr x0, [x1, x0, lsl #2]
>>>    ldr x2, [x1, x0, lsl #3]
>>>
>>> With gep-opt enabled:
>>>    add x2, x1, x0, lsl #3
>>>    add x0, x1, x0, lsl #2
>>>    ldr x0, [x0, #72]
>>>    ldr x2, [x2, #72]
>>>
>>> Amusingly, I saw the opposite behaviour on AArch32 on a different
>>> testcase and came to the conclusion that expressions with correlated
>>> immediate offsets and identical shift amounts were more likely than
>>> expressions with correlated shift amounts and identical immediate offsets,
>>> but that's a different story.
>>>
>>> Anyway, the reason this is happening is because we're not running
>>> reassociate after GEP expansion. If we run reassociate it all goes away,
>>> it's a one-line fix.
>>>
>>> The two testcases that were disabled - well, they weren't really
>>> disabled, but the behaviour was changed for them. I put that down to a bug
>>> in the SeparateConstOffsetFromGEP pass, where it tries to separate a GEP
>>> with only a single variable offset:
>>>
>>>   GEP i8* %base, i32 %offset
>>>
>>> Nothing beneficial can come of that, so I've got, again, a single-line
>>> patch to fix this. I'll push both patches upstream as soon as I verify
>>> them.
>>>
>>> So in summary, I don't see any need to disable this pass. It's doing
>>> good work, it just has one or two bugs.
>>>
>>> Cheers,
>>>
>>> James
>>>
>>> On Tue, 21 Apr 2015 at 05:27 Michael Zolotukhin <mzolotukhin at apple.com>
>>> wrote:
>>>
>>>> Hi Tim and others,
>>>>
>>>> So, the entire discussion revolves around the assumption that this
>>>> pass is beneficial in some cases, but we don’t have such data. Please note
>>>> that I don’t want to say the assumption is wrong, but as far as I
>>>> understand, only Hao observed some gains. Thus, until we hear from him, we
>>>> have real slowdowns versus presumed gains.
>>>>
>>>> As for looking into the code later - I’m not an expert in this
>>>> area, and I don’t even have a motivating example for this optimization,
>>>> so I’d hardly be the right person for it. I’d be glad to assist with
>>>> collecting additional data or running tests if that might help, though.
>>>>
>>>> Thanks,
>>>> Michael
>>>>
>>>>
>>>> > On Apr 20, 2015, at 8:40 PM, Tim Northover <t.p.northover at gmail.com>
>>>> wrote:
>>>> >
>>>> > On 20 April 2015 at 20:34, Gerolf Hoflehner <ghoflehner at apple.com>
>>>> wrote:
>>>> >> It shouldn’t come across as ‘extreme’.
>>>> >
>>>> > The "extreme" part wasn't so much the tentative disabling as "Could
>>>> > you and/or Hao take this on after the code is disabled/backed out?". I
>>>> > realise it wasn't an ultimatum, but it could very easily turn into
>>>> > "we're washing our hands of this", which I'd object to.
>>>> >
>>>> > I assume the pass was committed with some kind of metric that it
>>>> > improved, and now we have a counter-example. It's down to everyone to
>>>> > decide what the best way to proceed is.
>>>> >
>>>> > Cheers.
>>>> >
>>>> > Tim.
>>>> >
>>>> > _______________________________________________
>>>> > llvm-commits mailing list
>>>> > llvm-commits at cs.uiuc.edu
>>>> > http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>>
>>>>
>>>>
>>>
>>
>

