[PATCH] [AArch64] Enable partial unrolling and runtime unrolling for AArch64 target

Thu Sep 25 03:32:46 PDT 2014

Hi Eric,

The attachments show the results of the experiments I ran on A57. I tried
10, 20, 30, 40 and 50 as the loop buffer size, with my runtime prologue
simplification patch (http://reviews.llvm.org/D5147) applied. I then
collected execution time, geomean code bloat and max code bloat at each
loop buffer size.

Weighing the performance improvement against an acceptable amount of code
bloat, setting the loop buffer size to 20 is a good choice. It may be a
little conservative, but there's not much performance improvement from
increasing it to 30 or 40, and a larger loop buffer size brings a huge code
bloat, which is not acceptable at -O2 and lower optimization levels.
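
To make the trade-off concrete, here is a rough sketch of how a loop buffer
size caps the unroll count. It is a simplified illustration, not the actual
LLVM unroller logic: the helper names are hypothetical, and it assumes the
unroller simply picks the largest count whose unrolled body still fits in
the buffer.

```cpp
#include <algorithm>

// Hypothetical helper (not an LLVM API): largest unroll count such that
// loopSize * count still fits within a buffer of bufferSize instructions.
unsigned unrollCountForBuffer(unsigned loopSize, unsigned bufferSize) {
  if (loopSize == 0)
    return 1; // empty body: nothing to unroll
  // A count of 1 means "leave the loop as it is".
  return std::max(1u, bufferSize / loopSize);
}

// Size of the unrolled loop body, in instructions, under the same assumption.
unsigned unrolledSize(unsigned loopSize, unsigned bufferSize) {
  return loopSize * unrollCountForBuffer(loopSize, bufferSize);
}
```

Under this assumption, a 6-instruction loop body is unrolled 3 times (18
instructions) with a buffer size of 20, but 8 times (48 instructions) with a
buffer size of 50, so the body grows much faster than the measured speedup.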

Thanks,

Kevin

2014-09-25 9:05 GMT+01:00 James Molloy <james at jamesmolloy.co.uk>:

> Hi Eric,
>
> Thanks for sharing the link. I'm afraid I can't comment on
> microarchitectural details, but I don't think 32 would be correct for A57.
>
> Kevin has much data and many graphs, hopefully he'll be sharing them soon!
> :)
>
> Cheers,
>
> James
>
> On 24 September 2014 22:17, Eric Christopher <echristo at gmail.com> wrote:
>
>> On Mon, Sep 22, 2014 at 10:10 AM, Eric Christopher <echristo at gmail.com>
>> wrote:
>> > Looks like the chip itself has a 32 entry loop buffer (with 2 forward
>> > and one backward branch support). What range of values did you check
>> > here? (i.e. why is 32 not the right value to put here like with the
>> > other port specific constants?)
>> >
>>
>> FYI just wanted to mention where I got this information which may not
>> be correct - it seems to have it down as a clone of the A15 and isn't
>> representative of the actual silicon here.
>>
>> http://pc.watch.impress.co.jp/video/pcw/docs/614/543/08p.pdf
>>
>> for the first bit, which looks like it's based on this article:
>>
>> http://pc.watch.impress.co.jp/docs/column/kaigai/20121031_569691.html
>>
>> If we're just going to go with a heuristics based approach, it might
>> be nice to show some samples/graphs and definitely comment that in the
>> code with how it was determined.
>>
>> -eric
>>
>
>

-- 
Best Regards,

Kevin Qin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 1_ExecutionTime_LoopBufferSize.png
Type: image/png
Size: 16904 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140925/6fd704b8/attachment.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 2_ExecutionTime_GeomeanCodeBloat.png
Type: image/png
Size: 18130 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140925/6fd704b8/attachment-0001.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 3_ExecutionTime_MaxCodeBloat.png
Type: image/png
Size: 15779 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140925/6fd704b8/attachment-0002.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 4_PerformanceImprovementPerCodeSizeBloat.png
Type: image/png
Size: 13093 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140925/6fd704b8/attachment-0003.png>