[llvm-dev] Buildbots timing out on full builds
Daniel Sanders via llvm-dev
llvm-dev at lists.llvm.org
Thu Jun 1 02:14:07 PDT 2017
> On 31 May 2017, at 22:52, Vitaly Buka <vitalybuka at google.com> wrote:
>
> Is https://reviews.llvm.org/differential/diff/100829/ a replacement for r303341?
>
> If so LGTM.
It's a few commits from my local git repo squashed into a single diff:
* D33590 and D33596 since I'd otherwise have to rewrite them.
* r303341
* Three patches to convert the generated code to a state machine.
I'm currently finishing off the test updates to the state machine patches and posting them on llvm-commits.
> r303542 msan AArch64InstructionSelector.cpp: 1m17.209s
> r303542+diff/100829/ msan AArch64InstructionSelector.cpp: 1m24.724s
That's much better :-). Thanks for trying the patch.
> On Wed, May 31, 2017 at 6:13 AM, Daniel Sanders <daniel_l_sanders at apple.com> wrote:
> Great! I expect I'll be able to cut it down further once I start fusing these smaller state-machines together. Before that, I'll re-order the patches that went into that diff so that I don't have to re-commit the regression before fixing it.
>
>> On 31 May 2017, at 13:48, Diana Picus <diana.picus at linaro.org> wrote:
>>
>> Hi,
>>
>> This runs in:
>> real 13m6.296s
>> user 42m45.191s
>> sys 1m2.030s
>>
>> (on top of a fully built r303542). It should be fine for the ARM bots.
>>
>> However, you need to 'return std::move(M)' at line 1884.
>>
>> @Vitaly, is it ok for your bots as well?
>>
>> Cheers,
>> Diana
>>
>> On 31 May 2017 at 10:21, Daniel Sanders <daniel_l_sanders at apple.com> wrote:
>> Hi Diana and Vitaly,
>>
>> Could you give https://reviews.llvm.org/differential/diff/100829/ a try? When measuring the compile of AArch64InstructionSelector.cpp.o with asan enabled and running under Instruments' Allocation profiler, my machine reports that cumulative memory allocation is down to ~3.5GB (was ~10GB), the number of allocations is down to ~4 million (was ~23 million), and the compile time is down to ~15s (was ~60s).
>>
>> The patch is based on r303542, and the main change is that most of the generated C++ has been replaced with a state-machine-based implementation. It's not fully converted to a state machine yet, since it generates lots of smaller machines (one matcher and one emitter per rule) instead of a single machine, but it's hopefully sufficient to unblock my patch series.
>>
>>> On 26 May 2017, at 09:10, Diana Picus <diana.picus at linaro.org> wrote:
>>>
>>> Ok, that sounds reasonable. I'm happy to test more patches for you
>>> when they're ready.
>>>
>>> On 25 May 2017 at 17:39, Daniel Sanders <daniel_l_sanders at apple.com> wrote:
>>>> Thanks for trying that patch. I agree that 34 mins still isn't good enough but we're heading in the right direction.
>>>>
>>>> Changing the partitioning predicate to the instruction opcode rather than the number of operands in the top-level instruction will hopefully cut it down further. I also have a patch that shaves a small amount off the compile time by replacing the various LLT::scalar()/LLT::vector() calls with references to LLT objects that were created in advance. I tried something similar with getRegBankForRegClass(), but I haven't written that up as a patch yet since it requires some refactoring to get access to a mapping that RegisterBankEmitter.cpp knows about. In my experiment I edited this information into AArch64GenGlobalISel.inc by hand.
>>>>
>>>> I think the real solution is to convert the generated C++ to the state machine that we intended to end up with. I don't think we'll be able to put it off much longer, given that we're hitting compile-time problems when we can only import 25% of the rules. That said, I have a couple more nearly-finished patches I'd like to get in before we introduce the state machine. Hopefully the above tricks will be enough to save me a rewrite.
>>>>
>>>>> On 25 May 2017, at 16:11, Diana Picus <diana.picus at linaro.org> wrote:
>>>>>
>>>>> Hi Daniel,
>>>>>
>>>>> I built r303542, then applied your patch and built again and it still takes
>>>>> real 34m30.279s
>>>>> user 84m36.553s
>>>>> sys 0m58.372s
>>>>>
>>>>> This is better than the 50m I saw before, but I think we should try to
>>>>> make it a bit faster. Do you have any other ideas for speeding it up?
>>>>>
>>>>> Thanks,
>>>>> Diana
>>>>>
>>>>>
>>>>>> On 22 May 2017 at 11:22, Diana Picus <diana.picus at linaro.org> wrote:
>>>>>> Hi Daniel,
>>>>>>
>>>>>> I did your experiment on a TK1 machine (same as the bots) and for r303258 I get:
>>>>>> real 18m28.882s
>>>>>> user 35m37.091s
>>>>>> sys 0m44.726s
>>>>>>
>>>>>> and for r303259:
>>>>>> real 50m52.048s
>>>>>> user 88m25.473s
>>>>>> sys 0m46.548s
>>>>>>
>>>>>> If I can help investigate, please let me know, otherwise we can just
>>>>>> try your fixes and see how they affect compilation time.
>>>>>>
>>>>>> Thanks,
>>>>>> Diana
>>>>>>
>>>>>> On 22 May 2017 at 10:49, Daniel Sanders <daniel_l_sanders at apple.com> wrote:
>>>>>>> r303341 is the re-commit of r303259, which tripled the number of rules
>>>>>>> that can be imported into GlobalISel from SelectionDAG. A compile-time
>>>>>>> regression is to be expected, but when I looked into it I found it was ~25s
>>>>>>> on my machine for the whole incremental build rather than the ~12mins you
>>>>>>> are seeing. I'll take another look.
>>>>>>>
>>>>>>> I'm aware of a couple of easy improvements we could make to the way the
>>>>>>> importer works. I was leaving them until we change it over to a state
>>>>>>> machine, but the most obvious is to group rules by their top-level gMIR
>>>>>>> instruction. This would reduce the cost of the std::sort that handles the
>>>>>>> rule priorities when generating the source file and will also make the
>>>>>>> result simpler for the compiler to compile.
>>>>>>>
>>>>>>>
>>>>>>> On 21 May 2017, at 11:16, Vitaly Buka <vitalybuka at google.com> wrote:
>>>>>>>
>>>>>>> It must be r303341; I commented on the corresponding llvm-commits thread.
>>>>>>>
>>>>>>> On Fri, May 19, 2017 at 7:34 AM, Diana Picus via llvm-dev
>>>>>>> <llvm-dev at lists.llvm.org> wrote:
>>>>>>>>
>>>>>>>> Ok, thanks. I'll try to do a bisect next week to see if I can find it.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Diana
>>>>>>>>
>>>>>>>> On 19 May 2017 at 16:29, Daniel Sanders <daniel_l_sanders at apple.com>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> On 19 May 2017, at 14:54, Daniel Sanders via llvm-dev
>>>>>>>>>> <llvm-dev at lists.llvm.org> wrote:
>>>>>>>>>>
>>>>>>>>>> r303259 will have increased compile time since it tripled the number of
>>>>>>>>>> importable SelectionDAG rules, but a quick measurement building the
>>>>>>>>>> affected file:
>>>>>>>>>> ninja
>>>>>>>>>> lib/Target/<Target>/CMakeFiles/LLVM<Target>CodeGen.dir/<Target>InstructionSelector.cpp.o
>>>>>>>>>> for both ARM and AArch64 didn't show a significant increase. I'll check
>>>>>>>>>> whether it made a difference to linking.
>>>>>>>>>
>>>>>>>>> I don't think it's r303259. Starting with a fully built r303259, then
>>>>>>>>> updating to r303258 and running 'ninja' gives me:
>>>>>>>>> real 2m28.273s
>>>>>>>>> user 13m23.171s
>>>>>>>>> sys 0m47.725s
>>>>>>>>> then updating to r303259 and running 'ninja' again gives me:
>>>>>>>>> real 2m19.052s
>>>>>>>>> user 13m38.802s
>>>>>>>>> sys 0m44.551s
>>>>>>>>>
>>>>>>>>>> sanitizer-x86_64-linux-fast also timed out after one of my commits this
>>>>>>>>>> morning.
>>>>>>>>>>
>>>>>>>>>>> On 19 May 2017, at 14:14, Diana Picus <diana.picus at linaro.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> We've noticed that recently some of our bots (mostly
>>>>>>>>>>> clang-cmake-armv7-a15 and clang-cmake-thumbv7-a15) started timing out
>>>>>>>>>>> whenever someone commits a change to TableGen:
>>>>>>>>>>>
>>>>>>>>>>> r303418:
>>>>>>>>>>> http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15/builds/7268
>>>>>>>>>>> r303346:
>>>>>>>>>>> http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15/builds/7242
>>>>>>>>>>> r303341:
>>>>>>>>>>> http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15/builds/7239
>>>>>>>>>>> r303259:
>>>>>>>>>>> http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15/builds/7198
>>>>>>>>>>>
>>>>>>>>>>> TableGen changes before that (I checked about 3-4 of them) don't have
>>>>>>>>>>> this problem:
>>>>>>>>>>> r303253:
>>>>>>>>>>> http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15/builds/7197
>>>>>>>>>>>
>>>>>>>>>>> That one in particular actually finishes the whole build in 635s,
>>>>>>>>>>> which is only a bit over 50% of the timeout limit (1200s). So, between
>>>>>>>>>>> r303253 and now, something happened that made full builds
>>>>>>>>>>> significantly slower. Does anyone have any idea what that might have
>>>>>>>>>>> been? Also, has anyone noticed this on other bots?
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Diana
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> LLVM Developers mailing list
>>>>>>>>>> llvm-dev at lists.llvm.org
>>>>>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>
>>
>>
>
>