[PATCH] PR18825 - New RegisterCoalescer subtarget hook

Eric Christopher echristo at gmail.com
Tue Jul 15 19:42:33 PDT 2014


On Tue, Jul 15, 2014 at 6:55 PM, Chris Bieneman <cbieneman at apple.com> wrote:
>
>> On Jul 15, 2014, at 4:16 PM, Eric Christopher <echristo at gmail.com> wrote:
>>
>> Couple of small comments:
>>
>> +  unsigned SizeMultiplier = MBB->size()/100;
>>
>> Mmm.. random numbers. Why 100? Why not 101? How about a comment :)
>
> I can add a comment. The basic idea is that the heuristic restricts how many registers which meet certain criteria (large, sub-register indexed, …) get coalesced in a given basic block. It seems logical that the larger the basic block the more coalescing you should allow. Most of the tests in the llvm-test suite don’t hit this heuristic at all because they just don’t utilize many enough large registers. Unfortunately that means this really is a bit of a magical number.

Yeah, it's nice to have it be some sort of documented constant rather
than ... well, an undocumented constant. How'd you come up with the
number btw?

>
>>
>> Otherwise... on the subtarget. Why there? Why not on the
>> TargetRegisterInfo? Seems to make just as much sense for this use and
>> TargetRegisterInfo already has a subtarget if you wanted to make cpu
>> specific queries there.
>>
>> Thoughts?
>
> I hadn’t really considered putting it on the TargetRegisterInfo. When Quentin and I discussed the patch off-list we discussed it as a subtarget hook, so that was how I implemented it. In retrospect I still like it in the subtarget because the heuristic feels to me like it is a property of the subtarget, but it could be implemented in the TargetRegisterInfo.
>

Conveniently TargetRegisterInfo is owned by each subtarget so it's
like a subtarget hook :)

I just figured since this seemed to be a register oriented type of
hook that it might make sense to go there. It's one of those things
that we haven't really done in the past, but that seemed to be because
of the confusion with what a Subtarget was versus a Target.

> Quentin, do you have any thoughts on this?
>

Jim generally has good insight here too. Jim?

-eric

> -Chris
>
>>
>> -eric
>>
>>
>>
>> On Tue, Jul 15, 2014 at 7:31 AM, Quentin Colombet <qcolombet at apple.com> wrote:
>>> Hi Chris,
>>>
>>> Thanks for the update, I like what you did with the MachineFunctionInfo!
>>>
>>> LGTM with two comments:
>>> - AFAICT, we do not put anymore the date in filename for test cases.
>>> - For the out-of-registers.ll test case, please add some CHECK lines, just to be sure we are generating something correct.
>>>
>>> Thanks,
>>> -Quentin
>>>
>>>> On Jul 14, 2014, at 10:26 PM, Chris Bieneman <cbieneman at apple.com> wrote:
>>>>
>>>> Quentin,
>>>>
>>>> Here are updated diffs with the changes you suggested. I’ve also included new test cases, one to test vector spilling, and one to test PR18825.
>>>>
>>>> If there are no other concerns about this patch I will commit it tomorrow.
>>>>
>>>> Thank you Quentin and David for the feedback.
>>>>
>>>> -Chris
>>>> <18825.diff>
>>>>> On Jul 14, 2014, at 5:00 PM, Chris Bieneman <cbieneman at apple.com> wrote:
>>>>>
>>>>> Thanks for the review Quentin. It shouldn't be hard to move the DenseMap into the MachineFunctionInfo, I'll rework the patch, and I will provide test cases for both of the cases you mentioned.
>>>>>
>>>>> To close the loop on David's earlier question about the llvm performance suite. This patch has no impact on the generated code in any of our tests, so it will not alter performance.
>>>>>
>>>>> Thanks,
>>>>> -Chris
>>>>>
>>>>> On Jul 14, 2014, at 3:56 PM, Quentin Colombet <qcolombet at apple.com> wrote:
>>>>>
>>>>>> Hi Chris,
>>>>>>
>>>>>> The current patch LGTM, though the DenseMap in the Subtarget is not pretty.
>>>>>> My main concern is with big module, where the map will keep a lot of useless information since we have no way to know when it is fine to clear it.
>>>>>>
>>>>>> I was thinking moving that map into the XXXMachineFunctionInfo, not sure how feasible is that.
>>>>>>
>>>>>> Regarding the final patch, assuming no further feedbacks come, I would expect to have a new test case for PR18825 and the update test for CodeGen/ARM/vldm-sched-a9.ll.
>>>>>>
>>>>>> Thanks,
>>>>>> -Quentin
>>>>>>
>>>>>>> On Jul 14, 2014, at 1:23 PM, Chris Bieneman <cbieneman at apple.com> wrote:
>>>>>>>
>>>>>>> This morning I built SPEC with and without my patch and found no codegen differences.
>>>>>>>
>>>>>>> -Chris
>>>>>>>
>>>>>>> On Jul 11, 2014, at 5:19 PM, Chris Bieneman <cbieneman at apple.com> wrote:
>>>>>>>
>>>>>>>> David,
>>>>>>>>
>>>>>>>> I agree completely. I'll look into any codegen differences in our test suite, and in SPEC.
>>>>>>>>
>>>>>>>> -Chris
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> On Jul 11, 2014, at 4:26 PM, David Peixotto <dpeixott at codeaurora.org> wrote:
>>>>>>>>>
>>>>>>>>> Hi Chris,
>>>>>>>>>
>>>>>>>>> I agree that there is no simple solution. I am not opposed to your heuristic approach. I asked for stats or perf data because I know that the bug rarely happens, but I can't tell from reading the code how often the heuristic will hit. As you mentioned coalescing can have positive or negative performance impact and it would be good to understand if this change has any impact on codegen or performance for some larger benchmarks.
>>>>>>>>>
>>>>>>>>> That's my opinion anyway :)
>>>>>>>>>
>>>>>>>>> -David
>>>>>>>>>
>>>>>>>>> -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: Chris Bieneman [mailto:cbieneman at apple.com]
>>>>>>>>>> Sent: Friday, July 11, 2014 2:58 PM
>>>>>>>>>> To: David Peixotto
>>>>>>>>>> Cc: llvm-commits
>>>>>>>>>> Subject: Re: [PATCH] PR18825 - New RegisterCoalescer subtarget hook
>>>>>>>>>>
>>>>>>>>>> Hi David,
>>>>>>>>>>
>>>>>>>>>> I haven’t run SPEC, but I have run all of LLVM’s in tree tests. Among
>>>>>>>>>> those I haven’t found any codegen differences except the one test that I
>>>>>>>>>> called out in my original email, and I suspect that would be a performance
>>>>>>>>>> improvement.
>>>>>>>>>>
>>>>>>>>>> That said, there are certainly cases where blocking coalescing can
>>>>>>>>>> generate worse code. I’ve been emailing back and forth with Quentin about
>>>>>>>>>> potential longer-term solutions to this problem and there really is no
>>>>>>>>>> simple solution.
>>>>>>>>>>
>>>>>>>>>> -Chris
>>>>>>>>>>
>>>>>>>>>>> On Jul 11, 2014, at 2:45 PM, David Peixotto <dpeixott at codeaurora.org>
>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Chris,
>>>>>>>>>>>
>>>>>>>>>>> It would be great to get this bug fixed. I have a general question about
>>>>>>>>>> the heuristic.
>>>>>>>>>>>
>>>>>>>>>>> I believe the case we need to avoid in the bug is quite rare (running
>>>>>>>>>> out of registers), but the heuristic will prevent colaescing in other
>>>>>>>>>> cases as well. Do you have any statistics for how many coalescings are
>>>>>>>>>> prevented by the heuristic and any performance data for spec or llvm test
>>>>>>>>>> suite?
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> David
>>>>>>>>>>>
>>>>>>>>>>> -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
>>>>>>>>>>> hosted by The Linux Foundation
>>>>>>>>>>>
>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>> From: llvm-commits-bounces at cs.uiuc.edu [mailto:llvm-commits-
>>>>>>>>>>>> bounces at cs.uiuc.edu] On Behalf Of Chris Bieneman
>>>>>>>>>>>> Sent: Friday, July 11, 2014 1:45 PM
>>>>>>>>>>>> To: llvm-commits
>>>>>>>>>>>> Subject: [PATCH] PR18825 - New RegisterCoalescer subtarget hook
>>>>>>>>>>>>
>>>>>>>>>>>> The attached patch adds a new subtarget hook to the
>>>>>>>>>>>> RegisterCoalescer, and an ARM implementation. The ARM implementation
>>>>>>>>>>>> works around PR18825, and also better code in cases with a large number
>>>>>>>>>> of NEON vectors in use.
>>>>>>>>>>>>
>>>>>>>>>>>> The basic problem is that the coalescer is very aggressive at
>>>>>>>>>>>> propagating constraints on the register classes, and the register
>>>>>>>>>>>> allocator doesn’t know how to split sub-registers later to recover.
>>>>>>>>>>>> This patch provides an escape valve for targets that encounter this
>>>>>>>>>> problem to limit coalescing.
>>>>>>>>>>>>
>>>>>>>>>>>> A few other notes about this patch:
>>>>>>>>>>>>
>>>>>>>>>>>> (1) In code that triggers the heuristic (using a lot of large vector
>>>>>>>>>>>> registers) this code can improve compile time substantially, and in
>>>>>>>>>>>> other code it has no adverse impact.
>>>>>>>>>>>>
>>>>>>>>>>>> (2) This code also breaks one of the tests in our test suite
>>>>>>>>>>>> (CodeGen/ARM/vldm-sched-a9.ll). This test is designed to test NEON
>>>>>>>>>>>> vector spilling, and with this patch the test no longer spills. With
>>>>>>>>>>>> the final patch I will provide a new test case to test vector
>>>>>>>>>>>> spilling, and this test will be marked XFAIL so that we don’t regress
>>>>>>>>>>>> this and start spilling here again.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> -Chris
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> llvm-commits mailing list
>>>>>>> llvm-commits at cs.uiuc.edu
>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> llvm-commits mailing list
>>>>> llvm-commits at cs.uiuc.edu
>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>>
>>>
>>>
>>> _______________________________________________
>>> llvm-commits mailing list
>>> llvm-commits at cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>




More information about the llvm-commits mailing list