Optimize load from aggregate stores

Fri Jun 13 10:16:10 PDT 2014

On 06/12/2014 09:25 AM, Owen Anderson wrote:
>
> On Jun 12, 2014, at 9:02 AM, Philip Reames <listmail at philipreames.com 
> <mailto:listmail at philipreames.com>> wrote:
>
>> On 06/12/2014 12:28 AM, Owen Anderson wrote:
>>>
>>> On Jun 11, 2014, at 11:29 PM, deadal nix <deadalnix at gmail.com 
>>> <mailto:deadalnix at gmail.com>> wrote:
>>>
>>>> It is gonna improve the situation quite a lot for all frontend that 
>>>> use aggregate loads (arguably, that is a bad practice, but that no 
>>>> reason to stab people in the back when they do it anyway).
>>>
>>> I’m not sure I agree with that statement.  If we don’t think they 
>>> should be used, not optimizing them is a good way to discourage 
>>> that.  More generally, I’m concerned about how we will ever get good 
>>> test coverage of this code path, since we don’t have any extant 
>>> front ends that hit it.
>> I'm joining this discussion late, but a) why are aggregate loads bad 
>> practice?  Loading something like a small struct from memory with a 
>> single load seems reasonable.
>
> They are not well supported through the code generator, and 
> introducing them would add a lot of complexity.  There are better 
> solutions already in use.  The only encouraged use case for first 
> class structs is as multiple return values.
Can you point me to a previous discussion of best practices here? I've 
honestly never seen this recommendation.

When you say "not well supported", do you mean functionally?  Or 
performance wise?

Is there an easy lowering scheme which could convert the discourage form 
into a preferred form early in the compilation process?
>
>> b) There are frontends that are not in tree.  The fact that Deadal 
>> submitted the patch is a good hint that *someone* cares.
>
> That’s not the same as saying that they *should* care.  The project 
> regularly makes policy and/or technical decisions that favor good 
> practices over bad.  As an extreme example, we don’t even try to 
> promote non-entry-block allocas to virtual registers.  Totally 
> possible, but we have collectively decided that the extra complexity 
> is not worth it when we can just tell frontend authors to adopt best 
> practice and put their allocas in the entry block.  This example is 
> firmly in line with that.
You're making a quite radical claim here.  Your comments could be read 
to imply that no effort while ever be put into improving things which 
are not already used by clang.  I know this is not what you probably 
meant, but it is the message I'm getting.

Do you have concrete objections to the patch in question?  Looking 
through the changes, I see a couple of things which can be improved, but 
it seems like a reasonable starting place.   It also seems like a low 
maintiance burden to trigger this much discussion.  :)

Just to be clear: technical feedback along the lines of "this doesn't 
seem like the right approach because of X, Y, Z; have you considered 
doing W?" is more than fine.  It's the tone and phrasing I'm objecting 
to.  (And t.m.k. the lack of documentation.)

Your comments w.r.t. testing is entirely reasonable technical feedback.
>
> Moreover, we generally prefer not to add functionality that is not 
> exercised in open code.  A single regression test is not good enough. 
>  That verifies that one case works, but we’ll never find the cases 
> that don’t work, or that interact improperly with other optimizations, 
> because we have no way of actually testing it live.
I would agree that a single regression test is not enough.  Having 
multiple such tests should be.  We routinely take patches found by 
analyzing problems with proprietary applications.  Claiming that we 
don't accept changes which we can't test in their original environment 
is simply untrue.

Just to be clear, I do understand your concerns about how to ensure good 
test coverage on changes which don't cover hot paths from existing well 
known frontends.  Deadal should address this point. However, a flat 
rejection is not the right answer if a way to test the feature 
adequately can be found.

Deedal, have you considered trying to find examples where Clang does 
emit aggregate loads and stores?  Owen mentioned the tuple return case.  
Have you considered phrasing optimization in terms of the benefit there?

Philip
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140613/57e4a346/attachment.html>