[llvm-dev] Aggregate load/stores

deadal nix via llvm-dev llvm-dev at lists.llvm.org
Mon Aug 17 13:18:06 PDT 2015


OK, what about this plan:

 - Slice the aggregate into a series of valid loads/stores for non-atomic ones.
 - Use a big scalar for atomic/volatile ones.
 - Try to generate memcpy or memmove when possible.
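
Concretely, a rough LLVM IR sketch of what that could look like for a simple
non-atomic case (%struct.pair, the value names, and the 8-byte alignment in
the atomic variant are just illustrative assumptions):

  %struct.pair = type { i32, i32 }

  ; before: first-class aggregate load and store
  define void @copy(%struct.pair* %dst, %struct.pair* %src) {
    %v = load %struct.pair, %struct.pair* %src
    store %struct.pair %v, %struct.pair* %dst
    ret void
  }

  ; after slicing: one valid scalar load/store per field
  define void @copy.sliced(%struct.pair* %dst, %struct.pair* %src) {
    %s0 = getelementptr inbounds %struct.pair, %struct.pair* %src, i32 0, i32 0
    %f0 = load i32, i32* %s0
    %s1 = getelementptr inbounds %struct.pair, %struct.pair* %src, i32 0, i32 1
    %f1 = load i32, i32* %s1
    %d0 = getelementptr inbounds %struct.pair, %struct.pair* %dst, i32 0, i32 0
    store i32 %f0, i32* %d0
    %d1 = getelementptr inbounds %struct.pair, %struct.pair* %dst, i32 0, i32 1
    store i32 %f1, i32* %d1
    ret void
  }

  ; atomic/volatile: one big scalar covering the whole aggregate instead,
  ; assuming the frontend can guarantee suitable alignment
  define i64 @load.atomic(%struct.pair* %src) {
    %p = bitcast %struct.pair* %src to i64*
    %big = load atomic i64, i64* %p seq_cst, align 8
    ret i64 %big
  }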


2015-08-17 12:16 GMT-07:00 deadal nix <deadalnix at gmail.com>:

>
>
> 2015-08-17 11:26 GMT-07:00 Mehdi Amini <mehdi.amini at apple.com>:
>
>> Hi,
>>
>> On Aug 17, 2015, at 12:13 AM, deadal nix via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>
>>
>> 2015-08-16 23:21 GMT-07:00 David Majnemer <david.majnemer at gmail.com>:
>>
>>>
>>>
>>> Because a solution which doesn't generalize is not a very powerful
>>> solution.  What happens when somebody says that they want to use atomics +
>>> large aggregate loads and stores? Give them yet another, different answer?
>>> That would mean our earlier, less general approach was either a bandaid
>>> (bad) or that the new answer requires a parallel code path in their
>>> frontend (worse).
>>>
>>
>>
>> +1 with David’s approach: making things incrementally better is fine *as
>> long as* the long-term direction is identified. Small incremental changes
>> that make things slightly better in the short term but drive us away from
>> the long-term direction are not good.
>>
>> Don’t get me wrong, I’m not saying that the current patch is not good,
>> just that it does not seem clear to me that the long-term direction has
>> been identified, which explains why some may be nervous about adding stuff
>> prematurely.
>> And I’m not for the status quo: while I can’t judge it definitively
>> myself, I even bugged David last month to look at this revision and try to
>> identify what the long-term direction really is and how to make your (and
>> other) frontends’ lives easier.
>>
>>
>>
> As long as something actually gets done. Concern has been raised about very
> large aggregates (64K, 1MB), but there is no way good codegen can come out
> of these anyway. I don't know of any machine that has 1MB of registers
> available to hold such a load. Even if we had a good way to handle it in
> InstCombine, the backend would have no capability to generate something
> nice for it anyway. Most aggregates are small, and there is no good excuse
> to do nothing for them just because someone could generate gigantic ones
> that won't map nicely to the hardware anyway.
>
> By that logic, SROA should not exist, as one could generate gigantic
> aggregates as well (in fact, SROA fails pretty badly on large aggregates).
>
> The second concern raised is about atomic/volatile, which needs to be
> handled differently by the optimizer anyway, so it is mostly irrelevant here.
>
>
>>
>>>
>>
>>
>> clang has many developers behind it, some of them paid to work on it.
>> That’s simply not the case for many others.
>>
>> But to answer your questions:
>>  - Per-field load/store generates more loads/stores than necessary in many
>> cases. These can't be merged back together because of padding (see the
>> sketch after this list).
>>  - memcpy only works memory to memory. It is certainly usable in some
>> cases, but it certainly does not cover all uses.
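
To make the padding point concrete, here is a rough sketch (the %struct.padded
type and names are only illustrative, assuming the usual 4-byte alignment for
i32):

  %struct.padded = type { i8, i32 }   ; 3 bytes of padding after the i8

  ; per-field lowering: two stores that leave bytes 1-3 untouched
  define void @init(%struct.padded* %dst, i8 %c, i32 %i) {
    %c.addr = getelementptr inbounds %struct.padded, %struct.padded* %dst, i32 0, i32 0
    store i8 %c, i8* %c.addr
    %i.addr = getelementptr inbounds %struct.padded, %struct.padded* %dst, i32 0, i32 1
    store i32 %i, i32* %i.addr
    ret void
  }
  ; since nothing is known about the padding bytes, in practice nothing merges
  ; these two stores back into a single 8-byte store
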
>>
>> I'm willing to do the memcpy optimization in InstCombine (in fact, had
>> things not degenerated into so much bikeshedding, it would already be done).
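
For reference, a rough sketch of what such an InstCombine rewrite could
produce when an aggregate load only feeds a store (the %struct.big type, its
16-byte size, and its 4-byte alignment are assumed for illustration; the
memcpy intrinsic is the 3.x form with an explicit alignment argument):

  %struct.big = type { i32, i8, i32, i8 }   ; 16 bytes, align 4 in this example

  declare void @llvm.memcpy.p0i8.p0i8.i64(i8*, i8*, i64, i32, i1)

  ; before: %v = load %struct.big, %struct.big* %src
  ;         store %struct.big %v, %struct.big* %dst
  define void @copy.big(%struct.big* %dst, %struct.big* %src) {
    %s = bitcast %struct.big* %src to i8*
    %d = bitcast %struct.big* %dst to i8*
    call void @llvm.memcpy.p0i8.p0i8.i64(i8* %d, i8* %s, i64 16, i32 4, i1 false)
    ret void
  }
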
>>
>>
>> Calling “bikeshedding” what other devs see as keeping the quality of the
>> project high is unlikely to help your patch go through; it’s probably quite
>> the opposite, actually.
>>
>>
>>
> I understand the desire to keep quality high. That is not where the
> problem is. The problem lies in weighing an actual proposal against
> hypothetical perfect ones that do not exist.
>
>

