[llvm-dev] Aggregate load/stores

deadal nix via llvm-dev llvm-dev at lists.llvm.org
Mon Aug 17 12:16:41 PDT 2015


2015-08-17 11:26 GMT-07:00 Mehdi Amini <mehdi.amini at apple.com>:

> Hi,
>
> On Aug 17, 2015, at 12:13 AM, deadal nix via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>
>
> 2015-08-16 23:21 GMT-07:00 David Majnemer <david.majnemer at gmail.com>:
>
>>
>>
>> Because a solution which doesn't generalize is not a very powerful
>> solution.  What happens when somebody says that they want to use atomics +
>> large aggregate loads and stores? Give them yet another, different answer?
>> That would mean our earlier, less general approach was either a bandaid
>> (bad) or that the new answer requires a parallel code path in their
>> frontend (worse).
>>
>
>
> +1 with David’s approach: making things incrementally better is fine *as
> long as* the long term direction is identified. Small incremental changes
> that make things slightly better in the short term but drive us away from
> the long term direction are not good.
>
> Don’t get me wrong, I’m not saying that the current patch is not good,
> just that it does not seem clear to me that the long term direction has
> been identified, which explains why some may be nervous about adding stuff
> prematurely.
> And I’m not for the status quo either: while I can’t judge it definitively
> myself, I even bugged David last month to look at this revision and try to
> identify what the long term direction really is and how to make your (and
> other) frontends’ lives easier.
>
>
>
That's fine, as long as something is actually being done. Concern has been
raised about very large aggregates (64K, 1MB), but there is no way good
codegen can come out of these anyway. I don't know of any machine that has
1MB of registers available to tank the load. Even if we had a good way to
handle them in InstCombine, the backend would have no way to generate
anything nice for them anyway. Most aggregates are small, and there is no
good excuse not to handle them just because someone could generate gigantic
ones that won't map nicely to the hardware anyway.

By that logic, SROA should not exist either, as one could generate gigantic
aggregates as well (in fact, SROA fails pretty badly on large aggregates).

The second concern raised is about atomic/volatile accesses, which need to
be handled differently by the optimizer anyway, so they are mostly
irrelevant here.


>
>>
>
>
> clang has many developers behind it, some of them paid to work on it.
> That's simply not the case for many others.
>
> But to answer your questions:
>  - Per-field loads/stores generate more loads/stores than necessary in
> many cases. These can't be aggregated back because of padding.
>  - memcpy only works memory to memory. It is certainly usable in some
> cases, but it certainly does not cover all uses.
>
> I'm willing to do the memcpy optimization in InstCombine (in fact, if
> things had not degenerated into so much bikeshedding, that would already
> be done).
>
>
> Calling "bikeshedding" what other devs think keeps the quality of the
> project high is unlikely to help your patch go through; it's probably
> quite the opposite, actually.
>
>
>
I understand the desire to keep quality high. That is not where the problem
is. The problem lies in weighing an actual proposal against hypothetical
perfect ones that do not exist.
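
To make the two quoted points about padding and memcpy concrete, here is a
rough sketch (struct layout and names invented for illustration): with
padding between the fields, a per-field copy needs two narrow loads and two
narrow stores that can't be merged back into one wide access, and
@llvm.memcpy only covers the case where both sides are already in memory:

  %Padded = type { i8, i32 }    ; 3 bytes of padding between the fields
                                ; on most targets

  define void @copy_per_field(%Padded* %dst, %Padded* %src) {
    ; two narrow loads and two narrow stores; the padding gap keeps the
    ; optimizer from merging them back into a single 8-byte access
    %a.ptr = getelementptr %Padded, %Padded* %src, i32 0, i32 0
    %b.ptr = getelementptr %Padded, %Padded* %src, i32 0, i32 1
    %a = load i8, i8* %a.ptr
    %b = load i32, i32* %b.ptr
    %da.ptr = getelementptr %Padded, %Padded* %dst, i32 0, i32 0
    %db.ptr = getelementptr %Padded, %Padded* %dst, i32 0, i32 1
    store i8 %a, i8* %da.ptr
    store i32 %b, i32* %db.ptr
    ret void
  }

  declare void @llvm.memcpy.p0i8.p0i8.i64(i8*, i8*, i64, i32, i1)

  define void @copy_via_memcpy(%Padded* %dst, %Padded* %src) {
    ; memory-to-memory only: fine for a copy, but it cannot produce the
    ; value in a register the way a first-class aggregate load can
    %d = bitcast %Padded* %dst to i8*
    %s = bitcast %Padded* %src to i8*
    call void @llvm.memcpy.p0i8.p0i8.i64(i8* %d, i8* %s, i64 8, i32 4,
                                         i1 false)
    ret void
  }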
