[llvm-dev] Aggregate load/stores
Philip Reames via llvm-dev
llvm-dev at lists.llvm.org
Wed Aug 19 10:06:29 PDT 2015
This thread is deep enough, and the start of it confrontational enough,
that I doubt many people are still reading this far in. Please rephrase
this as a separate RFC to ensure visibility.
For the record, the overall direction you're sketching seems entirely
reasonable to me.
Philip
On 08/18/2015 10:31 PM, deadal nix via llvm-dev wrote:
> It is pretty clear people need this. Let's get this moving.
>
> I'll try to sum up the points that have been made, and to address
> each of them carefully.
>
> 1/ There is no good solution for large aggregates.
> That is true. However, I don't think this is a reason not to address
> smaller aggregates, as they appear to be needed. Realistically, the
> proportion of aggregates that are very large is small, and there is no
> expectation that such a thing would map nicely to the hardware in the
> first place (the hardware won't have enough registers to load it all).
> I think it is reasonable to expect good handling of relatively small
> aggregates like fat pointers (sketched below), while accepting that
> large ones will be inefficient.
>
> This limitation is not unique to the current discussion, as SROA
> suffers from the same limitation.
> It is possible to disable the transformation for aggregates that are
> too large if this is too big a concern. The same should perhaps be
> done for SROA.
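>
> As a concrete sketch in LLVM IR (the type and names are hypothetical,
> purely for illustration), a fat pointer is just a small two-field
> aggregate:
>
>     ; A fat pointer: a data pointer plus a length.
>     %fat = type { i8*, i64 }
>
>     define i64 @length(%fat* %p) {
>       ; The kind of first-class aggregate load this thread is about.
>       %v = load %fat, %fat* %p
>       %l = extractvalue %fat %v, 1
>       ret i64 %l
>     }
>
> On a 64-bit target this is two register-sized fields, so there is
> every reason to expect it to lower well.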
>
> 2/ Slicing the aggregate breaks the semantics of atomic/volatile.
> That is true. It means slicing should not be done for atomic/volatile
> accesses. It doesn't mean it should not be done for regular ones, as
> it is reasonable to handle atomic/volatile differently. After all,
> they have different semantics; a sketch follows.
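>
> A minimal sketch (hypothetical types): the non-volatile load below may
> be sliced into per-field loads, while the volatile one must remain a
> single access. (Atomic loads are stricter still; the LangRef limits
> them to certain scalar types, which is part of why a big scalar is the
> natural form for them.)
>
>     %pair = type { i32, i32 }
>
>     define void @demo(%pair* %p) {
>       ; Non-volatile: may legally be sliced...
>       %v = load %pair, %pair* %p
>       ; ...into the per-field form:
>       %f0p = getelementptr %pair, %pair* %p, i32 0, i32 0
>       %f0 = load i32, i32* %f0p
>       %f1p = getelementptr %pair, %pair* %p, i32 0, i32 1
>       %f1 = load i32, i32* %f1p
>
>       ; Volatile: slicing would change the number and width of the
>       ; memory operations, so it has to stay whole.
>       %w = load volatile %pair, %pair* %p
>       ret void
>     }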
>
> 3/ Not slicing can create scalars that aren't supported by the
> target. This is undesirable.
> Indeed. But as always, the important question is: compared to what?
>
> The hardware has no notion of aggregates, so an aggregate and a large
> scalar both end up requiring legalization. Doing the transformation
> is still beneficial (the scalar form is sketched below):
> - Some aggregates will turn into scalars the target supports. For
> such aggregates, this is a 100% win.
> - For aggregates that won't, the situation is still better, as various
> optimization passes will be able to handle the load in a sensible manner.
> - The transformation never makes the situation worse than it was to
> begin with.
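>
> A sketch of the scalar form (hypothetical types; this is the kind of
> rewrite being discussed, not the exact output of an existing pass):
>
>     %pair = type { i32, i32 }
>
>     define i32 @first_field(%pair* %p) {
>       ; Aggregate form:
>       ;   %v = load %pair, %pair* %p
>       ;   %r = extractvalue %pair %v, 0
>       ; Scalar form: one wide integer load, fields recovered by
>       ; shift/trunc (little-endian layout assumed).
>       %ip = bitcast %pair* %p to i64*
>       %s = load i64, i64* %ip
>       %r = trunc i64 %s to i32
>       ret i32 %r
>     }
>
> Here i64 is legal on 64-bit targets, so this is a clean win; a
> 16-byte aggregate would instead produce an i128 that the backend has
> to legalize, but it would have had to legalize the aggregate anyway.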
>
> In previous discussions, Hal Finkel seemed to think that the scalar
> solution is preferable to the slicing one.
>
> Is that a fair assessment of the situation? Considering all of this,
> I think the right path forward is:
> - Go for the scalar solution in the general case.
> - If that is a problem, the slicing approach can be used for
> non-atomic/volatile accesses.
> - If necessary, disable the transformation for very large aggregates
> (and consider doing so for SROA as well).
>
> Do we have a plan?
>
>
> 2015-08-18 18:36 GMT-07:00 Nicholas Chapman via llvm-dev
> <llvm-dev at lists.llvm.org>:
>
> Oh,
> and another potential reason for handling aggregate loads and
> stores directly is that it expresses the semantics of the program
> more clearly, which I think should allow LLVM to optimise more
> aggressively.
> Here's a bug report showing a missed optimisation, which I think
> is due to the use of memcpy, which in turn is required to work
> around slow structure loads and stores:
> https://llvm.org/bugs/show_bug.cgi?id=23226
>
> Cheers,
> Nick
> On 17/08/2015 22:02, mats petersson via llvm-dev wrote:
>> I've definitely "run into this problem", and I would very much
>> love to remove my kludges [which are incomplete, because I keep
>> finding places where I need to modify the code-gen to "fix" the
>> same problem - this is probably par for the course for a
>> complete amateur compiler writer and someone who has only spent
>> the last 14 months working (as a hobby) with LLVM].
>>
>> So whilst I can't contribute much on the "what is the right
>> solution" and "how do we solve this", I would very much like to
>> see something that allows the user of LLVM to use load/store
>> without things like "is the thing I'm storing big? If so, don't
>> generate a load/store, use a memcpy instead". Not only does this
>> make the usage of LLVM harder, it also causes slow compilation
>> [perhaps this is a separate problem, but I have a simple program
>> that copies a large struct a few times, and if I turn off my "use
>> memcpy for large things" kludge, the compile time gets quite a lot
>> longer - approx 1000x, and 48 seconds is a long time to compile
>> 37 lines of relatively straightforward code - even the Pascal
>> compiler on the PDP-11/70 that I used at my school in the 1980s
>> was capable of doing more than 1 line per second, and it didn't run
>> anywhere near 2.5GHz and had 20-30 users on it any time I could use it...]
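>>
>> For reference, the kludge amounts to choosing between two lowerings
>> of the same struct copy. A sketch (the type and sizes are
>> hypothetical, and the comment about the mechanism is my guess, not
>> a measured fact):
>>
>>     %big = type [8000 x i32]   ; 32000 bytes
>>
>>     declare void @llvm.memcpy.p0i8.p0i8.i64(i8*, i8*, i64, i32, i1)
>>
>>     define void @copy(%big* %dst, %big* %src) {
>>       ; Direct aggregate copy; presumably the backend ends up
>>       ; expanding this element by element, which would explain
>>       ; the timings below:
>>       ;   %v = load %big, %big* %src
>>       ;   store %big %v, %big* %dst
>>
>>       ; The memcpy workaround: a single intrinsic call instead.
>>       %s = bitcast %big* %src to i8*
>>       %d = bitcast %big* %dst to i8*
>>       call void @llvm.memcpy.p0i8.p0i8.i64(i8* %d, i8* %s,
>>                                            i64 32000, i32 4, i1 false)
>>       ret void
>>     }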
>>
>> ../lacsap -no-memcpy -tt longcompile.pas
>> Time for Parse 0.657 ms
>> Time for Analyse 0.018 ms
>> Time for Compile 1.248 ms
>> Time for CreateObject 48803.263 ms
>> Time for CreateBinary 48847.631 ms
>> Time for Compile 48854.064 ms
>>
>> compared with:
>> ../lacsap -tt longcompile.pas
>> Time for Parse 0.455 ms
>> Time for Analyse 0.013 ms
>> Time for Compile 1.138 ms
>> Time for CreateObject 44.627 ms
>> Time for CreateBinary 82.758 ms
>> Time for Compile 95.797 ms
>>
>> wc longcompile.pas
>> 37 84 410 longcompile.pas
>>
>> Source here:
>> https://github.com/Leporacanthicus/lacsap/blob/master/test/longcompile.pas
>>
>>
>> --
>> Mats
>>
>> On 17 August 2015 at 21:18, deadal nix via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>>
>> OK, what about this plan:
>>
>> Slice the aggregate into a series of valid loads/stores for
>> non-atomic ones.
>> Use a big scalar for atomic/volatile ones.
>> Try to generate memcpy or memmove when possible?
>>
>>
>> 2015-08-17 12:16 GMT-07:00 deadal nix <deadalnix at gmail.com>:
>>
>>
>>
>> 2015-08-17 11:26 GMT-07:00 Mehdi Amini <mehdi.amini at apple.com>:
>>
>> Hi,
>>
>>> On Aug 17, 2015, at 12:13 AM, deadal nix via
>>> llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>>
>>>
>>>
>>> 2015-08-16 23:21 GMT-07:00 David Majnemer
>>> <david.majnemer at gmail.com>:
>>>
>>>
>>>
>>> Because a solution which doesn't generalize is
>>> not a very powerful solution. What happens when
>>> somebody says that they want to use atomics +
>>> large aggregate loads and stores? Give them yet
>>> another, different answer? That would mean our
>>> earlier, less general approach was either a
>>> band-aid (bad) or that the new answer requires a
>>> parallel code path in their frontend (worse).
>>>
>>
>>
>> +1 with David's approach: making things incrementally
>> better is fine *as long as* the long-term direction
>> is identified. Small incremental changes that make
>> things slightly better in the short term but drive
>> us away from the long-term direction are not good.
>>
>> Don't get me wrong, I'm not saying that the current
>> patch is not good, just that it does not seem clear
>> to me that the long-term direction has been
>> identified, which explains why some people can be
>> nervous about adding stuff prematurely.
>> And I'm not for the status quo either; while I can't
>> judge it definitively myself, I even bugged David last
>> month to look at this revision and try to identify
>> what the long-term direction really is and how to
>> make your (and other) frontends' lives easier.
>>
>>
>>
>> As long as there is something being done. Concern has
>> been raised about very large aggregates (64KB, 1MB), but there
>> is no way good codegen can come out of these anyway. I
>> don't know of any machine that has 1MB of registers
>> available to hold the load. Even if we had a good way to
>> handle it in InstCombine, the backend would have no
>> capability to generate something nice for it anyway. Most
>> aggregates are small, and there is no good excuse not to
>> handle them just because someone could generate
>> gigantic ones that won't map nicely to the hardware anyway.
>>
>> By that logic, SROA should not exist, as one could
>> generate gigantic aggregates as well (in fact, SROA fails
>> pretty badly on large aggregates).
>>
>> The second concern raised is about atomic/volatile, which
>> needs to be handled differently by the optimizer anyway,
>> so it is mostly irrelevant here.
>>
>>>
>>>
>>> clang has many developers behind it, some of them
>>> paid to work on it. That is simply not the case for
>>> many others.
>>>
>>> But to answer your questions:
>>> - Per-field loads/stores generate more loads/stores
>>> than necessary in many cases. These can't be
>>> aggregated back because of padding (see the sketch
>>> just below).
>>> - memcpy only works memory to memory. It is
>>> certainly usable in some cases, but it certainly
>>> does not cover all uses.
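>>>
>>> A sketch of the padding problem (hypothetical struct; typical
>>> x86-64 layout assumed):
>>>
>>>     %padded = type { i8, i32 }   ; 3 bytes of padding after the i8
>>>
>>>     define void @set(%padded* %p, i8 %a, i32 %b) {
>>>       ; Per-field lowering: two stores. The optimizer cannot
>>>       ; simply merge them back into one 8-byte store, because
>>>       ; that would also write the padding bytes, and proving
>>>       ; that safe is hard once the fields are separate accesses.
>>>       %f0 = getelementptr %padded, %padded* %p, i32 0, i32 0
>>>       store i8 %a, i8* %f0
>>>       %f1 = getelementptr %padded, %padded* %p, i32 0, i32 1
>>>       store i32 %b, i32* %f1
>>>       ret void
>>>     }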
>>>
>>> I'm willing to do the memcpy optimization in
>>> InstCombine (in fact, had things not degenerated into
>>> so much bikeshedding, it would already be done).
>>
>> Calling out as “bikeshedding” what other devs think is
>> what keeps the quality of the project high is
>> unlikely to help your patch go through; it's probably
>> quite the opposite, actually.
>>
>>
>>
>> I understand the desire to keep quality high. That is
>> not where the problem is. The problem lies in
>> discussing actual proposals against hypothetical
>> perfect ones that do not exist.
>>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev