[llvm-dev] Aggregate load/stores
Nicholas Chapman via llvm-dev
llvm-dev at lists.llvm.org
Tue Aug 18 18:36:03 PDT 2015
Oh,
and another potential reason for handling aggregate loads and stores
directly is that it expresses the semantics of the program more clearly,
which I think should allow LLVM to optimise more aggressively.
Here's a bug report showing a missed optimisation, which I think is due
to the use of memcpy, which in turn is required to work around slow
structure loads and stores:
https://llvm.org/bugs/show_bug.cgi?id=23226
Cheers,
Nick
On 17/08/2015 22:02, mats petersson via llvm-dev wrote:
> I've definitely "run into this problem", and I would very much love to
> remove my kludges [that are incomplete, because I keep finding places
> where I need to modify the code-gen to "fix" the same problem - this
> is probably par for the course from a complete amateur compiler writer
> and someone that has only spent the last 14 months working (as a
> hobby) with LLVM].
>
> So whilst I can't contribute much on the "what is the right solution"
> and "how do we solve this", I would very much like to see something
> that allows the user of LLVM to use load/store without things like "is
> my thing that I'm storing big, if so don't generate a load, use a
> memcpy instead". Not only does this make the usage of LLVM harder, it
> also causes slow compilation [perhaps this is a separate problem, but I
> have a simple program that copies a large struct a few times, and if I
> turn off my "use memcpy for large things", the compile time gets quite
> a lot longer - approx 1000x, and 48 seconds is a long time to compile
> 37 lines of relatively straightforward code - even the Pascal
> compiler on PDP-11/70 that I used at my school in the 1980s was capable
> of doing more than 1 line per second, and it didn't run anywhere near
> 2.5GHz and had 20-30 users anytime I could use it...]
>
> ../lacsap -no-memcpy -tt longcompile.pas
> Time for Parse 0.657 ms
> Time for Analyse 0.018 ms
> Time for Compile 1.248 ms
> Time for CreateObject 48803.263 ms
> Time for CreateBinary 48847.631 ms
> Time for Compile 48854.064 ms
>
> compared with:
> ../lacsap -tt longcompile.pas
> Time for Parse 0.455 ms
> Time for Analyse 0.013 ms
> Time for Compile 1.138 ms
> Time for CreateObject 44.627 ms
> Time for CreateBinary 82.758 ms
> Time for Compile 95.797 ms
>
> wc longcompile.pas
> 37 84 410 longcompile.pas
>
> Source here:
> https://github.com/Leporacanthicus/lacsap/blob/master/test/longcompile.pas
>
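For readers following along, the "use memcpy for large things" workaround described above might look roughly like the sketch below. It is only an illustration, not lacsap's actual code: the names CopyStrategy and chooseCopyStrategy and the threshold parameter are hypothetical, and a sensible cutoff would depend on the target.

```cpp
#include <cstddef>

// How a frontend might choose to copy a value of a given size.
// Both names are hypothetical; they do not come from LLVM or lacsap.
enum class CopyStrategy { LoadStore, Memcpy };

// Anything strictly larger than `threshold` bytes is copied with memcpy,
// everything else with a plain load/store pair.
constexpr CopyStrategy chooseCopyStrategy(std::size_t sizeInBytes,
                                          std::size_t threshold) {
    return sizeInBytes > threshold ? CopyStrategy::Memcpy
                                   : CopyStrategy::LoadStore;
}
```

A frontend would call this with the store size of the type being copied (for example, the value reported by llvm::DataLayout::getTypeStoreSize) and emit either a load/store pair or a memcpy accordingly. This per-frontend check is exactly the kludge the message would like to see become unnecessary.
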
>
> --
> Mats
>
> On 17 August 2015 at 21:18, deadal nix via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
> OK, what about this plan:
>
> Slice the aggregate into a series of valid loads/stores for
> non-atomic ones.
> Use a big scalar for atomic/volatile ones.
> Try to generate memcpy or memmove when possible?
>
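The first step of the plan above could be sketched as follows, assuming the aggregate is treated as a flat run of bytes and covered greedily with power-of-two accesses. Slice, sliceAggregate and maxWidth are hypothetical names, and a real lowering would also have to respect alignment, padding and field boundaries, which this sketch ignores.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One scalar access: byte offset into the aggregate and width in bytes.
struct Slice {
    std::size_t offset;
    std::size_t width;
};

// Cover the byte range [0, size) with power-of-two accesses, widest first.
// maxWidth stands in for the widest legal scalar on the target (assumption).
std::vector<Slice> sliceAggregate(std::size_t size, std::size_t maxWidth) {
    assert(maxWidth != 0 && (maxWidth & (maxWidth - 1)) == 0 &&
           "maxWidth must be a power of two");
    std::vector<Slice> slices;
    std::size_t offset = 0;
    for (std::size_t width = maxWidth; width != 0; width /= 2) {
        // Emit as many accesses of this width as still fit.
        while (size - offset >= width) {
            slices.push_back({offset, width});
            offset += width;
        }
    }
    return slices;
}
```

For a 13-byte aggregate and an 8-byte maximum width this yields accesses of 8, 4 and 1 bytes at offsets 0, 8 and 12.
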
>
> 2015-08-17 12:16 GMT-07:00 deadal nix <deadalnix at gmail.com>:
>
>
>
> 2015-08-17 11:26 GMT-07:00 Mehdi Amini <mehdi.amini at apple.com>:
>
> Hi,
>
>> On Aug 17, 2015, at 12:13 AM, deadal nix via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>>
>>
>>
>> 2015-08-16 23:21 GMT-07:00 David Majnemer
>> <david.majnemer at gmail.com>:
>>
>>
>>
>> Because a solution which doesn't generalize is not a
>> very powerful solution. What happens when somebody
>> says that they want to use atomics + large aggregate
>> loads and stores? Give them yet another, different
>> answer? That would mean our earlier, less general
>> approach was either a band-aid (bad) or the
>> new answer requires a parallel code path in their
>> frontend (worse).
>>
>
>
> +1 with David’s approach: making things incrementally
> better is fine *as long as* the long term direction is
> identified. Small incremental changes that make things
> slightly better in the short term but drive us away from
> the long term direction are not good.
>
> Don’t get me wrong, I’m not saying that the current patch
> is not good, just that it does not seem clear to me that
> the long term direction has been identified, which explains
> why some can be nervous about adding stuff prematurely.
> And I’m not for the status quo: while I can’t judge it
> definitively myself, I even bugged David last month to
> look at this revision and try to identify what is really
> the long term direction and how to make your (and other)
> frontends’ lives easier.
>
>
>
> As long as there is something to be done. Concern has been
> raised for very large aggregates (64K, 1MB) but there is no way
> good codegen can come out of these anyway. I don't know of
> any machine that has 1MB of registers available to tank the
> load. Even if we had a good way to handle it in InstCombine,
> the backend would have no capability to generate something
> nice for it anyway. Most aggregates are small and there is no
> good excuse to not do anything to handle them because someone
> could generate gigantic ones that won't map nicely to the
> hardware anyway.
>
> By that logic, SROA should not exist, as one could generate
> gigantic aggregates as well (in fact, SROA fails pretty badly on
> large aggregates).
>
> The second concern raised is for atomic/volatile, which needs
> to be handled by the optimizer differently anyway, so is
> mostly irrelevant here.
>
>>
>>
>> clang has many developers behind it, some of them paid to
>> work on it. That's simply not the case for many others.
>>
>> But to answer your questions :
>> - Per-field loads/stores generate more loads/stores than
>> necessary in many cases. These can't be aggregated back
>> because of padding.
>> - memcpy only works memory to memory. It is certainly
>> usable in some cases, but certainly does not cover all uses.
>>
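The padding point can be made concrete with a small layout check. This is a sketch under the assumption of a typical ABI where a 32-bit integer is 4-byte aligned; Padded, fieldBytes and paddingBytes are illustrative names only.

```cpp
#include <cstddef>
#include <cstdint>

// A two-field aggregate; on common ABIs the compiler inserts padding
// after `tag` so that `value` is suitably aligned.
struct Padded {
    std::uint8_t tag;
    std::uint32_t value;
};

// Bytes occupied by the fields themselves.
constexpr std::size_t fieldBytes() {
    return sizeof(std::uint8_t) + sizeof(std::uint32_t);
}

// Bytes of the object that belong to no field.
constexpr std::size_t paddingBytes() {
    return sizeof(Padded) - fieldBytes();
}
```

On such an ABI sizeof(Padded) is 8 while the fields cover only 5 bytes. Two per-field stores leave the 3 padding bytes untouched, so merging them into a single 8-byte store would write bytes the original code never wrote, which is presumably why the per-field accesses cannot simply be merged back.
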
>> I'm willing to do the memcpy optimization in InstCombine
>> (in fact, if things had not degenerated into so much
>> bikeshedding, that would already be done).
>
> Calling “bikeshedding” what other devs think keeps the
> quality of the project high is unlikely to help
> your patch go through; it’s probably quite the opposite
> actually.
>
>
>
> I understand the desire to keep quality high. That is not
> where the problem is. The problem lies in weighing actual
> proposals against hypothetical perfect ones that do not exist.
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
> http://llvm.cs.uiuc.edu
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev