<div dir="ltr"><div><div><div><div><div><div><div><div><div>It is pretty clear people need this. Let's get this moving.<br><br></div>I'll try to sum up the points that have been made and address them carefully.<br></div><br>1/ There is no good solution for large aggregates.<br></div>That is true. However, I don't think this is a reason not to address smaller aggregates, as they appear to be needed. Realistically, the proportion of aggregates that are very large is small, and there is no expectation that such a thing would map nicely to the hardware anyway (the hardware won't have enough registers to load it all). I do think it is reasonable to expect decent handling of relatively small aggregates like fat pointers while accepting that large ones will be inefficient.<br><br></div><div>This limitation is not unique to the current discussion, as SROA suffers from the same limitation.<br></div><div>It is possible to disable the transformation for aggregates that are too large if this is too big of a concern. It should maybe also be done for SROA.<br></div><div><br></div>2/ Slicing the aggregate breaks the semantics of atomic/volatile.<br></div>That is true. It means slicing the aggregate should not be done for atomic/volatile. It doesn't mean this should not be done for regular ones, as it is reasonable to handle atomic/volatile differently. After all, they have different semantics.<br><br></div>3/ Not slicing can create scalars that aren't supported by the target. This is undesirable.<br></div>Indeed. But as always, the important question is: compared to what?<br><br></div>The hardware has no notion of aggregate, so an aggregate and a large scalar both end up requiring legalization. Doing the transformation is still beneficial:<br></div> - Some aggregates will generate valid scalars. 
For such aggregates, this is a 100% win.<br><div><div><div><div><div><div><div><div> - For aggregates that won't, the situation is still better, as various optimization passes will be able to handle the load in a sensible manner.<br></div><div> - The transformation never makes the situation worse than it is to begin with.<br><br></div><div>In a previous discussion, Hal Finkel seemed to think that the scalar solution is preferable to the slicing one.<br><br></div><div>Is that a fair assessment of the situation? Considering all of this, I think the right path forward is:<br></div><div> - Go for the scalar solution in the general case.<br></div><div> - If that is a problem, the slicing approach can be used for non-atomic, non-volatile accesses.<br></div><div> - If necessary, disable the transformation for very large aggregates (and consider doing so for SROA as well).<br><br></div><div>Do we have a plan?<br></div><div><br></div></div></div></div></div></div></div></div></div><div class="gmail_extra"><br><div class="gmail_quote">2015-08-18 18:36 GMT-07:00 Nicholas Chapman via llvm-dev <span dir="ltr"><<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

  <div bgcolor="#FFFFFF" text="#000000">

    Oh,<br>

    and another potential reason for handling aggregate loads and stores

    directly is that it expresses the semantics of the program more

    clearly, which I think should allow LLVM to optimise more

    aggressively.<br>

    Here's a bug report showing a missed optimisation, which I think is

    due to the use of memcpy, which in turn is required to work around

    slow structure loads and stores:<br>

    <a href="https://llvm.org/bugs/show_bug.cgi?id=23226" target="_blank">https://llvm.org/bugs/show_bug.cgi?id=23226</a><br>

    <br>

    Cheers,<br>

      Nick<span class=""><br>


    <div>On 17/08/2015 22:02, mats petersson via

      llvm-dev wrote:<br>

    </div>

    </span><div><div class="h5"><blockquote type="cite">

      <div dir="ltr">

        <div>

          <div>

            <div>

              <div>

                <div>

                  <div>I've definitely "run into this problem", and I

                    would very much love to remove my kludges [that are

                    incomplete, because I keep finding places where I

                    need to modify the code-gen to "fix" the same

                    problem - this is probably par for the course from a

                    complete amateur compiler writer and someone that

                    has only spent the last 14 months working (as a

                    hobby) with LLVM]. <br>

                    <br>

                  </div>

                  So whilst I can't contribute much on the "what is the

                  right solution" and "how do we solve this", I would

                  very much like to see something that allows the user

                  of LLVM to use load/store without things like "is my

                  thing that I'm storing big, if so don't generate a

                  load, use a memcpy instead". Not only does this make

                  the usage of LLVM harder, it also causes slow

                  compilation [perhaps this is a separate problem, but I

                  have a simple program that copies a large struct a few

                  times, and if I turn off my "use memcpy for large

                  things", the compile time gets quite a lot longer -

                  approx 1000x, and 48 seconds is a long time to compile

                  37 lines of relatively straightforward code - even

                  the Pascal compiler on PDP-11/70 that I used at my

                  school in the 1980s was capable of doing more than 1 line

                  per second, and it didn't run anywhere near 2.5GHz and

                  had 20-30 users anytime I could use it...]<br>

                  <br>

                  ../lacsap -no-memcpy -tt longcompile.pas <br>

                  Time for Parse 0.657 ms<br>

                  Time for Analyse 0.018 ms<br>

                  Time for Compile 1.248 ms<br>

                  Time for CreateObject 48803.263 ms<br>

                  Time for CreateBinary 48847.631 ms<br>

                  Time for Compile 48854.064 ms<br>

                  <br>

                </div>

                compared with:<br>

                ../lacsap -tt longcompile.pas <br>

                Time for Parse 0.455 ms<br>

                Time for Analyse 0.013 ms<br>

                Time for Compile 1.138 ms<br>

                Time for CreateObject 44.627 ms<br>

                Time for CreateBinary 82.758 ms<br>

                Time for Compile 95.797 ms<br>

                <br>

              </div>

              wc longcompile.pas <br>

               37  84 410 longcompile.pas<br>

              <br>

            </div>

            Source here:<br>

            <a href="https://github.com/Leporacanthicus/lacsap/blob/master/test/longcompile.pas" target="_blank">https://github.com/Leporacanthicus/lacsap/blob/master/test/longcompile.pas</a><br>

            <br>

          </div>

          <br>

          --<br>

        </div>

        Mats<br>

      </div>

      <div class="gmail_extra"><br>

        <div class="gmail_quote">On 17 August 2015 at 21:18, deadal nix

          via llvm-dev <span dir="ltr"><<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>></span>

          wrote:<br>

          <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

            <div dir="ltr">

              <div>

                <div>

                  <div>OK, what about this plan:<br>

                    <br>

                  </div>

                  Slice the aggregate into a series of valid loads/stores
                  for non-atomic ones.<br>

                </div>

                Use big scalar for atomic/volatile ones.<br>

              </div>

              Try to generate memcpy or memmove when possible ?<br>

              <div><br>

              </div>
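              <div>Read as a decision procedure, the three lines above might
                look like the following Python sketch. The class
                AggregateAccess, its fields, and the strategy names are
                hypothetical and chosen only for illustration; none of them
                are LLVM APIs. Trying memcpy before slicing is also an
                assumption, since the plan does not state an order.<br>
              </div>
<pre>
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AggregateAccess:
    """Hypothetical description of one aggregate load or store."""
    is_atomic: bool = False
    is_volatile: bool = False
    # True when the loaded value is only stored straight back to memory,
    # the one case where a memcpy/memmove can replace the load/store pair.
    is_memory_to_memory: bool = False


def choose_lowering(access):
    """Pick a strategy name following the three-step plan."""
    if access.is_atomic or access.is_volatile:
        # Slicing would break atomic/volatile semantics, so keep one access.
        return "big-scalar"
    if access.is_memory_to_memory:
        # Assumed priority: prefer memcpy/memmove when it is applicable.
        return "memcpy"
    # Regular access: split into a series of valid loads/stores.
    return "slice"
```
</pre>
              <div>For example, choose_lowering(AggregateAccess(is_atomic=True))
                picks the big scalar, while a plain AggregateAccess() is
                sliced.<br>
              </div>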

            </div>

            <div>

              <div>

                <div class="gmail_extra"><br>

                  <div class="gmail_quote">2015-08-17 12:16 GMT-07:00

                    deadal nix <span dir="ltr"><<a href="mailto:deadalnix@gmail.com" target="_blank">deadalnix@gmail.com</a>></span>:<br>

                    <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

                      <div dir="ltr"><br>

                        <div class="gmail_extra"><br>

                          <div class="gmail_quote"><span>2015-08-17

                              11:26 GMT-07:00 Mehdi Amini <span dir="ltr"><<a href="mailto:mehdi.amini@apple.com" target="_blank">mehdi.amini@apple.com</a>></span>:<br>

                              <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

                                <div style="word-wrap:break-word">Hi,

                                  <div><br>

                                    <div><span>

                                        <blockquote type="cite">

                                          <div>On Aug 17, 2015, at 12:13

                                            AM, deadal nix via llvm-dev

                                            <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>>

                                            wrote:</div>

                                          <br>

                                          <div>

                                            <div dir="ltr"><br>

                                              <div class="gmail_extra"><br>

                                                <div class="gmail_quote">2015-08-16

                                                  23:21 GMT-07:00 David

                                                  Majnemer <span dir="ltr"><<a href="mailto:david.majnemer@gmail.com" target="_blank">david.majnemer@gmail.com</a>></span>:<br>

                                                  <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

                                                    <div dir="ltr"><br>

                                                      <div class="gmail_extra"><br>

                                                        <div class="gmail_quote"><span></span>

                                                          <div>Because a

                                                          solution which

                                                          doesn't

                                                          generalize is

                                                          not a very

                                                          powerful

                                                          solution. 

                                                          What happens

                                                          when somebody

                                                          says that they

                                                          want to use

                                                          atomics +

                                                          large

                                                          aggregate

                                                          loads and

                                                          stores? Give

                                                          them yet

                                                          another,

                                                          different

                                                          answer? That

                                                          would mean our

                                                          earlier, less

                                                          general

                                                          answer,

                                                          approach was

                                                          either a

                                                          bandaid (bad)

                                                          or the new

                                                          answer

                                                          requires a

                                                          parallel code

                                                          path in their

                                                          frontend

                                                          (worse).</div>

                                                        </div>

                                                      </div>

                                                    </div>

                                                  </blockquote>

                                                </div>

                                              </div>

                                            </div>

                                          </div>

                                        </blockquote>

                                        <div><br>

                                        </div>

                                        <div><br>

                                        </div>

                                      </span>

                                      <div>+1 with David’s approach:
                                        making things incrementally
                                        better is fine *as long as* the
                                        long-term direction is
                                        identified. Small incremental
                                        changes that make things
                                        slightly better in the short
                                        term but drive us away from the
                                        long-term direction are not good.</div>

                                      <div><br>

                                      </div>

                                      <div>Don’t get me wrong, I’m not

                                        saying that the current patch is

                                        not good, just that it does not

                                        seem clear to me that the long

                                        term direction has been

                                        identified, which explains why

                                        some can be nervous about adding

                                        stuff prematurely. </div>

                                      <div>And I’m not for the status

                                        quo, while I can’t judge it

                                        definitively myself, I even

                                        bugged David last month to look

                                        at this revision and try to

                                        identify what is really the long

                                        term direction and how to make

                                        your (and others’) frontends’ lives

                                        easier. </div>

                                      <span>

                                        <div><br>

                                        </div>

                                        <div><br>

                                        </div>

                                      </span></div>

                                  </div>

                                </div>

                              </blockquote>

                              <div><br>

                              </div>

                            </span>

                            <div>As long as there is something to be
                              done. Concern has been raised for very
                              large aggregates (64 KB, 1 MB), but there is no
                              way good codegen can come out of these
                              anyway. I don't know of any machine that
                              has 1 MB of registers available to hold the
                              load. Even if we had a good way to handle
                              it in InstCombine, the backend would have
                              no capability to generate something nice
                              for it anyway. Most aggregates are small,
                              and there is no good excuse not to do
                              anything to handle them because someone
                              could generate gigantic ones that won't
                              map nicely to the hardware anyway.<br>

                              <br>

                            </div>

                            <div>By that logic, SROA should not exist,
                              as one could generate gigantic aggregates
                              as well (in fact, SROA fails pretty badly
                              on large aggregates).<br>

                              <br>

                            </div>

                            <div>The second concern raised is for
                              atomic/volatile, which need to be handled
                              by the optimizer differently anyway, so it is
                              mostly irrelevant here.<br>

                            </div>

                          </div>

                          <span>

                            <div class="gmail_quote">

                              <div> </div>

                              <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

                                <div style="word-wrap:break-word">

                                  <div>

                                    <div><span>

                                        <blockquote type="cite">

                                          <div>

                                            <div dir="ltr">

                                              <div class="gmail_extra">

                                                <div class="gmail_quote">

                                                  <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

                                                    <div dir="ltr">

                                                      <div class="gmail_extra">

                                                        <div class="gmail_quote"><span>

                                                          <div> </div>

                                                          </span></div>

                                                      </div>

                                                    </div>

                                                  </blockquote>

                                                  <br>

                                                </div>

                                                <br>

                                              </div>

                                              <div class="gmail_extra">Clang
                                                has many developers
                                                behind it, some of them
                                                paid to work on it. That's
                                                simply not the case
                                                for many others.<br>

                                                <br>

                                              </div>

                                              <div class="gmail_extra">But

                                                to answer your questions:<br>

                                              </div>

                                              <div class="gmail_extra"> -
                                                Per-field loads/stores
                                                generate more
                                                loads/stores than
                                                necessary in many cases.
                                                These can't be
                                                merged back because
                                                of padding.<br>

                                              </div>

                                              <div class="gmail_extra"> -
                                                memcpy only works memory
                                                to memory. It is
                                                usable in some
                                                cases, but certainly does
                                                not cover all uses.<br>

                                              </div>
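                                              <div class="gmail_extra">To make the
                                                padding point concrete, here is a
                                                small Python sketch that lays out
                                                fields and counts the bytes no
                                                per-field access touches. It
                                                assumes power-of-two field sizes
                                                and alignment equal to size, which
                                                simplifies real ABI rules; the
                                                function names are made up for
                                                illustration.<br>
                                              </div>
<pre>
```python
def layout(field_sizes):
    """Place fields in order, aligning each to its own size.

    Assumes every size is a power of two and that alignment equals size,
    which is a simplification of real ABI rules.
    """
    offset = 0
    placed = []
    for size in field_sizes:
        offset = (offset + size - 1) // size * size  # round up to alignment
        placed.append((offset, size))
        offset += size
    align = max(field_sizes)
    total = (offset + align - 1) // align * align  # add tail padding
    return placed, total


def padding_bytes(field_sizes):
    """Bytes of the aggregate that no per-field access ever touches."""
    _placed, total = layout(field_sizes)
    return total - sum(field_sizes)
```
</pre>
                                              <div class="gmail_extra">For a
                                                1-byte field followed by a 4-byte
                                                field, layout([1, 4]) gives an
                                                8-byte aggregate with 3 bytes of
                                                padding: two per-field accesses
                                                that leave a gap between them,
                                                against a single 8-byte access for
                                                the whole aggregate.<br>
                                              </div>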

                                              <div class="gmail_extra"><br>

                                              </div>

                                              <div class="gmail_extra">I'm
                                                willing to do the memcpy
                                                optimization in
                                                InstCombine (in fact,
                                                had things not
                                                degenerated into so much
                                                bikeshedding, that would
                                                already be done).<br>

                                              </div>

                                            </div>

                                          </div>

                                        </blockquote>

                                        <div><br>

                                        </div>

                                      </span></div>

                                  </div>

                                  <div>Calling “bikeshedding” what
                                    other devs think is what keeps the
                                    quality of the project high is
                                    unlikely to help your patch go
                                    through; it’s probably quite the
                                    opposite, actually.</div>

                                  <div><br>

                                  </div>

                                  <div><br>

                                  </div>

                                </div>

                              </blockquote>

                              <br>

                            </div>

                          </span>I understand the desire to keep quality
                          high. That is not where the problem is. The
                          problem lies in weighing actual proposals
                          against hypothetical perfect ones that do not
                          exist.<br>

                        </div>

                        <div class="gmail_extra"><br>

                        </div>

                      </div>

                    </blockquote>

                  </div>

                  <br>

                </div>

              </div>

            </div>

            <br>

            _______________________________________________<br>

            LLVM Developers mailing list<br>

            <a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a> 

                   <a href="http://llvm.cs.uiuc.edu" rel="noreferrer" target="_blank">http://llvm.cs.uiuc.edu</a><br>

            <a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><br>

            <br>

          </blockquote>

        </div>

        <br>

      </div>

      <br>


      <br>


    </blockquote>

    <br>

  </div></div></div>


<br></blockquote></div><br></div>