<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Sun, Aug 16, 2015 at 10:27 PM, deadal nix <span dir="ltr"><<a href="mailto:deadalnix@gmail.com" target="_blank">deadalnix@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote"><span class="">2015-08-16 22:10 GMT-07:00 David Majnemer <span dir="ltr"><<a href="mailto:david.majnemer@gmail.com" target="_blank">david.majnemer@gmail.com</a>></span>:<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span><div><br></div></span><div>I would argue that a fix in the wrong direction is worse than the status quo.</div></div></div></div></blockquote><div><br></div></span><div>How is the proposed change worse than the status quo?<br></div></div></div></div></blockquote><div><br></div><div>Because a solution which doesn't generalize is not a very powerful solution.  What happens when somebody says that they want to use atomics + large aggregate loads and stores? Give them yet another, different answer? 
That would mean either that our earlier, less general approach was a band-aid (bad) or that the new answer requires a parallel code path in their frontend (worse).</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div></div><span class=""><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div></div><div><br></div><div><div>The argument that targets are relying on InstCombine to mitigate IR requiring legalization seems dubious to me. First, both aggregates and large scalars require legalization, so, while not ideal, the proposed change does not make things any worse than they already are. In fact, as far as legalization is concerned, these are pretty much the same. It should also be noted that InstCombine is not guaranteed to run before the target, so it seems like a bad idea to me to rely on it in the backend.<br></div></div></div></blockquote><div><br></div></span><div>InstCombine is not guaranteed to run before IR hits the backend but the result of legalizing the machinations of InstCombine's output during SelectionDAG is worse than generating illegal IR in the first place.</div></div></div></div></blockquote><div><br></div></span><div>That does not follow. InstCombine is not creating new things that require legalization; it changes one thing that requires legalization into another that a larger part of LLVM can understand.<br></div></div></div></div></blockquote><div><br></div><div>I'm afraid I don't understand what you are getting at here.  InstCombine carefully avoids ptrtoint to weird types, truncs to weird types, etc. 
when creating new IR.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div></div><span class=""><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div></div><div><br>As for the big integral thing, I really don't care. I can change it to 

create multiple loads/stores respecting data layout, I have the code for that and could adapt it for this PR without too much trouble. If this is the only thing that is blocking this PR, then we can proceed. But I'd

 like some notion that we are making progress. Would you be willing to accept a solution based on creating a series of loads/stores respecting the data layout?<br></div></div></div></blockquote><div><br></div></span><div>Splitting the memory operation into smaller operations is not semantics-preserving from an IR-theoretic perspective.  For example, splitting a volatile memory operation into several volatile memory operations is not OK.  The same goes for atomics.  Some targets provide atomic memory operations at the granularity of a cache line and splitting at legal integer granularity would be observably different.</div><div><br></div></div></div></div></blockquote><div><br></div></span><div>That is off topic. The proposed patch explicitly gates for this.<br></div></div></div></div></blockquote><div><br></div><div>Then I guess we agree to disagree about what is "on topic".  I think that our advice to frontend authors regarding larger-than-legal loads/stores should be uniform and not dependent on whether or not the operation was volatile.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div></div><span class=""><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><br><div class="gmail_quote"><div>With the above in mind, I don't see it as unreasonable for frontends to generate IR that LLVM is comfortable with.  We seem fine telling frontend authors that they should strive to avoid large aggregate memory operations in our performance tips guide <<a href="http://llvm.org/docs/Frontend/PerformanceTips.html#avoid-loads-and-stores-of-large-aggregate-type" target="_blank">http://llvm.org/docs/Frontend/PerformanceTips.html#avoid-loads-and-stores-of-large-aggregate-type</a>>.  
Implementation experience with Clang hasn't shown this to be particularly odious to follow and none of the LLVM-side solutions seem satisfactory.</div><span></span><br></div></div></div></blockquote><div><br></div></span><div>Most front ends do not have clang resources. Additionally, this tip is not quite accurate. I'm not interested in large aggregate loads/stores at this stage. I'm interested in ANY aggregate load/store. LLVM is just unable to handle any of it in a way that makes sense. It could certainly do better for small aggregates, without too much trouble.<br><br></div></div></div></div>

</blockquote></div><br></div><div class="gmail_extra">I'm confused about what you mean by "clang resources" here; you haven't made it clear what the burden is for your frontend. I'm not saying that there isn't such a burden, I just haven't seen it articulated and I have heard nothing similar from other folks using LLVM.  What prevents you from performing field-at-a-time loads and stores or calls to the memcpy intrinsic?</div></div>