<div dir="ltr"><div class="gmail_default" style="font-family:verdana,sans-serif;font-size:small">In dynamically typed languages like PHP, control flow is highly polymorphic, so hot and cold regions often appear within the same tracelet. That works against the icache, which fills each cache line with contiguous instructions: cold code interleaved with hot code takes up space in lines fetched for the hot path. It would be great if MCJIT let us tag a block and have it placed in a different code section based on its hotness. <br><br></div><div class="gmail_default" style="font-family:verdana,sans-serif;font-size:small">As Lang mentioned, all of this can be implemented in RTDyldMemoryManager, but the same functionality would also be useful to other LLVM JIT implementations such as Pyston, which suffer from the same issues that PHP does. <br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Nov 20, 2014 at 2:44 PM, Lang Hames <span dir="ltr"><<a href="mailto:lhames@gmail.com" target="_blank">lhames@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><br></div><div class="gmail_extra"><span class=""><div>> Most of the work to make this work at the AOT object file level should also be necessary to</div><div>> make this work with MCJIT, which can then be taught to allocate .text.cold somewhere else.<br></div><div><br></div></span><div>MCJIT memory layout is delegated to the RTDyldMemoryManager which is supplied by the client, so if/when the AOT work is done this should just work in MCJIT without any further changes.</div><span class="HOEnZb"><font color="#888888"><div><br></div><div>- Lang.</div></font></span><div><div class="h5"><div><br></div><div class="gmail_quote">On Wed, Nov 19, 2014 at 10:49 AM, Reid Kleckner <span dir="ltr"><<a href="mailto:rnk@google.com" target="_blank">rnk@google.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr"><div>I'm pretty sure LLVM does not currently have support for this kind of hot-cold BB separation, but I think it would be a welcome addition. I think the best way to add such support would be to move cold code to a separate object file section. For x86 this might look like:</div><div><br></div><div>.text</div><div>foo:</div><div> xor eax, eax</div><div> ...</div><div> jz Lcold_bb</div><div>Lreturn:</div><div> ret</div><div><br></div><div>.text.cold</div><div>Lcold_bb:</div><div> .. EH</div><div> jmp Lreturn</div><div><br></div><div>This might require changing the way we emit relocations to local labels, but it would be generally useful to others. GCC already does something like this. Most of the work to make this work at the AOT object file level should also be necessary to make this work with MCJIT, which can then be taught to allocate .text.cold somewhere else.<br></div><div class="gmail_extra"><br></div><div class="gmail_extra">---</div><div class="gmail_extra"><br></div><div class="gmail_extra">We've also discussed outlining regions of similar cold code at the LLVM IR level. This will definitely work today without many backend changes. You should be able to apply the preserve_mostcc calling convention and add a .text.cold section annotation and that should be enough to start making MCJIT memory allocation changes.</div><div><div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Nov 18, 2014 at 8:21 PM, Brett Simmers <span dir="ltr"><<a href="mailto:bsimmers@fb.com" target="_blank">bsimmers@fb.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">I'm part of a team working on adding an llvm codegen backend to HHVM (PHP JIT, <a href="http://hhvm.com" target="_blank">http://hhvm.com</a>) using MCJIT. 
We have a code layout problem and I'm looking for opinions on good ways to solve it.<br>
<br>
The short version is that the memory we emit code into is split into a few different areas, and we'd like a way to control which area each BasicBlock ends up in during codegen. I know this probably sounds pretty odd, so here's a much more detailed explanation:<br>
<br>
Our translation unit is a "tracelet" which is typically a single basic block of PHP code. We emit code one tracelet at a time as it's needed for execution, so the physical layout of the generated code looks like the execution path taken the first time the code is run (we're actually a little smarter than that now, but this description is good enough for the problem at hand).<br>
<br>
We're under pretty severe icache/iTLB pressure, so we do whatever we can to keep the hot path as compact as possible. One of the ways we do this is by dividing our code cache into three fixed-size areas: main, cold, and frozen. Our current, non-llvm codegen backend has one area tag per basic block, and most tracelets we compile will span all three areas. This means that if we emit code for tracelets A and then B, A's main code will be followed immediately by B's main code (and the same for cold/frozen). It ends up looking something like this:<br>
<br>
good layout: | A_main B_main &lt;unused&gt; | A_cold B_cold &lt;unused&gt; | A_frozen... |<br>
<br>
Main is the primary execution path. Cold contains code that we expect to run rarely. Conditional branches from main to cold are generally taken &lt;5% of the time. Frozen contains code that we expect to almost never run, but must have for correctness (mostly for handling exceptions). Conditional branches from main to frozen are generally taken &lt;0.01% of the time. The fact that we have three areas isn't really relevant to the problem itself; I think any solution should allow an arbitrary number of areas.<br>
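<br>
A minimal sketch of the area scheme described above, assuming each area is a simple bump allocator over a fixed-size range of offsets. The names CodeCache, emit, and kNoSpace are hypothetical and exist only for illustration; they are not HHVM or LLVM APIs:<br>
<br>
```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sentinel returned when an area has no room left (hypothetical name).
constexpr std::size_t kNoSpace = static_cast<std::size_t>(-1);

// Hypothetical model of a code cache split into fixed-size areas.
// Area 0 could be "main", 1 "cold", 2 "frozen"; any number of areas works.
class CodeCache {
public:
  CodeCache(std::size_t areaCount, std::size_t areaSize) {
    std::size_t base = 0;
    for (std::size_t i = 0; i < areaCount; ++i) {
      areas_.push_back(Area{base, base + areaSize});
      base += areaSize;
    }
  }

  // Reserve `size` bytes in area `idx` and return the offset of the block
  // within the whole cache, or kNoSpace if the area is full.
  std::size_t emit(std::size_t idx, std::size_t size) {
    assert(idx < areas_.size());
    Area& a = areas_[idx];
    if (size > a.end - a.next) return kNoSpace;
    std::size_t at = a.next;
    a.next += size;  // the next block in this area lands right after this one
    return at;
  }

private:
  struct Area {
    std::size_t next;  // first free offset in this area
    std::size_t end;   // one past the last offset of this area
  };
  std::vector<Area> areas_;
};
```
<br>
With three areas of 1024 bytes, emitting 100 bytes of A's main code, then 40 bytes of A's cold code, then B's main code places B_main at offset 100, directly after A_main, while A_cold starts at offset 1024.<br>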
<br>
We've done some experiments where we collapse all three areas into one, so A's main code is followed by A's cold code, which is followed by A's frozen code, and so on for B and other tracelets, like this:<br>
<br>
bad layout: | A_main A_cold A_frozen B_main B_cold B_frozen... |<br>
<br>
This causes an unacceptable performance regression, so for our experimental llvm backend we need to come up with a solution.<br>
<br>
We're using a custom calling convention to model each tracelet as a single llvm function. I know there are intrinsics in llvm that allow us to give blocks relative probabilities, and if I'm understanding things correctly that should allow us to achieve the bad layout above, since the code for a given llvm function will be emitted in one contiguous chunk, regardless of how the blocks are organized within it. I'd like to figure out a way to achieve the good layout.<br>
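<br>
To illustrate the kind of probability hint mentioned above: in C++ compiled with Clang, the __builtin_expect builtin is lowered to the llvm.expect intrinsic. The function below is a hypothetical stand-in for a tracelet body, not HHVM code. The hint can only influence block order inside the function, so the rare path still lives in the same contiguous chunk as the hot path:<br>
<br>
```cpp
#include <cassert>

// Hypothetical tracelet-like function: the guard failure is the rare path.
int addChecked(int a, int b, bool guardHolds) {
  // GCC/Clang builtin: tells the compiler this condition is expected to be 0.
  if (__builtin_expect(!guardHolds, 0)) {
    // Rare path. It may be moved to the end of this function's code,
    // but it is still emitted as part of the same function.
    return -1;
  }
  return a + b;  // hot path
}
```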
<br>
The best idea I have so far is to annotate each BasicBlock with an area tag like we do in our existing codegen backend, and then teach the llvm codegen backend how to place each of those blocks in the appropriate part of our code cache. Right now we have a custom subclass of RTDyldMemoryManager that emits all code into the main area; I assume any solution would change the API of this class so it can get at the different areas.<br>
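<br>
A rough sketch of how a client memory manager could route code sections to areas by name, assuming cold blocks end up in a section named .text.cold. The class AreaMemoryManager, the helper names, and the .text.frozen section name are hypothetical. The allocateCodeSection method mirrors the shape of RTDyldMemoryManager::allocateCodeSection but takes a std::string instead of StringRef so that it compiles without LLVM headers:<br>
<br>
```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a client-supplied memory manager with three areas.
class AreaMemoryManager {
  // One fixed-size area with a bump pointer.
  struct Area {
    explicit Area(std::size_t n) : bytes(n), used(0) {}

    std::uint8_t* allocate(std::uintptr_t size, unsigned alignment) {
      assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
      std::uintptr_t base = reinterpret_cast<std::uintptr_t>(bytes.data());
      // Round the next free address up to the requested alignment.
      std::uintptr_t start = (base + used + alignment - 1) &
                             ~static_cast<std::uintptr_t>(alignment - 1);
      if (start + size > base + bytes.size()) return nullptr;  // area is full
      used = static_cast<std::size_t>(start + size - base);
      return bytes.data() + (start - base);
    }

    std::vector<std::uint8_t> bytes;
    std::size_t used;
  };

public:
  explicit AreaMemoryManager(std::size_t areaSize)
      : areas_(3, Area(areaSize)) {}

  // Map a section name to an area index: 0 = main, 1 = cold, 2 = frozen.
  static std::size_t areaIndexFor(const std::string& sectionName) {
    if (sectionName == ".text.cold") return 1;
    if (sectionName == ".text.frozen") return 2;  // hypothetical section name
    return 0;  // everything else goes to main
  }

  // Same parameter order as the real callback; the section ID is unused here.
  std::uint8_t* allocateCodeSection(std::uintptr_t size, unsigned alignment,
                                    unsigned /*sectionID*/,
                                    const std::string& sectionName) {
    return areas_[areaIndexFor(sectionName)].allocate(size, alignment);
  }

  std::size_t bytesUsed(std::size_t areaIndex) const {
    return areas_[areaIndex].used;
  }

private:
  std::vector<Area> areas_;
};
```
<br>
Two successive .text allocations come back adjacent in the main area even when a .text.cold allocation happens between them, which is the property the good layout needs. A real implementation would also have to return executable memory, which this sketch does not attempt.<br>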
<br>
If we have to split the main/cold/frozen parts of each tracelet into different llvm functions we might be able to make that work but I'm skeptical. My main concern is that cold/frozen code nearly always uses a bunch of values from the main code path that branched to it, and if the overhead of getting from main to cold/frozen is more than a single jmp/jcc instruction (on x86) that's going to be a dealbreaker.<br>
<br>
Does this sound doable, and is it something you'd be ok having in llvm? If we can come up with a good design my team is happy to do the actual work, though if anyone else is interested in doing it we certainly won't complain :).<br>
<br>
Thanks!<br>
Brett<br>
_______________________________________________<br>
LLVM Developers mailing list<br>
<a href="mailto:LLVMdev@cs.uiuc.edu" target="_blank">LLVMdev@cs.uiuc.edu</a> <a href="http://llvm.cs.uiuc.edu" target="_blank">http://llvm.cs.uiuc.edu</a><br>
<a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a><br>
</blockquote></div><br></div></div></div></div>
<br></blockquote></div><br></div></div></div></div>
<br></blockquote></div><br></div>