[LLVMdev] Should LLVM JIT default to lazy or non-lazy?

Jeffrey Yasskin jyasskin at google.com
Sat Oct 31 23:40:50 PDT 2009


2009/10/30 Török Edwin <edwintorok at gmail.com>:
> On 2009-10-29 23:55, Jeffrey Yasskin wrote:
>> On Thu, Oct 29, 2009 at 2:30 PM, Nicolas Geoffray
>> <nicolas.geoffray at lip6.fr> wrote:
>>
>>> Hi Jeffrey,
>>>
>>> Jeffrey Yasskin wrote:
>>>
>>>> Cool, I'll start implementing it.
>>>>
>>>>
>>> Great! Thanks.
>>>
>>> Just to clarify things: on my end, it doesn't really matter what the
>>> default behavior is, as long as vmkit can continue to have the existing
>>> behavior of lazy compilation. With Chris' solution, I was wondering how you
>>> would implement the getPointerToFunction{Eager, Lazy} functions when
>>> getPointerToFunction is called by the JIT, not the user. For example, when
>>> Function F calls Function G and the JIT needs an address for G (either a
>>> callback or the function address), how will it know whether it must call
>>> getPointerToFunctionEager or getPointerToFunctionLazy? Do you plan to
>>> continue having a flag that enables/disables lazy compilation and poll
>>> this flag on each function call? How is that different from the existing
>>> system?
>>>
>>
>> Semantically, I'll thread the flag through all the calls that may
>> eventually need to recursively call getPointerToFunction. To implement
>> that without having to modify lots of calls, I'll probably replace the
>> current public default eager/lazy setting with a private flag with
>> values {Unknown, Lazy, Eager}, set it on entry and exit of
>> getPointerToFunction, and check it on each internal recursive call.
>> The difference from the current system is that the user is forced to
>> set the flag to their desired value whenever they call into the JIT,
>> rather than relying on a default. That choice then propagates through
>> the whole recursive tree of codegens without affecting the next tree.
>>
>> Note that I'm using getPointerToFunction as an abbreviation for the
>> 3ish public functions that'll need to take this option.
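
The entry/exit flag idea above could be sketched roughly like this (a
minimal illustration, not LLVM's actual API; `JITSketch`, `CodeGenMode`,
and the integer "functions" are all made up for the example):

```cpp
#include <cassert>
#include <vector>

// A private tri-state flag records the caller's eager/lazy choice on
// entry to the public call and is reset on exit, so internal recursive
// codegen can consult it without threading a parameter through every call.
enum class CodeGenMode { Unknown, Lazy, Eager };

class JITSketch {
  CodeGenMode Mode = CodeGenMode::Unknown;

public:
  std::vector<int> Compiled; // records which "functions" were codegen'd

  void *getPointerToFunction(int F, CodeGenMode M) {
    assert(M != CodeGenMode::Unknown && "caller must pick eager or lazy");
    Mode = M;                          // set on entry
    void *Addr = codeGen(F, /*Depth=*/0);
    Mode = CodeGenMode::Unknown;       // reset on exit
    return Addr;
  }

private:
  void *codeGen(int F, int Depth) {
    Compiled.push_back(F);
    // Pretend F calls F+1: recurse into the callee only when the caller
    // asked for eager compilation; a lazy caller would get a stub instead.
    if (Mode == CodeGenMode::Eager && Depth < 2)
      codeGen(F + 1, Depth + 1);
    return nullptr; // a real JIT would return the emitted code's address
  }
};
```

With this shape, an eager call compiles the whole recursive tree of
callees, while a lazy call compiles only the requested function, and
the choice never leaks into the next top-level call.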
>
> The documentation should also be updated
> (http://llvm.org/docs/ProgrammersManual.html#threading) to reflect what
> one needs to do to ensure thread-safe JITing.

Thanks for that reminder. I've updated it in the patch I'm about to
mail, but I should apply the update regardless of whether the rest of
the patch goes in.

> Also, does every JIT target support non-lazy JITing now? See PR4816;
> last time I checked (r83242) it only worked on X86 and failed on PPC,
> so I had to keep lazy JITing enabled even though it's not what I want,
> for many reasons.

It's still the case that only X86 supports eager JITing. It doesn't
look that hard to add it to the rest of the targets, though.

> Also perhaps the lazy compilation stub should spin waiting on a lock
> (implemented using atomics), and the compilation callback should
> execute while holding the lock just before patching the callsite, so it
> would look like this in pseudocode:

Good idea. This increases the code size a bit, but it's clearly better
than the "load the target address" option I mentioned in the bug.
Would you add it to the bug so we don't lose it?

I think we can put the entire "not yet patched" branch inside the
compilation callback to minimize the code size impact:

callsite_patch_state = 0; // one byte of memory per callsite

callsite:
if (atomic_load(&callsite_patch_state) != 2) {
  call CompilationCallback  // doesn't return until the patchsite is patched
}
// fast and slow path: already compiled and patched
patchsite:
      call <nop nop nop nop nop nop nop nop> // will be patched

> callsite_patch_state = 0;// for each callsite one byte of memory
>
> callsite:
> if  (atomic_load(&callsite_patch_state) == 2) {
>      //fast-path: already compiled and patched
> patchsite:
>       jmp <nop nop nop nop nop nop nop nop> // will be patched
> }
> // not yet patched, it may already be compiling
> if  (atomic_test_and_set(&callsite_patch_state, 0, 1) == 0) {
>      // not yet compiling, set state to compiling, and start compiling
>      call CompilationCallBack
>      // set state to patched
>      atomic_set(&callsite_patch_state, 2)
> }
> // wait for patched state
> while (atomic_load(&callsite_patch_state) != 2) {
>  waitJIT();
> }
> // serialize
> CPUID
> patchsite2:
> // execute new code
> jmp <nop nop nop nop nop nop nop nop> // will be patched
>
> waitJIT:
>    jitLock()
>    jitUnlock()
>
> ^This should be consistent with the Intel manual's requirements on XMC
> (cross-modifying code), which describes a similar algorithm, except for
> the fast path.
>
> CompilationCallBack:
>   jitLock();
>        if (isJITed(F)) {jitUnlock(); return;}
>        JIT function
>
>        patch_callsite(&patchsite, compiledFunctionAddress);
>        patch_callsite(&patchsite2, compiledFunctionAddress);
>        setJITed(F, true);
>
>   jitUnlock();
>
> This way once it is compiled the callsite will only execute:
>    atomic_load(&callsite_patch_state)
>    == 2
>    jmp compiledFunctionAddress
>
> Best regards,
> --Edwin
>



