[llvm-dev] [inline-asm][asm-goto] Supporting "asm goto" in inline assembly

Thu Apr 6 19:05:29 PDT 2017

> On 5 Apr 2017, at 06:13, Matthias Braun via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
>> 
>> On Apr 4, 2017, at 11:44 AM, John McCall via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>> 
>>> On Apr 4, 2017, at 2:12 PM, Matthias Braun <matze at braunis.de> wrote:
>>> My two cents:
>>> 
>>> - I think inline assembly should work even if the compiler cannot parse the contents. This would rule out msvc inline assembly (or alternatively put all the parsing and interpretation burden on the frontend), but would work with gcc asm goto which specifies possible targets separately.
>>> - Supporting control flow in inline assembly by allowing jumps out of an assembly block seems natural to me.
>>> - Jumping into an inline assembly block seems like an unnecessary feature to me.
>>> - To have this working in lib/CodeGen we would need an alternative opcode with the terminator flag set. (There should also be opportunities to remodel some instruction flags in the backend, to be part of the MachineInstr instead of the opcode, but that is an orthogonal discussion to this)
>>> - I don't foresee big problems in CodeGen; we should take a look at how computed goto is implemented to find ways to reference arbitrary basic blocks.
>>> - The register allocator fails when the terminator instruction also writes a register which is subsequently spilled (none of the existing targets does that, but you could specify this situation in inline assembly).
>>> - I'd always prefer intrinsics over inline assembly. Hey, why don't we add a -Wassembly that warns on inline assembly usage and is enabled by default...
>>> - I still think inline assembly is valuable for new architecture bringup/experimentation situations.
>> 
>> To me, this feels like a great example of "we really wanted a language feature, but we figured out that we could hack it in using inline assembly in a way that's ultimately significantly harder for the compiler to support than a language feature, and now it's your problem."  I agree with Chandler that we should just design and implement the language feature.
>> 
>> I would recommend:
>> 
>>   if (__builtin_patchable_branch("section name")) {
>>     trace();
>>   }
>> 
>> ==>
>> 
>>   %0 = call i1 @llvm.patchable_branch(i8* @sectionNameString)
>>   br %0, ...
>> 
>> where @llvm.patchable_branch has the semantics of appending whatever patching information is necessary to the given section such that, if you apply the patch, it will change the result of the call from 0 to 1.  That can then typically be pattern-matched in the backend to get the optimal codegen.
>> 
>> If I might recommend a better ABI for the patching information: consider using a pair of relative pointers, one from the patching information to the patchable instruction, and one from the patchable instruction to the new target.  That would allow the patching information to be relocated at zero cost.
>> 
>> The actual details of how to apply the patch, and what the inline patchable-instruction sequence needs to be in order to accept the patch, would be target-specific.  The documented motivating example seems to assume that a single nop is always big enough, which is pretty questionable.
>> 
>> This feature could be made potentially interesting to e.g. JIT authors by allowing the patching information to be embellished with additional information to identify the source branch.
> 
> I completely agree that for this example we would rather have a proper intrinsic. As a matter of fact, we already have a similar mechanism in CodeGen to support the XRay feature.

I for one would really like that intrinsic. I have something similar under review, which wraps a function call for XRay's custom event logging feature (I realise I should have sent an RFC on this; I'll do that next). The patch that does this for XRay's requirements is at
https://reviews.llvm.org/D27503 -- wherein we do the following:

- Lower calls in LLVM IR to the @llvm.xray.customevent(...) intrinsic into something like:

  # align to 2 byte address
  .xray_sled_N
  jmp +NN
  # calling convention setup
  call <XRay's trampoline>

  We also record the location of this sled in the instrumentation map.

- At runtime, we overwrite the jump with nops.
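
To make the sled idea concrete, here is a rough sketch of the patching step, simulated on a plain byte buffer rather than on real executable memory. It assumes the x86-64 encodings EB rel8 (short jmp) and 66 90 (two-byte nop); the helper names are hypothetical and are not taken from the patch under review. A real patcher would also have to make the page writable and store the two bytes atomically, which is presumably why the sled is aligned to a 2-byte address.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed x86-64 encodings: EB rel8 is a short jmp, 66 90 is a two-byte nop. */
enum { SLED_PATCH_SIZE = 2 };
static const uint8_t kNop2[SLED_PATCH_SIZE] = {0x66, 0x90};

/* Hypothetical helper: write `jmp +skip` at the start of the sled, so that
 * execution jumps over the calling convention setup and the call. */
void sled_disable(uint8_t *sled, uint8_t skip) {
    const uint8_t jmp[SLED_PATCH_SIZE] = {0xEB, skip};
    memcpy(sled, jmp, SLED_PATCH_SIZE);
}

/* Hypothetical helper: overwrite the jump with a nop of the same size, so
 * that execution falls through into the call. */
void sled_enable(uint8_t *sled) {
    memcpy(sled, kNop2, SLED_PATCH_SIZE);
}

/* Returns 1 if the sled currently starts with the two-byte nop. */
int sled_is_enabled(const uint8_t *sled) {
    return memcmp(sled, kNop2, SLED_PATCH_SIZE) == 0;
}

/* Small self-check on an in-memory buffer standing in for code. */
int sled_demo(void) {
    uint8_t code[8] = {0};
    sled_disable(code, 6);
    assert(!sled_is_enabled(code));
    sled_enable(code);
    return sled_is_enabled(code);
}
```

The point of keeping the jump and the nop the same size is that enabling or disabling a sled never moves any of the surrounding code.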

If we get this patchable branch intrinsic, then we can certainly use it when lowering the XRay built-in we're trying to add to Clang as well.
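
For the patching information itself, the pair of relative pointers suggested in the quoted text might look roughly like the following. This is only a sketch: the record layout, field widths, and names are assumptions made for illustration, not a proposed ABI. The record stores an offset from itself to the patchable instruction, and an offset from that instruction to the new target, so nothing in it needs fixing up when the whole image moves.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical record layout; field names and widths are illustrative. */
typedef struct {
    int32_t site_rel;   /* offset from this field to the patchable instruction */
    int32_t target_rel; /* offset from the patchable instruction to the new target */
} patch_record;

/* Resolve the address of the patchable instruction. */
const uint8_t *patch_site(const patch_record *r) {
    return (const uint8_t *)&r->site_rel + r->site_rel;
}

/* Resolve the address of the new branch target. */
const uint8_t *patch_target(const patch_record *r) {
    return patch_site(r) + r->target_rel;
}

/* Demo: the record and the "code" live in one image, so copying the image
 * (a stand-in for relocation) leaves both offsets valid. */
typedef struct {
    patch_record rec;
    uint8_t code[32];
} image;

int relocation_demo(void) {
    image a = {0}, b;
    a.rec.site_rel = (int32_t)(&a.code[4] - (const uint8_t *)&a.rec.site_rel);
    a.rec.target_rel = 16;
    b = a; /* "relocate" by copying the whole image elsewhere */
    assert(patch_site(&b.rec) == &b.code[4]);
    assert(patch_target(&b.rec) == &b.code[20]);
    return 1;
}
```

Because both fields are offsets rather than absolute addresses, the copied image resolves to its own instruction and target without any relocation processing.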

/me goes writing up the RFC for the custom event logging.

Cheers

-- Dean
