[llvm-dev] [inline-asm][asm-goto] Supporting "asm goto" in inline assembly

David Woodhouse via llvm-dev <llvm-dev at lists.llvm.org>
Wed Feb 14 02:34:16 PST 2018


On Tue, 2017-04-04 at 16:26 +0000, Chandler Carruth via llvm-dev wrote:
> On Tue, Apr 4, 2017 at 6:07 AM Yatsina, Marina via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> > The asm goto feature was introduced to GCC in order to optimize
> > support for tracepoints in the Linux kernel (it can also be used for
> > other things that involve nop patching).
> >  
> > GCC documentation describes their motivating example here:
> > 
> > https://gcc.gnu.org/onlinedocs/gcc-4.8.4/gcc/Extended-Asm.html
> >  
> >      #define TRACE1(NUM)                         \
> >        do {                                      \
> >          asm goto ("0: nop;"                     \
> >                    ".pushsection trace_table;"   \
> >                    ".long 0b, %l0;"              \
> >                    ".popsection"                 \
> >                    : : : : trace#NUM);           \
> >          if (0) { trace#NUM: trace(); }          \
> >        } while (0)
> >      #define TRACE  TRACE1(__COUNTER__)
> > 
> > In this example (which in fact inspired the asm goto feature) we
> > want on rare occasions to call the trace function; on other
> > occasions we'd like to keep the overhead to the absolute minimum.
> > The normal code path consists of a single nop instruction. However,
> > we record the address of this nop together with the address of a
> > label that calls the trace function. This allows the nop
> > instruction to be patched at run time to be an unconditional branch
> > to the stored label. It is assumed that an optimizing compiler
> > moves the labeled block out of line, to optimize the fall through
> > path from the asm.
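> > 
> > As a purely illustrative aside, here is a minimal sketch of what the
> > runtime patching side might look like (hypothetical names, not taken
> > from the GCC docs or the kernel): walk the (nop address, label
> > address) pairs recorded in trace_table and rewrite each nop into a
> > jump. It assumes pointer-sized table entries (.quad rather than the
> > .long above) and a nop wide enough to hold the 5-byte x86 jmp, and
> > it omits making the text writable and flushing caches:
> > 
> >      #include <stdint.h>
> >      #include <string.h>
> > 
> >      struct trace_entry { uint8_t *nop_addr; uint8_t *target; };
> >      /* Bounds symbols the linker generates for section trace_table. */
> >      extern struct trace_entry __start_trace_table[], __stop_trace_table[];
> > 
> >      static void enable_tracepoints(void)
> >      {
> >          for (struct trace_entry *e = __start_trace_table;
> >               e < __stop_trace_table; e++) {
> >              /* Overwrite the nop with "jmp rel32" (opcode 0xE9). */
> >              int32_t rel = (int32_t)(e->target - (e->nop_addr + 5));
> >              e->nop_addr[0] = 0xE9;
> >              memcpy(e->nop_addr + 1, &rel, sizeof(rel));
> >          }
> >      }
> > 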
> > Here is the Linux kernel RFC which discusses the old C way of
> > implementing it and the performance issues that were noticed.
> > It also gives performance numbers comparing the old C code with
> > the asm goto approach:
> > https://lwn.net/Articles/350714/
> >  
> > This LTTng (Linux Trace Toolkit Next Generation) presentation talks
> > about using this feature as a way of optimizing static tracepoints
> > (slides 3-4):
> > https://www.computer.org/cms/ComputingNow/HomePage/2011/0111/rW_SW_UsingTracing.pdf
> > This presentation also mentions that a lot of other Linux
> > applications use this tracing mechanism.
> 
> Thanks, this is exactly the kind of discussion that I think will help
> make progress here.
> 
> I think this makes a lot of sense and is a really nice feature.
> However, I think implementing it with inline assembly
> imposes a lot of really unfortunate constraints on compilation -- it
> requires asm goto, pushsection and popsection, etc.
> 
> I would much rather provide a much more direct way to represent a
> patchable nop and the addresses of labels within a function. For
> example, I could imagine something like:
> 
> ```
>   if (0) { trace_call: /* code to call the trace function */ }
>   patch: __builtin_patchable_nop()
>   __builtin_save_labels(trace_call, patch)
> ```
> 
> But someone can probably design a much better way to represent this
> in Clang. The advantages I see here (admittedly, mostly for the
> implementation in Clang and LLVM):
> 
> 1) It allows Clang and LLVM to model this without running an
> assembler over anything.
> 2) It doesn't require new terminators in LLVM's IR
> 3) We already have intrinsics in LLVM's IR that could easily be
> extended to produce a nop.
> 4) It would be portable -- each backend could select an appropriately
> sized nop to patch a jump into.
> 
> Would this make sense?


Let's not conflate the asm-goto part with the .pushsection/.popsection.

The latter ("0: .pushsection foo; .long 0b; .popsection") is used *all*
over the kernel to build up tables of code locations — for exception
handling of instructions which might fault, as well as for runtime
patching of instructions like the above. It's not always a nop vs. call
alternative.
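
For concreteness, a minimal sketch of the exception-handling variant of
that idiom (hypothetical section and function names; the real kernel
uses its _ASM_EXTABLE machinery and nowadays relative offsets): record
the address of a potentially faulting load together with a fixup label,
so the fault handler can redirect execution there.

    /* Returns 0 on success, -1 if the load faulted (x86-64, AT&T). */
    static inline int load_checked(unsigned long *val,
                                   const unsigned long *addr)
    {
        int err = 0;
        asm volatile("1:  mov (%[addr]), %[out]\n"
                     "2:\n"
                     ".pushsection __my_ex_table, \"a\"\n"
                     ".quad 1b, 3f\n"          /* faulting insn, fixup */
                     ".popsection\n"
                     ".pushsection .text.fixup, \"ax\"\n"
                     "3:  movl $-1, %[err]\n"
                     "    jmp 2b\n"
                     ".popsection\n"
                     : [err] "+r" (err), [out] "=r" (*val)
                     : [addr] "r" (addr));
        return err;
    }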

It would be nice to have the compiler assist with that. We currently
have code to trawl through all the built object files and find calls to
__fentry__ so we can patch them in/out at runtime, for example. And we
might consider doing the same for calls to the retpoline thunks.

But I think we would be best served right now by considering that out
of scope, and looking *only* at the part which is handled by 'asm
goto'.
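
To make that scoping concrete: stripped of the section tricks, the
construct Clang would need to support is just an asm statement that may
branch to C labels. A minimal sketch (hypothetical function; note that
asm goto permits no output constraints):

    /* Falls through to "return 0", or the asm branches to the C label
     * "taken", referenced in the template as %l[taken]. */
    static int bit_set(unsigned int x, unsigned int bit)
    {
        asm goto("bt %1, %0\n\t"
                 "jc %l[taken]"
                 : /* no outputs allowed in asm goto */
                 : "r" (x), "r" (bit)
                 : "cc"
                 : taken);
        return 0;
    taken:
        return 1;
    }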

