[llvm-dev] [RFC] Implementing asm-goto support in Clang/LLVM

Thu Oct 25 13:58:01 PDT 2018

There have been quite a few discussions around asm-goto support in Clang and LLVM.
After working with several members of our community, this is a proposal that, in our opinion, strikes a reasonable balance and finally addresses the lack of implementation.

Justification
-----------------

One of the main motivations for inline assembly support in the compiler is the need to directly access hardware-specific instructions that are either not explicitly representable in higher-level languages or not supported by the compiler in any form. The latter includes the early stages of hardware development, when full compiler/toolchain support is not yet feasible. Introducing the control-flow capabilities of inline assembly into the compiler will expand the prototyping and experimentation opportunities for developers.

Having this feature will also allow us to support software that already uses asm-goto, such as the Linux kernel, for which (at least on x86) asm-goto is a mandatory compiler requirement [4]. Another way of looking at this is as compatibility with GCC.

Current support for inline assembly in LLVM
-----------------

LLVM supports inline assembler expressions (https://llvm.org/docs/LangRef.html#inline-assembler-expressions) through the use of a special value, the asm expression.

An example inline assembler expression is:

                i32 (i32) asm "bswap $0", "=r,r"

Inline assembler expressions may only be used as the callee operand of a call or an invoke instruction. Thus, typically we have:

                %X = call i32 asm "bswap $0", "=r,r"(i32 %Y)

Labels in inline assembly are already supported in LLVM, so the only open problem is control flow.

The IR instruction for "asm-goto" must be a terminator instruction in the basic block with specially denoted successor parameters; for this reason the "call" instruction is not suitable. "invoke" is a terminator, but a very specific one: it has one "normal" control-flow successor and one "exceptional" successor, and the "exceptional" one is required to be a landing pad. By contrast, "asm-goto" can have many output labels in addition to the fallthrough one, and those labels represent regular code, not landing pads.

Hence, there is a need for introducing a new IR instruction.

The callbr instruction
-----------------

Our proposed approach is to introduce a new IR instruction named callbr with the following syntax:

                callbr <return_type> <callee> (<argtype1> <arg1>, ...) to label %normal or jump [label %transfer1, label %transfer2...]

This syntax indicates that the callee may transfer control to a "normal" successor (generally the fallthrough in the source language code), denoted by the label after the keyword "to", or to any of the "exceptional" successors (which are expected to be normal basic blocks, not landing pads), denoted by the labels in the "or jump" list.

The CallBrInst class implementing the instruction is a subclass of CallBase and is used as a terminator.

Support for asm-goto is implemented by using "void asm" as the callee expression:

                callbr void asm sideeffect <flags> "<asm string>", "<constraints>"(<argtype1> <arg1>, ..., i8* blockaddress(<function>, %transfer1), i8* blockaddress(<function>, %transfer2), ...) to label %normal or jump [label %transfer1, label %transfer2...]

For example, the asm-goto call:

                int example1(...) {
                  ...
                  asm goto("testl %0, %0; jne %l1;" :: "r"(cond)::label_true);
                  ...
                label_true:
                  ...
                }

is represented as:

                define i32 @example1(...) {
                  ...
                  callbr void asm sideeffect "testl $0, $0; jne ${1:l}",
                                "r,X,~{dirflag},~{fpsr},~{flags}"(i32 %5,
                                i8* blockaddress(@example1, %label_true))
                                to label %normal or jump [label %label_true]

                normal:
                  ...
                label_true:
                  ...
                }
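
For readers who want to try the source-level construct, here is a self-contained C sketch of the same pattern. It assumes an x86 target and a compiler that implements GCC-style asm goto; the names is_nonzero and check_is_nonzero and the non-x86 fallback are illustrative additions, not part of the proposal. __asm__ is used instead of asm only so that the file also builds under strict -std=c99 or -std=c11.

```c
#include <assert.h>

/* Returns 1 when the inline asm jumps to label_true, 0 on fallthrough.
 * The name is_nonzero is hypothetical, chosen for this sketch. */
int is_nonzero(int cond) {
#if defined(__x86_64__) || defined(__i386__)
    /* Operand 0 is the input `cond`; labels are numbered after the
     * inputs, so label_true is operand 1 and is referenced as %l1. */
    __asm__ goto("testl %0, %0\n\t"
                 "jne %l1"
                 : /* no outputs */
                 : "r"(cond)
                 : "cc" /* testl modifies the flags */
                 : label_true);
#else
    /* Assumed fallback for other targets: plain C with the same CFG. */
    if (cond)
        goto label_true;
#endif
    return 0;   /* "normal" (fallthrough) successor */
label_true:
    return 1;   /* jump-list successor */
}

/* Usage: exercise both successors. */
void check_is_nonzero(void) {
    assert(is_nonzero(0) == 0);
    assert(is_nonzero(42) == 1);
}
```

The two return statements correspond to the two successors in the IR above: the fallthrough path plays the role of %normal and the label plays the role of %label_true.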

The labels from the label list of an asm-goto statement are used by the inline asm as data arguments. To avoid errors in asm parsing and CFG recognition, the labels are passed to the inline asm as arguments, using additional "X" input constraints and blockaddress expressions, while also appearing directly as elements of the jump list.
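
The label-as-operand view can be seen from the C side as well. The sketch below assumes an x86 target and a compiler with GCC-style asm goto, and uses two labels referenced by name; the names sign_of and check_sign_of and the non-x86 fallback are illustrative additions. Under the scheme described above, each of the two labels would be expected to appear both as an extra "X" input after the ordinary input and as an entry in the jump list.

```c
#include <assert.h>

/* Returns -1, 0 or 1 depending on the sign of x.
 * The name sign_of is hypothetical, chosen for this sketch. */
int sign_of(int x) {
#if defined(__x86_64__) || defined(__i386__)
    /* One ordinary input (operand 0) followed by two labels
     * (operands 1 and 2), referenced here by name via %l[...]. */
    __asm__ goto("testl %0, %0\n\t"
                 "js %l[negative]\n\t"   /* sign flag set: x < 0 */
                 "jnz %l[positive]"      /* not zero and not negative */
                 : /* no outputs */
                 : "r"(x)
                 : "cc"
                 : negative, positive);
#else
    /* Assumed fallback for other targets: plain C with the same CFG. */
    if (x < 0)
        goto negative;
    if (x > 0)
        goto positive;
#endif
    return 0;    /* fallthrough successor */
negative:
    return -1;   /* first jump-list successor */
positive:
    return 1;    /* second jump-list successor */
}

/* Usage: exercise all three successors. */
void check_sign_of(void) {
    assert(sign_of(-5) == -1);
    assert(sign_of(0) == 0);
    assert(sign_of(9) == 1);
}
```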

Implementing the callbr instruction and asm-goto requires some adaptation of the existing passes:

* All passes that deal with the CFG must consider all potential successors of the callbr instruction to be possible. This means that passes which simplify the CFG based on assumptions about which successor is taken cannot be applied to callbr.

* Due to the way successor and predecessor detection works, some CFG simplifications, such as trivial block elimination, may be blocked if they would result in duplicate successors for the callbr instruction. Such duplicate successors are processed incorrectly in the IR and cannot be removed because they are used by the callee.

* The indirectbr expansion pass may destroy blockaddress expressions if the basic blocks they reference are possible successors of an indirectbr. It may have to be reworked to support this new usage of the blockaddress expression.
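
The first requirement above can be illustrated with a toy model in C. This is only a sketch under stated assumptions: ToyBlock and the toy_* functions are hypothetical names invented here and do not correspond to LLVM's CallBrInst or to any LLVM pass. The point is that a reachability walk has to follow the normal successor and every jump-list successor, since any of them may be taken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_JUMPS 4
#define TOY_NO_BLOCK (-1)

/* Hypothetical toy terminator: one "to" successor plus an "or jump" list.
 * This is not LLVM's CallBrInst, only a model of its successor set. */
typedef struct {
    int normal;                 /* index of the normal successor, or TOY_NO_BLOCK */
    int jumps[TOY_MAX_JUMPS];   /* indices of the jump-list successors */
    size_t num_jumps;           /* number of valid entries in jumps[] */
} ToyBlock;

/* Depth-first walk that treats every listed successor as possible. */
static void toy_visit(const ToyBlock *blocks, size_t n, int b, bool *live) {
    if (b < 0 || (size_t)b >= n || live[b])
        return;
    live[b] = true;
    toy_visit(blocks, n, blocks[b].normal, live);
    for (size_t i = 0; i < blocks[b].num_jumps; ++i)
        toy_visit(blocks, n, blocks[b].jumps[i], live);
}

/* Fills live[0..n) with reachability from block `entry`. */
void toy_mark_reachable(const ToyBlock *blocks, size_t n, int entry, bool *live) {
    for (size_t i = 0; i < n; ++i)
        live[i] = false;
    toy_visit(blocks, n, entry, live);
}

/* Usage: block 0 ends in a callbr-like terminator "to 1 or jump [2]". */
void toy_example(void) {
    const ToyBlock blocks[4] = {
        { 1, { 2 }, 1 },              /* entry */
        { TOY_NO_BLOCK, { 0 }, 0 },   /* normal: returns */
        { TOY_NO_BLOCK, { 0 }, 0 },   /* label_true: returns */
        { TOY_NO_BLOCK, { 0 }, 0 },   /* block that nothing refers to */
    };
    bool live[4];
    toy_mark_reachable(blocks, 4, 0, live);
    assert(live[0] && live[1] && live[2]);   /* both successors stay live */
    assert(!live[3]);                        /* only the unreferenced block is dead */
}
```

A pass that followed only the normal successor would wrongly mark block 2 as dead in this model, even though the inline asm may jump to it.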

Some other notes on the instruction and asm-goto implementation:

* The special status of the "normal" destination label makes it possible to adjust its transition probability so that it is likely to become the fallthrough successor
* While the initial implementation of asm-goto won't allow outputs, the instruction's syntax supports them in principle, so support for them can be added at a later date
* The general syntax of the callbr instruction would also allow it to be used for control-flow tracking of setjmp/longjmp, but that is beyond the scope of this RFC

I would like to kindly ask for your comments and thoughts on this proposal. I will also submit the prototype patch implementing it to Phabricator.

And, most importantly, we would like to say a huge thank you to Chandler Carruth, Reid Kleckner, James Y Knight, Bill Wendling, Eric Christopher, Richard Smith, Matthias Braun, Nick Desaulniers and others who contributed to this RFC. Without you it would not have been possible.

Alexander Ivchenko, Mikhail Dvoretckii

Links
-----------------

[1] asm-goto feature request tracker (https://bugs.llvm.org/show_bug.cgi?id=9295)

[2] Discussion in llvm community about asm-goto (https://groups.google.com/forum/#!topic/llvm-dev/v9_oGrBeE9s)

[3] Recent discussion in LKML that resurrected the discussion (https://lkml.org/lkml/2018/2/13/1049)

[4] asm-goto was made mandatory for x86 in January of this year: (https://github.com/ClangBuiltLinux/linux/commit/e501ce957a786ecd076ea0cfb10b114e6e4d0f40)

[5] GCC documentation describes their motivating example here:
(https://gcc.gnu.org/onlinedocs/gcc-4.8.4/gcc/Extended-Asm.html)

[6] Linux kernel RFC which discusses the old C way of implementing tracepoints and the performance issues that were noticed. It also states some performance numbers of the old C code vs. the asm goto (https://lwn.net/Articles/350714/)

[7] LTTng (Linux Trace Toolkit Next Generation) presentation that discusses using the asm-goto feature as a way to optimize static tracepoints (slides 3-4) (https://www.computer.org/cms/ComputingNow/HomePage/2011/0111/rW_SW_UsingTracing.pdf)

[8] A link to the gcc thread introducing this feature (http://gcc.gnu.org/ml/gcc-patches/2009-07/msg01556.htm)