[llvm-dev] [RFC] Implementing asm-goto support in Clang/LLVM

James Y Knight via llvm-dev llvm-dev at lists.llvm.org
Thu Oct 25 15:05:39 PDT 2018

Awesome! Big +1 for this.

On Thu, Oct 25, 2018 at 4:58 PM Ivchenko, Alexander via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> Our proposed approach is to introduce a new IR instruction named callbr
> with the following syntax:
>                 callbr <return_type> <callee> (<argtype1> <arg1>, ...) to
> label %normal or jump [label %transfer1, label %transfer2...]

The invoke is currently defined as:
<result> = invoke [cconv] [ret attrs] [addrspace(<num>)] [<ty>|<fnty>
<fnptrval>(<function args>) [fn attrs]
              [operand bundles] to label <normal label> unwind label
<exception label>

Will callbr support the same attributes (cconv, ret attrs, addrspace, fn
attrs, operand bundles)? Were they just excluded from the above syntax for
brevity, or are they actually unsupported?

> The labels from the label list of an asm-goto statement are used by the
> inline asm as data arguments. To avoid errors in asm parsing and CFG
> recognition, the labels are passed as arguments to the inline asm using
> additional “X” input constraints and blockaddress statements while also
> being used directly as elements of the jump list.

While it may seem weird to specify the label twice, once as an argument to
the call, and once as a successor, I'd also note that doing so increases
the generality of the instruction, since the call may have access to the
blockaddress via some indirect means and not require it to be passed to the
call in order to jump to it.

> Implementing the callbr instruction and asm-goto requires some adaptation
> of the existing passes:

> * All passes that deal with the CFG must consider all potential successors
> of the callbr instruction to be possible. This means that no passes that
> simplify the CFG based on any assumptions can work with callbr
> * Due to the way successor and predecessor detection works, some CFG
> simplifications such as trivial block elimination may be blocked if they
> would result in duplicate successors for the callbr instruction, as such
> duplicate successors are incorrectly processed in the IR and cannot be
> removed due to being used by the callee.

I don't understand what prevents them from being removed? The callee can't
see what's in the successor list (only its blockaddress arguments), so if
there's any duplicate successors after coalescing blocks, what would be
broken by removing them?

* The indirectbr expansion pass may destroy blockaddress expressions if the
> basic blocks they reference are possible successors of an indirectbr. It
> may have to be reworked to support this new usage of the blockaddress
> expression
Yes, the indirectbr pass replaces all blockaddresses with a small-integer
instead of the actual address, on the assumption that the only use of a
blockaddress is in a indirectbr instruction (which itself gets transformed
into a 'switch' to a branch). Note that this pass is only used by
retpoline, but is required for retpoline to work at the moment. I started
making a proof-of-concept for fixing that -- allowing indirectbr to
translate into a jmp to/through the retpoline stub instead; it seemed easy
enough, although I'm not certain if it's right or not. :)

> Some other notes on the instruction and asm-goto implementation:
> * The special status of the “normal” destination label allows to
> specifically adjust its transition probability to make it likely to become
> a fallthrough successor
> * While the initial implementation of asm-goto won’t allow outputs, the
> instruction’s syntax supports them in principle, so the support for this
> can be added at a later date

And, besides just asm-goto, I think (separate, future) work based on this
could also be useful to support intrinsics with multiple successors.

For example, a (strong) atomic cmpxchg primitive for an ll/sc platform can
be most efficiently modeled as an instruction with both an output value and
two successor blocks (for success and failure). If we can extend callbr to
support an output value, it can represent that. (The alternatives at the
moment are to return only oldval (and recompare to expected to get the
success flag), or to return both oldval and a flag (and do an extraneous
conditional branch.)

* The general syntax of the callbr instruction allows to use it to
> implement the control flow tracking for setjmp/longjmp, but that is beyond
> the scope of this RFC
Yes, with this functionality (...and some other more work), it seems like
it may be possible to correctly represent the CFG for setjmp/longjmp. I had
started a discussion on fixing that with a couple folks a while ago, I'll
try to resurrect that as a separate RFC.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20181025/fac1a502/attachment.html>

More information about the llvm-dev mailing list