[llvm-dev] [cfe-dev] [RFC] ASM Goto With Output Constraints

Bill Wendling via llvm-dev llvm-dev at lists.llvm.org
Fri Jun 28 19:20:24 PDT 2019

On Fri, Jun 28, 2019 at 5:39 PM Finkel, Hal J. <hfinkel at anl.gov> wrote:

> On 6/28/19 5:35 PM, James Y Knight via llvm-dev wrote:
> On Fri, Jun 28, 2019 at 5:53 PM Bill Wendling <isanbard at gmail.com> wrote:
>> On Fri, Jun 28, 2019 at 1:48 PM James Y Knight <jyknight at google.com>
>> wrote:
>>> On Fri, Jun 28, 2019 at 3:00 PM Bill Wendling <isanbard at gmail.com>
>>> wrote:
>>>> On Thu, Jun 27, 2019 at 1:44 PM Bill Wendling <isanbard at gmail.com>
>>>> wrote:
>>>>> On Thu, Jun 27, 2019 at 1:29 PM James Y Knight <jyknight at google.com>
>>>>> wrote:
>>>>>> I think this is fine, except that it stops at the point where things
>>>>>> actually start to get interesting and tricky.
>>>>>> How will you actually handle the flow of values from the callbr into
>>>>>> the error blocks? A callbr can specify requirements on where its outputs
>>>>>> live. So, what if two callbr, in different branches of code, specify
>>>>>> _different_ constraints for the same output, and list the same block as a
>>>>>> possible error successor? How can the resulting phi be codegened?
>>>>> This is where I fall back on the statement about how "the programmer
>>>>> knows what they're doing". Perhaps I'm being too cavalier here? My concern,
>>>>> if you want to call it that, is that we not be too restrictive on the new
>>>>> behavior. For example, the "asm goto" may set a register to an error value
>>>>> (made up on the spot; may not be a common use). But, if there's no real
>>>>> reason to have the value be valid on the abnormal path, then sure we can
>>>>> declare that it's not valid on the abnormal path.
>>>> I think I should explain my "programmer knows what they're doing"
>>>> statement a bit better. I'm specifically referring to inline asm here. The
>>>> more general "callbr" case may still need to be considered (see Reid's
>>>> reply).
>>>> When a programmer uses inline asm, they're implicitly telling the
>>>> compiler that they *do* know what they're doing (I know this is common
>>>> knowledge, but I wanted to reiterate it). In particular, either they need
>>>> to reference an instruction not readily available from the compiler (e.g.
>>>> "cpuid") or the compiler isn't able to give them the needed performance in
>>>> a critical section. I'm extending this sentiment to callbr with output
>>>> constraints. Let's take your example below and write it as "normal" asm
>>>> statements one on each branch of an if-then-else (please ignore any syntax
>>>> errors):
>>>> if:
>>>>   br i1 %cmp, label %true, label %false
>>>> true:
>>>>   %0 = call { i32, i32 } asm sideeffect "poetry $0, $1", "={r8},={r9}"
>>>> ()
>>>>   br label %end
>>>> false:
>>>>   %1 = call { i32, i32 } asm sideeffect "poetry2 $0, $1",
>>>> "={r10},={r11}" ()
>>>>   br label %end
>>>> end:
>>>>   %vals = phi { i32, i32 } [ %0, %true ], [ %1, %false ]
>>>> How is this handled in codegen? Is it an error or does the back-end
>>>> handle it? Whatever's done today for "normal" inline asm is what I *think*
>>>> should be the behavior for the inline asm callbr variant. If this doesn't
>>>> seem sensible (and I realize that I may be thinking of an "in a perfect
>>>> world" scenario), then we'll need to come up with a more sensible solution
>>>> which may be to disallow the values on the error block until we can think
>>>> of a better way to handle them.
>>> This example is no problem, because instructions can be emitted between
>>> what's emitted by "call asm" and the end of the block (be it a fallthrough,
>>> or a jump instruction). What gets emitted there is a move of the output
>>> register to another location -- either a register or to the stack. And
>>> therefore at the beginning of the "end" block, "%vals" is always in a
>>> consistent location, no matter how you got to that block.
>>> But in the callbr case, there is not a location at which those moves can
>>> be emitted, after the callbr, before the jump to "error".
>> I see what you mean. Let's say we create a pseudo-instruction (similar to
>> landingpad, et al) that needs to be lowered by the backend in a reasonable
>> manner. The EH stuff has an external process/library that performs the
>> actual unwinding and which sets the values accordingly. We won't have this.
>> What we could do instead is split the edges and insert the
>> copy-to-<wherever> statements there.
> Exactly -- except that doing that is potentially an invalid transform,
> because the address is being used as a value, not simply a jump target. The
> label list is just a list of _possible_ jump targets, changing those won't
> actually affect anything. You'd instead need to change the blockaddress
> constant, but in the general case you don't know where that address came
> from -- (and it may therefore be required that you have the same address
> for two separate callbr instructions).
> I guess this kinda touches on some of the same issues as in the other
> discussion about the handling of the blockaddress in callbr and inlining,
> etc...
> I wonder if we could put some validity restrictions on the IR structure,
> rather than trying to fix things up after the fact by attempting to split
> blocks. E.g., we could state that it's invalid to have a phi which uses the
> value defined by a callbr, if it's conditioned on that same block as
> predecessor.  That is: it's valid to use _other_ values defined in the
> block ending in callbr, because they can be moved prior to the callbr. It's
> also valid to use the value defined by the callbr in a phi conditioned on
> some other intermediate block as predecessor, because then any required
> moves can happen in the intermediate block.
> I believe such an IR restriction should be sufficient to make it possible
> to emit valid code from the IR in all cases, but I'm a bit afraid of how
> badly adding such odd edge-cases might screw up the rest of the compiler
> and optimizer.
> That may be a reasonable restriction to place on the code.

Allow me to wildly speculate a bit. What I would *like* to have happen is
to generate assembly akin to this:

Lasm.goto.dest:             ; The original blockaddress destination.

  mov %..., %...
  jmp Lasm.goto.dest.body

  mov %..., %...
  jmp Lasm.goto.dest.body   ; This would be elided, of course.

This preserves the blockaddress value. If we create a new instruction,
let's say `indirectval' (a horrible name, but used for this example), it
could save us from having to deal with edge splitting. It could take values
similar to a phi node:

  <val> = indirectval <ty> [%v1, label %bb1], [%v2, label %bb2]

where %v1 and %v2 are from callbr instructions. When we are converting the
IR into machine instructions, we can generate something similar to the
example above:

  BR asm.goto.dest.bb1

  MOV ...
  BR asm.goto.dest.body

  MOV ...
  BR asm.goto.dest.body

The one issue is the precise instruction to add to access the values.
Perhaps they could be inserted directly before the indirectval inst.
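
To make that a bit more concrete, here is a rough sketch of the IR I have in
mind. None of this is real syntax: callbr doesn't return values today,
`indirectval' is made up, and the function @f, the asm strings, the
constraints and the block names (%cont1, %cont2, %dest) are all placeholders.

  bb1:
    %v1 = callbr i32 asm "...", "..." ()
            to label %cont1 [label %dest]

  bb2:
    %v2 = callbr i32 asm "...", "..." ()
            to label %cont2 [label %dest]

  dest:                       ; blockaddress(@f, %dest) stays unchanged
    %val = indirectval i32 [%v1, label %bb1], [%v2, label %bb2]
    ...

The point is only that both callbr instructions keep targeting the same
%dest, and the moves get materialized when indirectval is lowered.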

> I think that your fear is justified.
> In any case, if we're going to support forming this kind of callbr in
> Clang, then Clang still needs a place to put the stack stores after the
> inline asm in order to represent the output constraints - which are
> specified in terms of source-level variables and those are always in stack
> locations when Clang is generating IR. I think that we can make all of this
> work if we say that the output constraints, and thus the outputs of the
> callbr, dominate only uses on the normal "fallthrough" branch. Then the
> compiler has a single place to put the stores (and, later, a place to put
> register copies, etc.).

Hal, are you saying that values should not be used on indirect branches?
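
If so, I read the rule as something like the following. This is just a sketch
under my own assumptions: callbr with outputs is hypothetical syntax, the asm
string and constraints are elided, and %x.addr stands for whatever stack slot
Clang created for the output variable.

  entry:
    %v = callbr i32 asm "...", "..." ()
           to label %normal [label %error]

  normal:                     ; fallthrough: %v is defined here
    store i32 %v, i32* %x.addr
    ...

  error:                      ; indirect target: no use of %v allowed
    ...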
