[llvm-dev] [cfe-dev] [RFC] ASM Goto With Output Constraints

Fri Jun 28 19:20:24 PDT 2019

On Fri, Jun 28, 2019 at 5:39 PM Finkel, Hal J. <hfinkel at anl.gov> wrote:

> On 6/28/19 5:35 PM, James Y Knight via llvm-dev wrote:
>
> On Fri, Jun 28, 2019 at 5:53 PM Bill Wendling <isanbard at gmail.com> wrote:
>
>> On Fri, Jun 28, 2019 at 1:48 PM James Y Knight <jyknight at google.com>
>> wrote:
>>
>>> On Fri, Jun 28, 2019 at 3:00 PM Bill Wendling <isanbard at gmail.com>
>>> wrote:
>>>
>>>> On Thu, Jun 27, 2019 at 1:44 PM Bill Wendling <isanbard at gmail.com>
>>>> wrote:
>>>>
>>>>> On Thu, Jun 27, 2019 at 1:29 PM James Y Knight <jyknight at google.com>
>>>>> wrote:
>>>>>
>>>>>> I think this is fine, except that it stops at the point where things
>>>>>> actually start to get interesting and tricky.
>>>>>>
>>>>>> How will you actually handle the flow of values from the callbr into
>>>>>> the error blocks? A callbr can specify requirements on where its outputs
>>>>>> live. So, what if two callbr, in different branches of code, specify
>>>>>> _different_ constraints for the same output, and list the same block as a
>>>>>> possible error successor? How can the resulting phi be codegened?
>>>>>>
>>>>> This is where I fall back on the statement about how "the programmer
>>>>> knows what they're doing". Perhaps I'm being too cavalier here? My concern,
>>>>> if you want to call it that, is that we not be too restrictive on the new
>>>>> behavior. For example, the "asm goto" may set a register to an error value
>>>>> (made up on the spot; may not be a common use). But, if there's no real
>>>>> reason to have the value be valid on the abnormal path, then sure we can
>>>>> declare that it's not valid on the abnormal path.
>>>>>
>>>> I think I should explain my "programmer knows what they're doing"
>>>> statement a bit better. I'm specifically referring to inline asm here. The
>>>> more general "callbr" case may still need to be considered (see Reid's
>>>> reply).
>>>>
>>>> When a programmer uses inline asm, they're implicitly telling the
>>>> compiler that they *do* know what they're doing (I know this is common
>>>> knowledge, but I wanted to reiterate it). In particular, either they need
>>>> to reference an instruction not readily available from the compiler (e.g.
>>>> "cpuid") or the compiler isn't able to give them the needed performance in
>>>> a critical section. I'm extending this sentiment to callbr with output
>>>> constraints. Let's take your example below and write it as "normal" asm
>>>> statements one on each branch of an if-then-else (please ignore any syntax
>>>> errors):
>>>>
>>>> if:
>>>>   br i1 %cmp, label %true, label %false
>>>>
>>>> true:
>>>>   %0 = call { i32, i32 } asm sideeffect "poetry $0, $1", "={r8},={r9}" ()
>>>>   br label %end
>>>>
>>>> false:
>>>>   %1 = call { i32, i32 } asm sideeffect "poetry2 $0, $1", "={r10},={r11}" ()
>>>>   br label %end
>>>>
>>>> end:
>>>>   %vals = phi { i32, i32 } [ %0, %true ], [ %1, %false ]
>>>>
>>>> How is this handled in codegen? Is it an error or does the back-end
>>>> handle it? Whatever's done today for "normal" inline asm is what I *think*
>>>> should be the behavior for the inline asm callbr variant. If this doesn't
>>>> seem sensible (and I realize that I may be thinking of an "in a perfect
>>>> world" scenario), then we'll need to come up with a more sensible solution
>>>> which may be to disallow the values on the error block until we can think
>>>> of a better way to handle them.
>>>>
>>>
>>> This example is no problem, because instructions can be emitted between
>>> what's emitted by "call asm" and the end of the block (be it a fallthrough
>>> or a jump instruction). What gets emitted there is a move of the output
>>> register to another location -- either a register or to the stack. And
>>> therefore at the beginning of the "end" block, "%vals" is always in a
>>> consistent location, no matter how you got to that block.
>>>
>>> But in the callbr case, there is not a location at which those moves can
>>> be emitted, after the callbr, before the jump to "error".
>>>
>>
>> I see what you mean. Let's say we create a pseudo-instruction (similar to
>> landingpad, et al) that needs to be lowered by the backend in a reasonable
>> manner. The EH stuff has an external process/library that performs the
>> actual unwinding and which sets the values accordingly. We won't have this.
>>
>
>
>
>> What we could do instead is split the edges and insert the copy-to-<where
>> ever> statements there.
>>
>
> Exactly -- except that doing that is potentially an invalid transform,
> because the address is being used as a value, not simply a jump target. The
> label list is just a list of _possible_ jump targets; changing those won't
> actually affect anything. You'd instead need to change the blockaddress
> constant, but in the general case you don't know where that address came
> from -- (and it may therefore be required that you have the same address
> for two separate callbr instructions).
>
> I guess this kinda touches on some of the same issues as in the other
> discussion about the handling of the blockaddress in callbr and inlining,
> etc...
>
> I wonder if we could put some validity restrictions on the IR structure,
> rather than trying to fix things up after the fact by attempting to split
> blocks. E.g., we could state that it's invalid to have a phi which uses the
> value defined by a callbr, if it's conditioned on that same block as
> predecessor.  That is: it's valid to use _other_ values defined in the
> block ending in callbr, because they can be moved prior to the callbr. It's
> also valid to use the value defined by the callbr in a phi conditioned on
> some other intermediate block as predecessor, because then any required
> moves can happen in the intermediate block.
>
> I believe such an IR restriction should be sufficient to make it possible
> to emit valid code from the IR in all cases, but I'm a bit afraid of how
> badly adding such odd edge-cases might screw up the rest of the compiler
> and optimizer.
>
> That may be a reasonable restriction to place on the code.
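
To make the restriction James describes concrete, here is a minimal sketch in Python over a toy IR model. Block, Phi and find_invalid_phis are names made up for this illustration; none of them are LLVM APIs, and a real check would presumably live in the IR verifier. It only flags a phi operand that uses a callbr's result with the callbr's own block as the predecessor:

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Phi:
    name: str
    # (value, predecessor block) pairs, as in an LLVM phi.
    incoming: List[Tuple[str, str]]


@dataclass
class Block:
    name: str
    phis: List[Phi] = field(default_factory=list)
    # Value defined by the callbr terminating this block, if any.
    callbr_result: Optional[str] = None


def find_invalid_phis(blocks):
    """Return (block, phi, value, pred) for every phi operand that uses a
    callbr result with the callbr's own block as the predecessor."""
    callbr_def = {b.name: b.callbr_result
                  for b in blocks if b.callbr_result is not None}
    bad = []
    for b in blocks:
        for phi in b.phis:
            for value, pred in phi.incoming:
                if callbr_def.get(pred) == value:
                    bad.append((b.name, phi.name, value, pred))
    return bad


# Two callbrs listing "error" directly: nowhere to put the moves.
direct = [
    Block("a", callbr_result="%v1"),
    Block("b", callbr_result="%v2"),
    Block("error", phis=[Phi("%vals", [("%v1", "a"), ("%v2", "b")])]),
]

# Same values routed through intermediate blocks: moves can go there.
routed = [
    Block("a", callbr_result="%v1"),
    Block("b", callbr_result="%v2"),
    Block("a.mid"),
    Block("b.mid"),
    Block("error", phis=[Phi("%vals", [("%v1", "a.mid"), ("%v2", "b.mid")])]),
]

print(find_invalid_phis(direct))
print(find_invalid_phis(routed))
```

The first case reports both phi operands, the second reports none. Other values defined in the callbr's block are not flagged, since they can be moved ahead of the callbr.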

Allow me to wildly speculate a bit. What I would *like* to have happen is
to generate assembly akin to this:

Lasm.goto.dest:           ; The original blockaddress destination.
Lasm.goto.dest.bb1:
  mov %..., %...
  jmp Lasm.goto.dest.body
Lasm.goto.dest.bb2:
  mov %..., %...
  jmp Lasm.goto.dest.body ; This would be elided, of course.
Lasm.goto.dest.body:
  ...

This preserves the blockaddress value. If we create a new instruction,
let's say `indirectval' (a horrible name, but used for this example), it
could save us from having to deal with edge splitting. It could take values
similar to a phi node:

  <val> = indirectval <ty> [%v1, label %bb1], [%v2, label %bb2]

where %v1 and %v2 are from callbr instructions. When we are converting the
IR into machine instructions, we can generate something similar to the
example above:

asm.goto.dest:
  BR asm.goto.dest.bb1
asm.goto.dest.bb1:
  MOV ...
  BR asm.goto.dest.body
asm.goto.dest.bb2:
  MOV ...
  BR asm.goto.dest.body
asm.goto.dest.body:
  ...

The one issue is the precise instruction to add to access the values.
Perhaps they could be inserted directly before the indirectval inst.
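
As a rough illustration of the shape I have in mind, here is a small Python sketch that emits the block layout above for one indirectval. The function lower_indirectval, its parameters, and the BR/MOV spellings are all made up for this example; they are pseudo-instructions, not real MachineInstrs, and the sketch deliberately leaves open how the right copy block gets selected:

```python
def lower_indirectval(dest, result, incoming):
    """Emit the block layout sketched above for one hypothetical indirectval.

    dest     -- label whose blockaddress must stay unchanged
    result   -- name of the location the merged value ends up in
    incoming -- list of (source_location, predecessor_label) pairs
    """
    if not incoming:
        raise ValueError("indirectval needs at least one incoming value")
    copy_labels = [f"{dest}.{pred}" for _, pred in incoming]
    body = f"{dest}.body"
    lines = [f"{dest}:"]
    # As in the listing above, the original label simply leads into the
    # first copy block; selecting the right copy block is left open here.
    lines.append(f"  BR {copy_labels[0]}")
    for (src, _), label in zip(incoming, copy_labels):
        lines.append(f"{label}:")
        lines.append(f"  MOV {src}, {result}")  # pseudo move: source, destination
        lines.append(f"  BR {body}")
    lines.append(f"{body}:")
    return lines


layout = lower_indirectval(
    "asm.goto.dest", "%val", [("%r8", "bb1"), ("%r10", "bb2")]
)
print("\n".join(layout))
```

The point is only that the original label, and hence the blockaddress, stays where it is, while each predecessor gets its own place for the copy.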

> I think that your fear is justified.
>
> In any case, if we're going to support forming this kind of callbr in
> Clang, then Clang still needs a place to put the stack stores after the
> inline asm in order to represent the output constraints - which are
> specified in terms of source-level variables and those are always in stack
> locations when Clang is generating IR. I think that we can make all of this
> work if we say that the output constraints, and thus the outputs of the
> callbr, dominate only uses on the normal "fallthrough" branch. Then the
> compiler has a single place to put the stores (and, later, a place to put
> register copies, etc.).
>
Hal, are you saying that values should not be used on indirect branches?

-bw