[PATCH] D45164: [MC] Change AsmParser to leverage Assembler during evaluation

Mon Apr 9 19:37:08 PDT 2018

jyknight added a comment.

In https://reviews.llvm.org/D45164#1059883, @espindola wrote:

> In https://reviews.llvm.org/D45164#1055716, @niravd wrote:
>
> > In https://reviews.llvm.org/D45164#1055062, @espindola wrote:
> >
> > > > As a result the textual output may fail where the equivalent object generation would pass.
> > >
> > > I don't think that is OK.
> >
> >
> > It's certainly not ideal but this is at least a somewhat reasonable intermediate point until the follow up patch is finished. The divergence between object and text only happens with preprocessor directives in assembly which should mostly happen with .S files which are probably being assembled directly to object.
> >
> > The follow-up patch requires merging the various ObjectStreamer and AsmStreamer paths and is rather large.
>
>
> That would cause us to compute offsets when producing a .s file, right? If we must process .if directives instead of printing them, I don't think we have another option.

This is really tricky too...

If you compute offsets when producing a textual assembly file, except in _very_ limited circumstances where the layout is self-evident and not up to interpretation, you're going to risk getting a different answer than the actual assembler. Consider that X86 has many ways to encode a given instruction, and different assemblers may or may not choose to encode a given textual instruction into the same size output.

For example, llvm used to assemble "movw %cs, (%eax)" as [0x66,0x8c,0x08] instead of [0x8c,0x08]. They mean the same thing, and GNU as has used the short sequence for ages. It would be pretty horrible if you had something like the following input and, after processing through llvm to textual asm and then through GNU as, you ended up with only 2 bytes of output. That shouldn't be possible: either assembler on its own produces 3 bytes (llvm emits the 3-byte encoding and skips the .byte; GNU as emits the 2-byte encoding plus the .byte). (This is a contrived example, yes...)

  foo:
    movw %cs, (%eax)
  .if . - foo == 2
  .byte 0
  .endif
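
To make the mismatch concrete, here is a rough Python model of that example. It is only a sketch: none of it is LLVM or GNU as code, the function names are made up for illustration, and the only real data are the two encodings quoted above.

```python
# Toy model of the example above; nothing here is LLVM or GNU as code.
# The two encodings of "movw %cs, (%eax)" are the ones quoted in this comment.
ENCODINGS = {
    "old-llvm": [0x66, 0x8C, 0x08],  # with the redundant operand-size prefix
    "gnu-as": [0x8C, 0x08],          # short form
}

def to_object(assembler):
    """Assemble the original input straight to bytes with one assembler."""
    out = list(ENCODINGS[assembler])
    if len(out) == 2:       # .if . - foo == 2
        out.append(0x00)    # .byte 0
    return out

def to_text(assembler):
    """Print textual asm, resolving the .if with this assembler's layout."""
    lines = ["foo:", "  movw %cs, (%eax)"]
    if len(ENCODINGS[assembler]) == 2:
        lines.append("  .byte 0")
    return lines

def assemble_text(lines, assembler):
    """Assemble already-resolved text (no .if left) with another assembler."""
    out = []
    for line in lines:
        line = line.strip()
        if line == "movw %cs, (%eax)":
            out += ENCODINGS[assembler]
        elif line == ".byte 0":
            out.append(0x00)
    return out

direct_llvm = to_object("old-llvm")
direct_gnu = to_object("gnu-as")
mixed = assemble_text(to_text("old-llvm"), "gnu-as")
print(len(direct_llvm), len(direct_gnu), len(mixed))  # prints "3 3 2"
```

Each assembler alone yields 3 bytes; only the llvm-text-then-GNU-as pipeline yields 2.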

I suppose we might be able to emit a textual asm file that uses only ".byte"/".word"/etc. directives instead of textual instructions. Then we _could_ be certain of the size. (Although that may not actually be feasible when relocations are involved? And in any case, it's super-ugly, and I doubt it's what any user would want to see...)
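
A minimal sketch of what that ".byte"-only output might look like, again in Python with a hypothetical helper name. It assumes the instruction has already been encoded to plain bytes and ignores relocations entirely, which is exactly the part I'm unsure about.

```python
def as_byte_directives(label, encoded):
    """Render already-encoded bytes as a label plus one .byte line.

    Hypothetical helper for illustration; relocations are not handled.
    """
    body = ", ".join(f"0x{b:02x}" for b in encoded)
    return f"{label}:\n  .byte {body}"

# The old llvm encoding of "movw %cs, (%eax)" from the example above.
text = as_byte_directives("foo", [0x66, 0x8C, 0x08])
print(text)
# foo:
#   .byte 0x66, 0x8c, 0x08
```

The size is now fixed regardless of which assembler reads the file, but the instruction itself is no longer visible in the output.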

>>> Why can't the asm streamer blindly print the entire if to the output?
>> 
>> The 'if' is a preprocessor directive and is only valid in the input (.S preprocessor assembly) and not the output (.s preprocessed assembly).
> 
> Is that a documented restriction?

I don't see a reason why we couldn't emit an ".if" directive into the output. However, I don't see how emitting the original conditions could really be viable, unless you're just passing the entire input textually through to the output without parsing it at all. Consider that _anything_ can go inside an .if/.endif. E.g. defining a macro, or starting a new section. You'd effectively need to fork the entire assembler state upon seeing such a condition, to assemble each path of the conditional separately, and then output both possibilities. And keep forking, on every conditional in the input. The combinatorics of that would be very unfortunate...

One alternative I see for supporting textual output would be to emit a _verification_ check for the value of every layout-dependent absolute expression which was evaluated during the compile. E.g., in the above example, emit something like:

  foo:
    movw %cs, (%eax)
  .Ltmp0:

  /* Verification of layout assumptions: */
  .if .Ltmp0 - foo != 3
  .err
  .endif
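
Generating those checks could be fairly mechanical. A rough Python sketch, assuming the assembler has recorded each layout-dependent expression it evaluated as a hypothetical (end label, start label, expected size) tuple:

```python
def emit_verification(assumptions):
    """Emit one .if/.err/.endif check per recorded layout assumption.

    assumptions: list of (end_label, start_label, expected_size) tuples.
    The tuple format is an assumption made for this sketch.
    """
    lines = ["/* Verification of layout assumptions: */"]
    for end, start, size in assumptions:
        lines += [f".if {end} - {start} != {size}", ".err", ".endif"]
    return "\n".join(lines)

checks = emit_verification([(".Ltmp0", "foo", 3)])
print(checks)
```

For the single assumption above this prints the same four lines as the hand-written check.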

However, even given an implementation strategy that seems like it'd probably work, I'm not sure supporting this for textual output is really worthwhile.

Nirav: do you know if this comes up in inline asm in any real-world project? If not, perhaps this feature could just be disabled when evaluating llvm inline asm expressions, where the ability to emit a .s file is critical.

But for a standalone assembler -- where this sort of use-case occurs rather frequently -- is it really important that you be able to re-emit textual asm?

Repository:
  rL LLVM

https://reviews.llvm.org/D45164