[LLVMdev] [Proposal] Annotated assembly output

Fri Oct 12 10:12:38 PDT 2012

How is the client supposed to make use of this markup information? At
first glance it seems like client code will just devolve into a pile
of regex insanity. Why not use an existing standardized markup, like
XML (not that I'm that fond of XML)?

At a higher level, why not expose an API for iterating over
(potentially annotated) tokens that can be inspected
programmatically? What you expose to clients is an AnnotatedAsmTok.
Given an AnnotatedAsmTok, they can call "getAnnotation()" or
"getRawText()". A textual representation that can be read into this
form might be useful, but then we should also provide the parser.
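To make the token-iterator idea concrete, here is a minimal Python sketch. It assumes the '<tag modifiers:text>' markup and the doubled-'<'/'>' escaping from the quoted proposal below. Only the names AnnotatedAsmTok, getAnnotation() and getRawText() come from this message. The tokenize() function, the token fields, and the choice to have getAnnotation() return the innermost enclosing (tag, modifiers) pair are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical representation of one annotation: (tag name, modifiers),
# e.g. ("reg", ("gpr",)).
Tag = Tuple[str, Tuple[str, ...]]


@dataclass(frozen=True)
class AnnotatedAsmTok:
    """A run of instruction text plus the annotations enclosing it (sketch)."""
    raw_text: str
    enclosing: Tuple[Tag, ...]  # outermost first; empty for unannotated text

    def getRawText(self) -> str:
        return self.raw_text

    def getAnnotation(self) -> Optional[Tag]:
        # Innermost annotation, or None for plain text such as the mnemonic.
        return self.enclosing[-1] if self.enclosing else None


def tokenize(text: str) -> List[AnnotatedAsmTok]:
    """Split annotated assembly into AnnotatedAsmTok runs (hypothetical helper)."""
    toks: List[AnnotatedAsmTok] = []
    stack: List[Tag] = []
    buf: List[str] = []

    def flush() -> None:
        # Emit pending text tagged with the annotations currently open.
        if buf:
            toks.append(AnnotatedAsmTok("".join(buf), tuple(stack)))
            buf.clear()

    i = 0
    while i < len(text):
        ch = text[i]
        if ch in "<>" and text[i + 1:i + 2] == ch:
            buf.append(ch)  # doubled delimiter: a literal '<' or '>'
            i += 2
        elif ch == "<":
            flush()
            colon = text.index(":", i)  # the tag header ends at the first ':'
            name, _, mods = text[i + 1:colon].partition(" ")
            stack.append((name, tuple(m.strip() for m in mods.split(",") if m.strip())))
            i = colon + 1
        elif ch == ">":
            flush()
            if not stack:
                raise ValueError(f"unmatched '>' at offset {i}")
            stack.pop()
            i += 1
        else:
            buf.append(ch)
            i += 1
    if stack:
        raise ValueError("unterminated annotation")
    flush()
    return toks
```

Collecting every register operand then becomes a filter over tokens rather than a regex: for the ARM example in the proposal, `[t.getRawText() for t in toks if t.getAnnotation() and t.getAnnotation()[0] == "reg"]` gives `["r0", "sp"]`.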

I guess what I think needs a bit more explanation is why you chose to
go the "markup" route, instead of a normal programmatic API. Maybe you
could also include a couple of use cases that capture your "vision" for
this functionality, and maybe a tiny bit of sample code doing
something interesting with a very rough initial interface (if it seems
more natural, since you're talking about a C API, you can just assume
bindings and write the example in your scripting language of choice).

-- Sean Silva

On Fri, Oct 12, 2012 at 12:51 PM, Jim Grosbach <grosbach at apple.com> wrote:
> The following is a brief proposal for annotated assembly (and disassembly) output. Kevin Enderby and I have been discussing this a bit and are interested in getting broader feedback from interested folks.
>
>     LLVM Rich Assembly Output
>
> LLVM's (dis)assembly output is currently very raw. Consumers have limited ability to introspect the instructions' textual representation or to reformat it for a more user-friendly display. Much of the actual instruction semantics is contained in the MCInstrDesc for the opcode, but that is not sufficient to reference individual portions of the instruction text. Clients like disassemblers, list file generators, and pretty-printers need more than the raw instructions and the ability to print them.
>
> The intent is for the vast majority of the new functionality to require no new APIs, and instead to live in the assembly text itself via markup annotations. The markup syntax is simple enough to be robust even when consumer and producer versions are mismatched. That is, the syntax generally carries no semantics beyond "this text has an annotation," so consumers can simply ignore annotations they do not understand or do not care about.
>
> ** Instruction Annotations
>
> Annotated assembly display will supply contextual markup to help clients implement things like pretty-printers more efficiently. Most markup will be target-independent, so clients can provide a good display without any target-specific knowledge.
>
> Annotated assembly goes through the normal instruction printer, but optionally includes contextual tags on portions of the instruction string. An annotation is any '<' '>' delimited section of text(1).
>
> annotation: '<' tag-name tag-modifier-list ':' annotated-text '>'
> tag-name: identifier
> tag-modifier-list: comma-delimited identifier list (possibly empty)
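As a rough check that the grammar is easy to consume, here is a hypothetical recursive-descent parser in Python. It turns an annotated string into plain-text strings and nested Annotation nodes. It assumes a space separates the tag name from its modifiers (as in the ARM example below) and applies the footnote's doubled-delimiter escape greedily from left to right. One consequence of that reading: two annotations that close back to back (for example `<a:<b:x>>`) cannot be told apart from a literal '>'. The ARM example avoids this only because `]` separates its two closing delimiters.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass
class Annotation:
    tag: str                 # tag-name, e.g. "reg", "mem", "imm"
    modifiers: List[str]     # tag-modifier-list, e.g. ["gpr"]; may be empty
    children: List["Node"] = field(default_factory=list)  # annotated-text


Node = Union[str, Annotation]


def parse(text: str) -> List[Node]:
    """Parse annotated assembly into plain-text strings and Annotation nodes."""
    nodes, _ = _parse_seq(text, 0, nested=False)
    return nodes


def _parse_seq(text: str, pos: int, nested: bool) -> Tuple[List[Node], int]:
    nodes: List[Node] = []
    buf: List[str] = []
    while pos < len(text):
        ch = text[pos]
        if ch in "<>" and text[pos + 1:pos + 2] == ch:
            buf.append(ch)  # doubled delimiter is a literal character
            pos += 2
        elif ch == "<":
            if buf:
                nodes.append("".join(buf))
                buf = []
            colon = text.index(":", pos)  # raises ValueError if there is no ':'
            header = text[pos + 1:colon].split(None, 1)
            if not header:
                raise ValueError(f"missing tag name at offset {pos}")
            mods = [m.strip() for m in header[1].split(",")] if len(header) > 1 else []
            # Recurse for the annotated text, which may itself contain annotations.
            children, pos = _parse_seq(text, colon + 1, nested=True)
            nodes.append(Annotation(header[0], mods, children))
        elif ch == ">":
            if not nested:
                raise ValueError(f"unmatched '>' at offset {pos}")
            if buf:
                nodes.append("".join(buf))
            return nodes, pos + 1  # consume the closing '>'
        else:
            buf.append(ch)
            pos += 1
    if nested:
        raise ValueError("unterminated annotation")
    if buf:
        nodes.append("".join(buf))
    return nodes, pos
```

The parser never inspects tag names, so a tag it has never seen (say `<future x,y:z>`) still yields an ordinary Annotation node that a client can skip.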
>
> The tag name is an identifier which gives the type of the annotation. For the first pass, this will be very simple, with memory references, registers, and immediates having the tag names "mem", "reg", and "imm", respectively.
>
> The tag modifier list is typically additional target-specific context, such as register class.
>
> Clients should accept and ignore any tag names or tag modifiers they do not understand, allowing the annotations to grow in richness without breaking older clients.
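A minimal sketch of this forward-compatibility rule, assuming an HTML pretty-printer as the client. The hypothetical to_html() function below styles only the tags in its `known` set and silently drops the markup (but not the text) of any other tag. `mem` is deliberately left out of the default set to show an ignored tag, and the `asm-*` CSS class names are invented for illustration.

```python
import html
from typing import AbstractSet, List


def to_html(text: str, known: AbstractSet[str] = frozenset({"reg", "imm"})) -> str:
    """Render annotated assembly as HTML, styling only the tags in `known`."""
    out: List[str] = []
    opened: List[bool] = []  # per open annotation: did we emit a <span>?
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in "<>" and text[i + 1:i + 2] == ch:
            out.append(html.escape(ch))  # doubled delimiter: literal character
            i += 2
        elif ch == "<":
            colon = text.index(":", i)
            tag = text[i + 1:colon].split(" ", 1)[0]
            if tag in known:
                out.append(f'<span class="asm-{tag}">')
            opened.append(tag in known)  # unknown tags: markup dropped, text kept
            i = colon + 1
        elif ch == ">":
            if not opened:
                raise ValueError(f"unmatched '>' at offset {i}")
            if opened.pop():
                out.append("</span>")
            i += 1
        else:
            out.append(html.escape(ch))
            i += 1
    if opened:
        raise ValueError("unterminated annotation")
    return "".join(out)
```

Because unknown tags only lose their styling, a producer can add new tag names or modifiers later without breaking this client.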
>
> For example, an ARM load from a stack-relative location might be annotated as:
>
>     ldr <reg gpr:r0>, <mem regoffset:[<reg gpr:sp>, <imm:#4>]>
>
>
> 1: For assembly dialects in which '<' and/or '>' are legal tokens, a literal '<' or '>' is escaped by immediately repeating the character. For example, a literal '<' character is output as '<<' in an annotated assembly string.
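On the producer side, the escaping rule amounts to doubling each delimiter in raw text before wrapping it in an annotation. The Python sketch below uses hypothetical helpers (escape, annotate, reg) to rebuild the ARM example above. It illustrates the format only and is not LLVM's instruction printer.

```python
from typing import Sequence


def escape(raw: str) -> str:
    """Double each literal '<' and '>' (the footnote's escaping rule)."""
    return raw.replace("<", "<<").replace(">", ">>")


def annotate(tag: str, body: str, modifiers: Sequence[str] = ()) -> str:
    """Wrap `body` (already escaped or annotated) as '<tag mods:body>'."""
    header = tag + (" " + ",".join(modifiers) if modifiers else "")
    return f"<{header}:{body}>"


def reg(name: str) -> str:
    # Hypothetical helper: tags a core register with the "gpr" class modifier.
    return annotate("reg", escape(name), ["gpr"])


# Rebuild the ARM stack-relative load example from the proposal.
mem = annotate(
    "mem",
    escape("[") + reg("sp") + escape(", ") + annotate("imm", escape("#4")) + escape("]"),
    ["regoffset"],
)
ldr_line = escape("ldr ") + reg("r0") + escape(", ") + mem
```

Escaping every raw fragment, even ones like "ldr " that contain no delimiters, keeps the producer correct for dialects where '<' or '>' can appear in operands.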
>
>
> ** C API Details
>
> Some intended consumers of this information use the C API, so a new C API function, "LLVMDisasmInstructionAnnotated", will be added to disassemble an instruction with annotations.
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev