[LLVMdev] Bug in Language Reference? %0 versus %1 as starting index.

Sean Silva chisophugis at gmail.com
Tue Nov 26 18:49:23 PST 2013


(gah, this turned into a huge digression, sorry)

The implicit numbering of BB's seems to be a pretty frequent issue for
people. Surprisingly, the issue boils down to simply changing the IR asm
(.ll file) syntax so that it can have "unnamed BB's" in a recognizable way
that fits in with how unnamed values work (the asmprinter makes an effort
to print a comment with the BB number, but the connection is hard to see
and it's confusing).

What makes this not as easy as it looks is doing it in a way that
preserves compatibility with previous IR (and being able to convince
yourself that this is the case). The IR-parsing code is also a bit twisty
(it's not bad, but the way that some things work is subtly different from
what you would expect), so you have to find something that "fits well"
with what's there and doesn't require major reworking of the existing
code.

An alternative approach is to document this issue very clearly. That might
be good in the short term, but IMO the time would be better spent working
out a way to fit this into the syntax, and finding a way to convince
yourself and others that the change doesn't break previous .ll files.

It's just about thinking and coming up with a new syntax that fits well and
that won't break existing .ll files. The key places for making this
round-trip are AssemblyWriter::printBasicBlock in lib/IR/AsmWriter.cpp
and LLParser::ParseBasicBlock in lib/AsmParser/LLParser.cpp. The parsing
side is likely to be entirely in lib/AsmParser/LLLexer.cpp where you need
to find a way to get a new token "LocalLabelID" returned for the new syntax.

To reiterate, the goal of such a change is solely to avoid people getting
confused about the implicit numbering. It needs to be
reminiscent/suggestive of the instruction numbering syntax to avoid this
confusion.

Heck, there may be something within the existing syntax that would work
fine for this, as long as we can recognize it as being "unnamed" rather
than as a unique name. For example, currently `$1:` gives the BB the name
"$1" (in the sense of getName()), then `$2:` gives the name "$2", etc.,
which causes a lot of pointless string allocations. Recognizing a decimal
number here might be all that's needed (and updating the outputting code
accordingly), although I'm not sure a prefix $ is the best syntax.

Maybe we could even get away with %42: as a BB label and that would be
maximally reminiscent. The way that numbered local variables are handled is
sort of ad-hoc (it is actually also handled in the Lexer; all the parser
sees is lltok::LocalVarID). By just changing LLLexer::LexPercent in
LLLexer.cpp to recognize a local label and emit a "LocalLabelID" token,
then adding an `else if` to the first `if` in LLParser::ParseBasicBlock,
you could probably get a working solution too. However, this introduces an
inconsistency in that now there's this pseudo-common syntax (%[0-9]+) for
unnamed things for both BB's and instructions, but in the case of
instructions, the % sigil is always needed, while the label syntax isn't
sigilized by default, but permits this weird sigilized temporary numbered
form. Maybe that slight inconsistency is worth it? If the inconsistency is
really bothersome, we could also have BB's be able to start sigilized with
% in the other case like instructions are (there is no ambiguity because of
the trailing `:`), but allow the unsigilized versions for compatibility;
this may be more consistent from a semantic perspective too, since we refer
to them sigilized when used as instruction operands.
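
As a rough illustration of the lexer-side idea above, here is a toy model
in Python (not the actual C++ in LLLexer.cpp). It assumes the only change
is "a numbered `%` token followed by `:` becomes a label". `LocalVarID` and
`LocalVar` mirror the existing lltok kinds; `LocalLabelID` is the
hypothetical new token discussed here, and `classify_percent_token` is a
made-up helper name.

```python
import re

# Toy model only: the real lexer is C++ and handles many more cases
# (quoted names, error reporting, and so on).
NUMBERED = re.compile(r"%(\d+)(:?)")
NAMED = re.compile(r"%([-a-zA-Z$._][-a-zA-Z$._0-9]*)")


def classify_percent_token(text):
    """Classify the token at the start of `text`, which begins with '%'.

    Returns a (kind, value) pair. "LocalLabelID" is hypothetical: it is the
    token a numbered label such as `%42:` would produce under the proposal.
    """
    numbered = NUMBERED.match(text)
    if numbered:
        # A trailing ':' is what separates a label from a value reference.
        kind = "LocalLabelID" if numbered.group(2) else "LocalVarID"
        return kind, int(numbered.group(1))
    named = NAMED.match(text)
    if named:
        return "LocalVar", named.group(1)
    return "Error", None
```

In this sketch `classify_percent_token("%42:")` yields the hypothetical
label token, while `classify_percent_token("%42 = add i32 1, 2")` still
yields `LocalVarID`, which is why the trailing `:` avoids any ambiguity.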

Or maybe you could have the BB be numbered just like `42:` without the
sigil. We already lex a label like `42:`, but we have the same issue that I
mentioned with `$1:`: we set this string as the getName() value, which
creates a bunch of useless strings. If you just change the code to emit a
"LocalLabelID" for this case and imitate how we handle locally numbered
instructions, that could be a pretty clean fix. However, that would change
how we handle a label like `0:`. For example, with the new behavior, the
following IR asm would work:

define void @foo() {
0:
  %1 = alloca i8*
  ret void
}

but with our current behavior we handle `0:` as a BB name and set its
getName() to "0", so it does not take up the first unnamed value slot (the
%0'th one), and you get an error that %1 should be %0.
This may be an annoying forwards-compatibility issue for a while when we
still have to work with not-trunk LLVM's, and this incompatibility may not
be worth it. Actually all the suggestions that I've made so far have this
same issue :/ Actually I think that it is unsolvable without a
forwards-compatibility break due to this (any label that was previously
accepted would not increment the unnamed local counter, which would cause
all the existing unnamed locals to be off by one and cause an error). We do
break forward-compatibility from time to time (e.g. the syntax for the new
attributes system), so it might not be that big of an issue (although
obviously the community will have to decide about the trade-off for a
temporary nuisance vs. the issue this solves). If breaking
forwards-compatibility is OK, then I would strongly suggest the `0:` syntax
or `%0:`.
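
To make the off-by-one concrete, here is a small Python toy model of the
per-function unnamed-value counter. It is only a sketch under simplifying
assumptions: it looks at one function body, treats an unlabeled first line
as an unnamed entry block, and ignores unlabeled blocks after terminators.
The function name and the `numeric_labels_take_slots` flag are invented for
this illustration; the flag switches between the current behavior (a
numeric label is just a name) and the proposed one (it takes a slot).

```python
import re

LABEL = re.compile(r"([-a-zA-Z$._0-9]+):$")
NUMBERED_DEF = re.compile(r"%(\d+)\s*=")


def first_numbering_error(body, numeric_labels_take_slots):
    """Return the first numbering error in `body`, or None if consistent.

    `body` is a list of lines from inside one function definition.
    """
    next_slot = 0
    for index, raw in enumerate(body):
        line = raw.strip()
        label = LABEL.match(line)
        if label:
            name = label.group(1)
            if numeric_labels_take_slots and name.isdigit():
                # Proposed behavior: `0:` is an unnamed block in slot 0.
                if int(name) != next_slot:
                    return f"label expected to be numbered '{next_slot}'"
                next_slot += 1
            # Current behavior: the label is a real name and uses no slot.
            continue
        if index == 0:
            # No label on the first line: the unnamed entry block takes %0.
            next_slot += 1
        numbered = NUMBERED_DEF.match(line)
        if numbered:
            if int(numbered.group(1)) != next_slot:
                return f"instruction expected to be numbered '%{next_slot}'"
            next_slot += 1
    return None
```

With `body = ["0:", "%1 = alloca i8*", "ret void"]`, the model reports that
the instruction should be `%0` when the flag is False and reports no error
when it is True. With no label at all, `["%0 = alloca i8*", "ret void"]`
is rejected either way with the "expected to be numbered '%1'" message from
the original question.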

Hopefully I've given you a bit of the flavor of the issues involved. It's
basically just a problem of sitting down and thinking hard, finding
something cleanly-implementable that doesn't break backwards compatibility,
and checking with the community that the syntax is agreeable and that any
forwards-compatibility break is ok.

-- Sean Silva


On Tue, Nov 26, 2013 at 8:02 PM, Mikael Lyngvig <mikael at lyngvig.org> wrote:

> The language reference states that local temporaries begin with index 0,
> but if I try that on my not-entirely-up-to-date v3.4 llc (it is like a week
> old), I get an error "instruction expected to be numbered '%1'".
>
> Also, quite a few examples in the LR uses %0 as a local identifier.
>
> Should I fix those or is it a problem in llc?
>
>
> -- Mikael