[LLVMdev] Bug in Language Reference? %0 versus %1 as starting index.

Tue Nov 26 18:58:55 PST 2013

Thanks for the lecture :)  But I was not planning on changing a single line
in LLVM/Clang.  I stick to the documentation until I've learned to swim,
perhaps even forever.  Ah, now I see.  You thought I meant "should I modify
the code to do this or that."  I only meant to change the documentation.
 Please refer to the patch I've sent on LLVM-commits.  That's about what I
had in mind.  I am fully aware that you cannot simply dive in and hack away
on the handling of the %0 temporary.  I wouldn't ever dream of doing that!

-- Mikael

2013/11/27 Sean Silva <chisophugis at gmail.com>

> (gah, this turned into a huge digression, sorry)
>
> The implicit numbering of BB's seems to be a pretty frequent issue for
> people. Surprisingly, the issue boils down to simply changing the IR asm
> (.ll file) syntax so that it can have "unnamed BB's" in a recognizable way
> that fits in with how unnamed values work (the asmprinter makes an effort
> to print a comment with the BB number, but the connection is hard to see
> and it's confusing).
>
> The thing that makes this not-as-easy-as-it-looks is doing it in a way
> that preserves compatibility with previous IR (and being able to convince
> yourself that this is the case), and the fact that the IR-parsing code is a
> bit twisty (it's not bad, but the way that some things work is subtly
> different from what you would expect) and you have to find something that
> "fits well" with what's there, doesn't require major reworking of the
> existing code, etc.
>
> An alternative approach is to document very clearly this issue. That might
> be good in the short term, but IMO the time would be better spent
> ruminating over a way to fit this into the syntax, and thinking
> deeply/finding a way to convince yourself and others that this change
> doesn't break previous .ll files.
>
> It's just about thinking and coming up with a new syntax that fits well
> and that won't break existing .ll files. The key places for making this
> round-trip are AssemblyWriter::printBasicBlock in lib/IR/AsmWriter.cpp
> and LLParser::ParseBasicBlock in lib/AsmParser/LLParser.cpp. The parsing
> side is likely to be entirely in lib/AsmParser/LLLexer.cpp where you need
> to find a way to get a new token "LocalLabelID" returned for the new syntax.
>
> To reiterate, the goal of such a change is solely to avoid people getting
> confused about the implicit numbering. It needs to be
> reminiscent/suggestive of the instruction numbering syntax to avoid this
> confusion.
>
> Heck, there may be something within the existing syntax that would work
> fine for this, but which we can recognize as being "unnamed", rather than a
> unique name e.g. currently $1: will give the BB a name "$1" (in the sense
> of getName()), and then "$2:" will give a name "$2", etc., which will cause
> a lot of pointless string allocations; recognizing a decimal number here
> might be all that's needed (and updating the outputting code accordingly),
> although I'm not sure a prefix $ is the best syntax.
>
> Maybe we could even get away with %42: as a BB label and that would be
> maximally reminiscent. The way that numbered local variables are handled is
> sort of ad-hoc (it is actually also handled in the Lexer; all the parser
> sees is lltok::LocalVarID). By just changing LLLexer::LexPercent in
> LLLexer.cpp to recognize a local label and emit a "LocalLabelID" token,
> then adding an `else if` to the first `if` in LLParser::ParseBasicBlock,
> you could probably get a working solution too. However, this introduces an
> inconsistency in that now there's this pseudo-common syntax (%[0-9]+) for
> unnamed things for both BB's and instructions, but in the case of
> instructions, the % sigil is always needed, while the label syntax isn't
> sigilized by default, but permits this weird sigilized temporary numbered
> form. Maybe that slight inconsistency is worth it? If the inconsistency is
> really bothersome, we could also have BB's be able to start sigilized with
> % in the other case like instructions are (there is no ambiguity because of
> the trailing `:`), but allow the unsigilized versions for compatibility;
> this may be more consistent from a semantic perspective too, since we refer
> to them sigilized when used as instruction operands.
>
> Or maybe you could have the BB be numbered just like `42:` without the
> sigil. We already lex a label like 42:, but we just have the issue that I
> mentioned with $1: that we set this string as the getName() value which
> creates a bunch of useless strings. If you just change the code to emit a
> "LocalLabelID" for this case and imitate how we handle locally numbered
> instructions, that could be a pretty clean fix. However, that would change
> the behavior for how we handle a label like `0:`, for example, with this
> behavior, the following IR asm would work:
>
> define void @foo() {
> 0:
>   %1 = alloca i8*
>   ret void
> }
>
> but since with our current behavior we handle `0:` as a BB name and set
> it's getName() as "0", which causes it to not take up the first unnamed
> value slot (the %0'th one), so then you get an error that %1 should be %0.
> This may be an annoying forwards-compatibility issue for a while when we
> still have to work with not-trunk LLVM's, and this incompatibility may not
> be worth it. Actually all the suggestions that I've made so far have this
> same issue :/ Actually I think that it is unsolvable without a
> forwards-compatibility break due to this (any label that was previously
> accepted would not increment the unnamed local counter, which would cause
> all the existing unnamed locals to be off by one and cause an error). We do
> break forward-compatibility from time to time (e.g. the syntax for the new
> attributes system), so it might not be that big of an issue (although
> obviously the community will have to decide about the trade-off for a
> temporary nuisance vs. the issue this solves). If breaking
> forwards-compatibility is OK, then I would strongly suggest the `0:` syntax
> or `%0:`.
>
> Hopefully I've given you a bit of the flavor of the issues involved. It's
> basically just a problem of sitting down and thinking hard, finding
> something cleanly-implementable that doesn't break backwards compatibility,
> and checking with the community that the syntax is agreeable and that any
> forwards-compatibility break is ok.
>
> -- Sean Silva
>
>
> On Tue, Nov 26, 2013 at 8:02 PM, Mikael Lyngvig <mikael at lyngvig.org>wrote:
>
>> The language reference states that local temporaries begin with index 0,
>> but if I try that on my not-entirely-up-to-date v3.4 llc (it is like a week
>> old), I get an error "instruction expected to be numbered '%1'".
>>
>> Also, quite a few examples in the LR uses %0 as a local identifier.
>>
>> Should I fix those or is it a problem in llc?
>>
>>
>> -- Mikael
>>
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20131127/e2abe391/attachment.html>