[LLVMdev] Bug in Language Reference? %0 versus %1 as starting index.

Sean Silva chisophugis at gmail.com
Tue Nov 26 21:24:51 PST 2013


On Tue, Nov 26, 2013 at 10:35 PM, Mikael Lyngvig <mikael at lyngvig.org> wrote:

> Without ANY intent of offending anybody, I simply don't like C++.  I
> coded in it for some 12 years, from 1990 to 2002, but then I left it
> behind with a feeling of happiness.  The main reason I am _trying_ to make
> a new language is that I hope to one day come up with something that can
> help retire C++.  I love C# but that language is still too slow for many
> demanding problem domains.
>

C++ is far from perfect, but it's pretty amazing (especially with C++11);
I'm frequently shocked by how cleanly things can be implemented. If you
haven't been in contact with the language for a decade+, you may want to
give it another shot. Also, the feedback from every code patch you get
reviewed will move you exponentially closer towards an up-to-date working
knowledge of the language.

It sounds like a large portion of your time with C++ was spent in the
pre-STL days. Personally, grokking the STL is the single biggest thing that
ever happened to me as a programmer, and is the reason that I stick with
C++ (AFAIK there is no other mainstream language that can even model the
STL). If you never had the chance to grok the STL, then I would say
*definitely* give C++ another shot.

-- Sean Silva


>
> That being said, I don't seriously believe I'll ever finish up my own
> language, but as long as I am having a good time along the way, I don't
> mind.  Now I spend the majority of my spare time on LLVM documentation
> (most of it still pending submission because of various factors).  Once the
> dust settles from all the documentation projects I've started on (Arch
> Linux build doc, Debian build doc, Windows build doc, Mapping High-Level
> Constructs to LLVM IR), I plan to resume work on my own language, which
> will be something like Python-syntax C# without .NET and perhaps with
> optional garbage collection.
>
> Perhaps I'll some day gather up the courage to pick an easy bug report and
> fix that, but it is not very likely that I will ever become a core coder on
> LLVM.
>
>
> -- Mikael
>
>
> 2013/11/27 Sean Silva <chisophugis at gmail.com>
>
>>
>>
>>
>> On Tue, Nov 26, 2013 at 9:58 PM, Mikael Lyngvig <mikael at lyngvig.org> wrote:
>>
>>> Thanks for the lecture :)  But I was not planning on changing a single
>>> line in LLVM/Clang.  I stick to the documentation until I've learned to
>>> swim, perhaps even forever.  Ah, now I see.  You thought I meant "should I
>>> modify the code to do this or that."  I only meant to change the
>>> documentation.  Please refer to the patch I've sent on LLVM-commits.
>>>  That's about what I had in mind.  I am fully aware that you cannot simply
>>> dive in and hack away on the handling of the %0 temporary.  I wouldn't ever
>>> dream of doing that!
>>>
>>
>> You should dream of doing that. Nobody else has stepped up to do it. Hack
>> on the code; ultimately that's where the action is and where you will gain
>> understanding.
>> (And I'm probably the worst person to give this advice since I do so
>> little code hacking during the school year. I swear, I really do prefer
>> coding; when I'm at work with a nice fast machine it's a lot nicer to hack,
>> but at school with a crappy machine, the situation usually only permits
>> reviewing patches on the mailing lists or docs changes.)
>>
>> AFAIK nobody is an "expert" in that code (it's probably long out of core
>> for even the people that wrote it); if you dive into it, you can become a
>> local expert in it.
>>
>>  -- Sean Silva
>>
>>
>>>
>>>
>>> -- Mikael
>>>
>>>
>>>
>>>
>>> 2013/11/27 Sean Silva <chisophugis at gmail.com>
>>>
>>>> (gah, this turned into a huge digression, sorry)
>>>>
>>>> The implicit numbering of BB's seems to be a pretty frequent issue for
>>>> people. Surprisingly, the issue boils down to simply changing the IR asm
>>>> (.ll file) syntax so that it can have "unnamed BB's" in a recognizable way
>>>> that fits in with how unnamed values work (the asmprinter makes an effort
>>>> to print a comment with the BB number, but the connection is hard to see
>>>> and it's confusing).
>>>>
>>>> The thing that makes this not-as-easy-as-it-looks is doing it in a way
>>>> that preserves compatibility with previous IR (and being able to convince
>>>> yourself that this is the case), and the fact that the IR-parsing code is a
>>>> bit twisty (it's not bad, but the way that some things work is subtly
>>>> different from what you would expect) and you have to find something that
>>>> "fits well" with what's there, doesn't require major reworking of the
>>>> existing code, etc.
>>>>
>>>> An alternative approach is to document this issue very clearly. That
>>>> might be good in the short term, but IMO the time would be better spent
>>>> ruminating over a way to fit this into the syntax, and thinking
>>>> deeply/finding a way to convince yourself and others that this change
>>>> doesn't break previous .ll files.
>>>>
>>>> It's just about thinking and coming up with a new syntax that fits well
>>>> and that won't break existing .ll files. The key places for making this
>>>> round-trip are AssemblyWriter::printBasicBlock in lib/IR/AsmWriter.cpp
>>>> and LLParser::ParseBasicBlock in lib/AsmParser/LLParser.cpp. The parsing
>>>> side is likely to be entirely in lib/AsmParser/LLLexer.cpp where you need
>>>> to find a way to get a new token "LocalLabelID" returned for the new syntax.
>>>>
>>>> To reiterate, the goal of such a change is solely to avoid people
>>>> getting confused about the implicit numbering. It needs to be
>>>> reminiscent/suggestive of the instruction numbering syntax to avoid this
>>>> confusion.
>>>>
>>>> Heck, there may be something within the existing syntax that would work
>>>> fine for this, but which we can recognize as being "unnamed" rather than
>>>> as a unique name. E.g., currently "$1:" will give the BB the name "$1" (in
>>>> the sense of getName()), and then "$2:" will give the name "$2", etc.,
>>>> which causes a lot of pointless string allocations; recognizing a decimal
>>>> number here might be all that's needed (and updating the outputting code
>>>> accordingly), although I'm not sure a prefix $ is the best syntax.
>>>>
>>>> Maybe we could even get away with %42: as a BB label and that would be
>>>> maximally reminiscent. The way that numbered local variables are handled is
>>>> sort of ad-hoc (it is actually also handled in the Lexer; all the parser
>>>> sees is lltok::LocalVarID). By just changing LLLexer::LexPercent in
>>>> LLLexer.cpp to recognize a local label and emit a "LocalLabelID" token,
>>>> then adding an `else if` to the first `if` in LLParser::ParseBasicBlock,
>>>> you could probably get a working solution too. However, this introduces an
>>>> inconsistency in that now there's this pseudo-common syntax (%[0-9]+) for
>>>> unnamed things for both BB's and instructions, but in the case of
>>>> instructions, the % sigil is always needed, while the label syntax isn't
>>>> sigilized by default, but permits this weird sigilized temporary numbered
>>>> form. Maybe that slight inconsistency is worth it? If the inconsistency is
>>>> really bothersome, we could also have BB's be able to start sigilized with
>>>> % in the other case like instructions are (there is no ambiguity because of
>>>> the trailing `:`), but allow the unsigilized versions for compatibility;
>>>> this may be more consistent from a semantic perspective too, since we refer
>>>> to them sigilized when used as instruction operands.
>>>>
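A rough sketch of the lexer-side idea in the paragraph above, written as a toy
Python model and not as LLVM's actual LLLexer code. LocalVarID mirrors the
lltok::LocalVarID mentioned above and LocalLabelID is the token proposed in
this thread; NamedLabel, classify() and the regular expressions are
assumptions made purely for illustration.

```python
import re

LOCAL_VAR_ID = "LocalVarID"      # models the existing numbered-value token
LOCAL_LABEL_ID = "LocalLabelID"  # hypothetical token proposed in this thread
NAMED_LABEL = "NamedLabel"       # hypothetical stand-in for a named BB label

_NUMBERED_LABEL = re.compile(r"%(\d+):")          # e.g. "%42:"
_NUMBERED_VALUE = re.compile(r"%(\d+)")           # e.g. "%42"
_NAMED_LABEL = re.compile(r"([-a-zA-Z$._0-9]+):")  # e.g. "entry:" or "$1:"


def classify(text):
    """Return (token_kind, value) for the token at the start of `text`.

    The numbered-label pattern is tried first, so the trailing ':' is what
    separates a label from a use of a numbered value.
    """
    m = _NUMBERED_LABEL.match(text)
    if m:
        return (LOCAL_LABEL_ID, int(m.group(1)))
    m = _NUMBERED_VALUE.match(text)
    if m:
        return (LOCAL_VAR_ID, int(m.group(1)))
    m = _NAMED_LABEL.match(text)
    if m:
        # The label text is kept as a string, like getName() would hold it.
        return (NAMED_LABEL, m.group(1))
    return None
```

In this model "%42:" becomes a numbered label, "%42 = ..." stays a numbered
value, and "$1:" is still an ordinary named label carrying the string "$1".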
>>>> Or maybe you could have the BB be numbered just like `42:` without the
>>>> sigil. We already lex a label like 42:, but we just have the issue that I
>>>> mentioned with $1: that we set this string as the getName() value which
>>>> creates a bunch of useless strings. If you just change the code to emit a
>>>> "LocalLabelID" for this case and imitate how we handle locally numbered
>>>> instructions, that could be a pretty clean fix. However, that would change
>>>> how we handle a label like `0:`. For example, with the new behavior, the
>>>> following IR asm would work:
>>>>
>>>> define void @foo() {
>>>> 0:
>>>>   %1 = alloca i8*
>>>>   ret void
>>>> }
>>>>
>>>> but with our current behavior we handle `0:` as a BB name and set its
>>>> getName() to "0", so it does not take up the first unnamed value slot
>>>> (the %0'th one), and you then get an error that %1 should be %0.
>>>> This may be an annoying forwards-compatibility issue for a while when we
>>>> still have to work with not-trunk LLVM's, and this incompatibility may not
>>>> be worth it. Actually all the suggestions that I've made so far have this
>>>> same issue :/ Actually I think that it is unsolvable without a
>>>> forwards-compatibility break due to this (any label that was previously
>>>> accepted would not increment the unnamed local counter, which would cause
>>>> all the existing unnamed locals to be off by one and cause an error). We do
>>>> break forward-compatibility from time to time (e.g. the syntax for the new
>>>> attributes system), so it might not be that big of an issue (although
>>>> obviously the community will have to decide about the trade-off for a
>>>> temporary nuisance vs. the issue this solves). If breaking
>>>> forwards-compatibility is OK, then I would strongly suggest the `0:` syntax
>>>> or `%0:`.
>>>>
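The off-by-one described above can be seen in a small toy model of the
implicit numbering, again in Python and not LLVM's real code. It assumes one
shared counter per function that is bumped for every unnamed basic block and
every unnamed value-producing instruction; function arguments and void
instructions are ignored. The function name number_slots and the flag
numeric_labels_are_unnamed are hypothetical.

```python
def number_slots(items, numeric_labels_are_unnamed=False):
    """Hand out implicit slot numbers for one function body.

    `items` lists ("block", name) and ("inst", name) pairs in textual order,
    where name is None for an unnamed item. Returns the (kind, slot) pairs
    for every item that consumed a slot.
    """
    next_slot = 0
    slots = []
    for kind, name in items:
        if (kind == "block" and name is not None
                and numeric_labels_are_unnamed and name.isdigit()):
            # Proposed behaviour: a label such as `0:` is a number, not a name.
            name = None
        if name is None:
            slots.append((kind, next_slot))
            next_slot += 1
    return slots


# No label at all: the entry block silently takes slot 0.
implicit_entry = [("block", None), ("inst", None)]
# A `0:` label followed by one unnamed instruction, as in the IR above.
labelled_entry = [("block", "0"), ("inst", None)]
```

With implicit_entry the instruction lands in slot 1, which matches the
"expected to be numbered '%1'" error from the original report. With
labelled_entry the current behaviour gives the instruction slot 0, while the
proposed behaviour (numeric_labels_are_unnamed=True) gives it slot 1, so the
same .ll text is numbered differently before and after the change.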
>>>> Hopefully I've given you a bit of the flavor of the issues involved.
>>>> It's basically just a problem of sitting down and thinking hard, finding
>>>> something cleanly-implementable that doesn't break backwards compatibility,
>>>> and checking with the community that the syntax is agreeable and that any
>>>> forwards-compatibility break is ok.
>>>>
>>>> -- Sean Silva
>>>>
>>>>
>>>> On Tue, Nov 26, 2013 at 8:02 PM, Mikael Lyngvig <mikael at lyngvig.org> wrote:
>>>>
>>>>> The language reference states that local temporaries begin with index
>>>>> 0, but if I try that on my not-entirely-up-to-date v3.4 llc (it is like a
>>>>> week old), I get an error "instruction expected to be numbered '%1'".
>>>>>
>>>>> Also, quite a few examples in the LR use %0 as a local identifier.
>>>>>
>>>>> Should I fix those or is it a problem in llc?
>>>>>
>>>>>
>>>>> -- Mikael
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>