[LLVMdev] Inconsistencies or intended behaviour of LLVM IR?

Sean Silva chisophugis at gmail.com
Wed Jan 28 11:31:29 PST 2015


On Wed, Jan 28, 2015 at 6:28 PM, Robin Eklind <carl.eklind at myport.ac.uk>
wrote:

> Hello Sean,
>
> Thank you for your reply. I'll give your suggestions for items 6 and 7 a try
> tonight. I'll start a compilation and let it run throughout the night. My
> laptop (x61s) is 8 years old by now, so compiling LLVM takes a little time
> :)
>

This is why I did so much documentation work when in college. The docs
build much faster.


>
> Regarding item 8. I don't know if anyone is using "": in the wild, so
> fixing the implementation might make sense. If not, the documentation (e.g.
> the QuoteLabel comment) should be updated to be in line with the
> implementation.
>

FYI the textual IR doesn't have a compatibility guarantee (we try not to
egregiously change it, but users don't expect .ll to work across versions).
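
For anyone following along, the gap that item 8 describes between the
QuoteLabel comment and the lexer's behaviour can be sketched in Python. This
is only an illustration: DOCUMENTED transcribes the regexp from the comment
quoted further down, while ACCEPTED is an assumed pattern standing in for the
"zero or more characters" behaviour Robin reports, not a transcription of
LLLexer.cpp. The helper name classify is made up for the example.

```python
import re

# Pattern as written in the QuoteLabel comment: at least one character.
DOCUMENTED = re.compile(r'"[^"]+":')
# Assumed stand-in for the reported behaviour: zero or more characters.
ACCEPTED = re.compile(r'"[^"]*":')


def classify(label):
    """Return (matches documented form, matches assumed accepted form)."""
    return (
        DOCUMENTED.fullmatch(label) is not None,
        ACCEPTED.fullmatch(label) is not None,
    )


for text in ['"entry":', '"":']:
    print(text, classify(text))
```

The empty label "": is the only input here on which the two patterns disagree,
which is exactly the case in question.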


>
> I only included item 9 since I stumbled upon it once cross-referencing the
> source code with the language specification. Bitrot for a project of this
> size is to be expected.
>
> I'm still very interested to hear about the items related to types, e.g.
> item 1 and 2. Is there a good reference which describes how type equality
> works in LLVM IR? If the source code is the reference, could someone with
> the high level knowledge get me up to speed?
>

Off the top of my head maybe
http://blog.llvm.org/2011/11/llvm-30-type-system-rewrite.html
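
A rough mental model of the rule, as the LangRef section on structure types
states it: literal struct types are uniqued structurally, while identified
(named) struct types are never uniqued. The toy Python sketch below mimics
that with object identity. Context, LiteralStruct and NamedStruct are
hypothetical names for illustration and are not the LLVM API.

```python
class LiteralStruct:
    def __init__(self, fields):
        self.fields = fields


class NamedStruct:
    def __init__(self, name, fields):
        self.name = name
        self.fields = fields


class Context:
    """Toy type context; a stand-in for illustration only."""

    def __init__(self):
        self._literals = {}

    def literal_struct(self, *fields):
        # Literal structs are interned by their field list:
        # asking for the same shape twice returns the same object.
        if fields not in self._literals:
            self._literals[fields] = LiteralStruct(fields)
        return self._literals[fields]

    def named_struct(self, name, *fields):
        # Each named struct is a fresh object, even if its body
        # matches an existing literal or named struct.
        return NamedStruct(name, fields)


ctx = Context()
lit_a = ctx.literal_struct("i32")
lit_b = ctx.literal_struct("i32")
named_x = ctx.named_struct("x", "i32")

print(lit_a is lit_b)    # True: same shape, same type
print(named_x is lit_a)  # False: same body, different type
```

Under this model the h.ll error in item 2 corresponds to the second
comparison: @foo expects %x, and {i32} is a different type even though the
bodies match.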


>
> Item 1 still confuses me, so I'd be very happy if someone with more
> insight could clarify if this is the intended behaviour and if so the
> motivation behind it.
>
> As it so happens, I forgot to include item 10 :)
>
> * Item 10 - lli vs. clang output
>
> Using the same source files as before, it seems like lli and clang treat
> common linkage and constant variables differently. The following execution
> demonstrates the return value after executing i.ll, j.ll, k.ll and l.ll
> with lli and clang respectively:
>
> > $ clang i.ll && ./a.out ; echo $?
> > 37
> >
> > $ lli i.ll ; echo $?
> > 37
> >
> >
> > $ clang j.ll && ./a.out ; echo $?
> > 0
> >
> > $ lli j.ll ; echo $?
> > 42
> >
> >
> > $ clang k.ll && ./a.out ; echo $?
> > 37
> >
> > $ lli k.ll ; echo $?
> > 37
> >
> >
> > $ clang l.ll && ./a.out ; echo $?
> > Segmentation fault
> > 139
> >
> > $ lli l.ll ; echo $?
> > 37
>

Some of these linkage combinations and operations have dubious semantics.
Talking briefly with Rafael Espindola over a build, sounds like we should
mostly tighten up the verifier to remove some of these weird cases. For
example, storing to a constant is sort of .... I'm sort of surprised it
works at all.

-- Sean Silva


>
> Looking forward to hearing more about type equality, or getting a pointer as
> to where I can read up about it.
>
> Cheers /Robin Eklind
>
>
>
> On 01/28/2015 03:45 PM, Sean Silva wrote:
>
>> A couple quick comments inline (didn't touch on all points):
>>
>> On Wed, Jan 28, 2015 at 1:49 AM, Robin Eklind <carl.eklind at myport.ac.uk>
>> wrote:
>>
>>> Hello everyone!
>>>
>>> I've recently had a chance to familiarize myself with the nitty-gritty
>>> details of LLVM IR. It has been a great learning experience, sometimes
>>> frustrating or confusing but mostly rewarding.
>>>
>>> There are a few cases I've come across which seem odd to me. I've tried
>>> to cross reference with the language specification and the source code to
>>> the best of my abilities, but would like to reach out to an experienced
>>> crowd with a few questions.
>>>
>>> Could you help me out by taking a look at these examples? To my novice
>>> eyes they seem to highlight inconsistencies in LLVM IR (or the reference
>>> implementation), but it is quite likely that I've overlooked something.
>>> Please help me out.
>>>
>>> Note: the example source files have been attached and a copy is made
>>> available at https://github.com/mewplay/ll
>>>
>>> * Item 1 - named pointer types
>>>
>>> It is possible to create a named array pointer type (and many others),
>>> but not a named structure pointer type. E.g.
>>>
>>> %x = type [1 x i32]* ; valid.
>>> %x = type {i32}*     ; invalid.
>>>
>>> Is this the intended behaviour? Attaching a.ll, b.ll, c.ll and d.ll for
>>> reference. All files except d.ll compile without error using clang
>>> version 3.5.1 (tags/RELEASE_351/final).
>>>
>>>> $ clang d.ll
>>>> d.ll:3:16: error: expected top-level entity
>>>> %x = type {i32}*
>>>>                 ^
>>>> 1 error generated.
>>>>
>>>
>>> Does it have anything to do with type equality? (just a hunch)
>>>
>>> * Item 2 - equality of named types
>>>
>>> A named integer type is equivalent to its literal type counterpart, but
>>> the same is not true for named and literal structures. I am certain that
>>> I've read about this before, but can't seem to locate the right section
>>> of the language specification; could anyone point me in the right
>>> direction? Also, what is the motivation behind this decision? I've skimmed
>>> over the code which handles named structure types (in lib/IR/Core.cpp),
>>> but would love to hear the high level idea.
>>>
>>> Attaching e.ll, f.ll, g.ll and h.ll for reference. All compile just fine
>>> except h.ll, which produces the following error message (using the same
>>> version of clang as above):
>>>
>>>> $ clang h.ll
>>>> h.ll:10:23: error: argument is not of expected type '%x = type { i32 }'
>>>>          call void (%x)* @foo({i32} {i32 0})
>>>>                               ^
>>>> 1 error generated.
>>>>
>>>
>>> * Item 3 - zero initialized common linkage variables
>>>
>>> According to the language specification common linkage variables are
>>> required to have a zero initializer [1]. If so, why are they also
>>> required to provide an initial value?
>>>
>>> Attaching i.ll and j.ll for reference. Both compile just fine and, once
>>> executed, i.ll returns 37 and j.ll returns 0. If the common linkage
>>> variable @x was not initialized to 0, j.ll would have returned 42.
>>>
>>> * Item 4 - constant common linkage variables
>>>
>>> The language specification states that common linkage variables may not
>>> be marked as constant [1]. The parser doesn't seem to enforce this
>>> restriction. Would doing so cause any problems?
>>>
>>> Attaching k.ll and l.ll for reference. Both compile just fine, but once
>>> executed k.ll returns 37 (i.e. the constant variable was overwritten)
>>> while l.ll segfaults as expected when it tries to overwrite a read-only
>>> memory location.
>>>
>>> * Item 5 - appending linkage restrictions
>>>
>>> An extract from the language specification [1]:
>>>
>>>> "appending" linkage may only be applied to global variables of pointer
>>>> to array type.
>>>
>>>
>>> Similarly to item 4, this restriction isn't enforced by the parser. Would
>>> it make sense to do so, or is there any problem with such an approach?
>>>
>>> * Item 6 - hash token
>>>
>>> The hash token (#) is defined in lib/AsmParser/LLToken.h (release version
>>> 3.5.0 of the LLVM source code) but doesn't seem to be used anywhere else
>>> in the source tree. Is this token a historical artefact or does it serve a
>>> purpose?
>>>
>>>
>> Try deleting it. If the tests pass send a patch. Same for item 7.
>>
>>
>>
>>> * Item 7 - backslash token
>>>
>>> Similarly to item 6, the backslash token doesn't seem to serve a purpose
>>> (with regards to release version 3.5.0 of the LLVM source code). Is it
>>> used somewhere?
>>>
>>> * Item 8 - quoted labels
>>>
>>> A comment in lib/AsmParser/LLLexer.cpp (once again, release version 3.5.0
>>> of the LLVM source code) describes quoted labels using the following
>>> regexp (i.e. at least one character between the double quotes):
>>>
>>>> ///   QuoteLabel        "[^"]+":
>>>>
>>>
>>> In contrast the reference implementation accepts quoted labels with zero
>>> or more characters between the double quotes. Which is to be trusted? The
>>> comment makes more sense as the variable name would effectively be blank
>>> otherwise.
>>>
>>>
>> Looks like an empty name just results in the thing becoming unnamed. That's
>> sort of confusing, but probably not harmful. Maybe we use an empty name as a
>> sentinel for "unnamed", so it sort of just was an accident of the
>> implementation.
>>
>>
>>
>>> * Item 9 - undocumented calling conventions
>>>
>>> The following calling conventions are valid tokens but not described in
>>> the language references as of revision 223189:
>>>
>>> intel_ocl_bicc, x86_stdcallcc, x86_fastcallcc, x86_thiscallcc,
>>> x86_vectorcallcc, arm_apcscc, arm_aapcscc, arm_aapcs_vfpcc,
>>> msp430_intrcc, ptx_kernel, ptx_device, spir_kernel, spir_func,
>>> x86_64_sysvcc, x86_64_win64cc, ghccc
>>>
>>>
>> This is just bitrot.
>>
>> -- Sean Silva
>>
>>
>>
>>>
>>> Lastly I'd just like to thank the LLVM developers for all the time and
>>> hard work they've put into this project. I'd especially like to thank you
>>> for providing a language specification alongside the reference
>>> implementation! Keeping it up to date is a huge task, but also hugely
>>> important. Thank you!
>>>
>>> Kind regards
>>> /Robin Eklind
>>>
>>> [1]: http://llvm.org/docs/LangRef.html#linkage-types
>>>