[LLVMdev] Inconsistencies or intended behaviour of LLVM IR?

Robin Eklind carl.eklind at myport.ac.uk
Tue Jan 27 17:49:15 PST 2015


Hello everyone!

I've recently had a chance to familiarize myself with the nitty-gritty 
details of LLVM IR. It has been a great learning experience, sometimes 
frustrating or confusing but mostly rewarding.

There are a few cases I've come across which seem odd to me. I've tried 
to cross reference with the language specification and the source code 
to the best of my abilities, but would like to reach out to an 
experienced crowd with a few questions.

Could you help me out by taking a look at these examples? To my novice 
eyes they seem to highlight inconsistencies in LLVM IR (or the reference 
implementation), but it is quite likely that I've overlooked something. 
Please help me out.

Note: the example source files have been attached and a copy is made 
available at https://github.com/mewplay/ll

* Item 1 - named pointer types

It is possible to create a named array pointer type (and many others), 
but not a named structure pointer type. E.g.

%x = type [1 x i32]* ; valid.
%x = type {i32}*     ; invalid.

Is this the intended behaviour? Attaching a.ll, b.ll, c.ll and d.ll for 
reference. All files except d.ll compile without error using clang 
version 3.5.1 (tags/RELEASE_351/final).

 > $ clang d.ll
 > d.ll:3:16: error: expected top-level entity
 > %x = type {i32}*
 >                ^
 > 1 error generated.

Does it have anything to do with type equality? (just a hunch)
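
One spelling that might sidestep the error is to name the structure 
first and then alias a pointer to that name. This is an untested sketch 
and not one of the attached files: %s is a name introduced here purely 
for illustration, and the sketch assumes the parser treats `%s*` the 
same way it treats the array pointer alias in b.ll.

```llvm
target triple = "x86_64-unknown-linux-gnu"

; %s is a hypothetical name for the structure type.
%s = type {i32}
; Alias a pointer to the named structure instead of to a literal one.
%x = type %s*

define void @foo(%x) {
	ret void
}

define i32 @main() {
	call void (%x)* @foo(%x null)
	ret i32 0
}
```

If this does compile, the restriction would seem to concern how a type 
definition beginning with `{` is parsed rather than pointers to 
structures as such, but that is only a guess.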

* Item 2 - equality of named types

A named integer type is equivalent to its literal type counterpart, but 
the same is not true for named and literal structures. I am certain that 
I've read about this before, but can't seem to locate the right section 
of the language specification; could anyone point me in the right 
direction? Also, what is the motivation behind this decision? I've 
skimmed over the code which handles named structure types (in 
lib/IR/Core.cpp), but would love to hear the high-level idea.

Attaching e.ll, f.ll, g.ll and h.ll for reference. All compile just fine 
except h.ll, which produces the following error message (using the same 
version of clang as above):

 > $ clang h.ll
 > h.ll:10:23: error: argument is not of expected type '%x = type { i32 }'
 >         call void (%x)* @foo({i32} {i32 0})
 >                              ^
 > 1 error generated.

* Item 3 - zero initialized common linkage variables

According to the language specification common linkage variables are 
required to have a zero initializer [1]. If so, why are they also 
required to provide an initial value?

Attaching i.ll and j.ll for reference. Both compile just fine; once 
executed, i.ll returns 37 and j.ll returns 0. If the common linkage 
variable @x was not initialized to 0, j.ll would have returned 42.

* Item 4 - constant common linkage variables

The language specification states that common linkage variables may not 
be marked as constant [1]. The parser doesn't seem to enforce this 
restriction. Would doing so cause any problems?

Attaching k.ll and l.ll for reference. Both compile just fine, but once 
executed k.ll returns 37 (i.e. the constant variable was overwritten) 
while l.ll segfaults as expected when it tries to overwrite a read-only 
memory location.

* Item 5 - appending linkage restrictions

An extract from the language specification [1]:

 > "appending" linkage may only be applied to global variables of
 > pointer to array type.

Similarly to item 4, this restriction isn't enforced by the parser. Would 
it make sense to enforce it, or is there any problem with such an approach?
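
For contrast, a variable that would satisfy the quoted restriction might 
look like the sketch below. It is not one of the attached files; the 
array type [1 x i32] and its initializer are assumptions chosen for 
illustration, and the syntax follows the 3.5-era textual IR used in the 
attachments.

```llvm
target triple = "x86_64-unknown-linux-gnu"

; @x has array type, so the global itself is of type [1 x i32]*,
; i.e. a pointer to an array, as the quoted sentence requires.
@x = appending global [1 x i32] [i32 2]

define i32 @main() {
	; Take the address of the first element, then load it.
	%p = getelementptr [1 x i32]* @x, i32 0, i32 0
	%foo = load i32* %p
	ret i32 %foo
}
```

The attached example for this item declares @x as a plain i32 instead, 
which is the form the quoted restriction appears to rule out.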

* Item 6 - hash token

The hash token (#) is defined in lib/AsmParser/LLToken.h (release 
version 3.5.0 of the LLVM source code) but doesn't seem to be used 
anywhere else in the source tree. Is this token a historical artefact or 
does it serve a purpose?

* Item 7 - backslash token

Similarly to item 6 the backslash token doesn't seem to serve a purpose 
(with regards to release version 3.5.0 of the LLVM source code). Is it 
used somewhere?

* Item 8 - quoted labels

A comment in lib/AsmParser/LLLexer.cpp (once again, release version 
3.5.0 of the LLVM source code) describes quoted labels using the 
following regexp (i.e. at least one character between the double quotes):

 > ///   QuoteLabel        "[^"]+":

In contrast the reference implementation accepts quoted labels with zero 
or more characters between the double quotes. Which is to be trusted? 
The comment makes more sense as the variable name would effectively be 
blank otherwise.

* Item 9 - undocumented calling conventions

The following calling conventions are valid tokens but not described in 
the language reference as of revision 223189:

intel_ocl_bicc, x86_stdcallcc, x86_fastcallcc, x86_thiscallcc, 
x86_vectorcallcc, arm_apcscc, arm_aapcscc, arm_aapcs_vfpcc, 
msp430_intrcc, ptx_kernel, ptx_device, spir_kernel, spir_func, 
x86_64_sysvcc, x86_64_win64cc, ghccc



Lastly I'd just like to thank the LLVM developers for all the time and 
hard work they've put into this project. I'd especially like to thank 
you for providing a language specification alongside the reference 
implementation! Keeping it up to date is a huge task, but also hugely 
important. Thank you!

Kind regards
/Robin Eklind

[1]: http://llvm.org/docs/LangRef.html#linkage-types
-------------- next part --------------
target triple = "x86_64-unknown-linux-gnu"

define void @foo([1 x i32]*) {
	ret void
}

define i32 @main() {
	call void ([1 x i32]*)* @foo([1 x i32]* null)
	ret i32 0
}
-------------- next part --------------
target triple = "x86_64-unknown-linux-gnu"

%x = type [1 x i32]*

define void @foo(%x) {
	ret void
}

define i32 @main() {
	call void (%x)* @foo(%x null)
	ret i32 0
}
-------------- next part --------------
target triple = "x86_64-unknown-linux-gnu"

define void @foo({i32}*) {
	ret void
}

define i32 @main() {
	call void ({i32}*)* @foo({i32}* null)
	ret i32 0
}
-------------- next part --------------
target triple = "x86_64-unknown-linux-gnu"

%x = type {i32}*

define void @foo(%x) {
	ret void
}

define i32 @main() {
	call void (%x)* @foo(%x null)
	ret i32 0
}
-------------- next part --------------
target triple = "x86_64-unknown-linux-gnu"

%x = type i32

define void @foo(%x) {
	ret void
}

define i32 @main() {
	call void (%x)* @foo(%x 0)
	ret i32 0
}
-------------- next part --------------
target triple = "x86_64-unknown-linux-gnu"

%x = type i32

define void @foo(%x) {
	ret void
}

define i32 @main() {
	call void (%x)* @foo(i32 0)
	ret i32 0
}
-------------- next part --------------
target triple = "x86_64-unknown-linux-gnu"

%x = type {i32}

define void @foo(%x) {
	ret void
}

define i32 @main() {
	call void (%x)* @foo(%x {i32 0})
	ret i32 0
}
-------------- next part --------------
target triple = "x86_64-unknown-linux-gnu"

%x = type {i32}

define void @foo(%x) {
	ret void
}

define i32 @main() {
	call void (%x)* @foo({i32} {i32 0})
	ret i32 0
}
-------------- next part --------------
target triple = "x86_64-unknown-linux-gnu"

@x = common global i32 42

define i32 @main() {
	store i32 37, i32* @x
	%foo = load i32* @x
	ret i32 %foo
}
-------------- next part --------------
target triple = "x86_64-unknown-linux-gnu"

@x = common global i32 42

define i32 @main() {
	%foo = load i32* @x
	ret i32 %foo
}
-------------- next part --------------
target triple = "x86_64-unknown-linux-gnu"

@x = common constant i32 42

define i32 @main() {
	store i32 37, i32* @x
	%foo = load i32* @x
	ret i32 %foo
}
-------------- next part --------------
target triple = "x86_64-unknown-linux-gnu"

@x = constant i32 42

define i32 @main() {
	store i32 37, i32* @x
	%foo = load i32* @x
	ret i32 %foo
}
-------------- next part --------------
target triple = "x86_64-unknown-linux-gnu"

@x = appending global i32 2

define i32 @main() {
	%foo = load i32* @x
	ret i32 %foo
}

