<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On 27 January 2015 at 17:49, Robin Eklind <span dir="ltr"><<a href="mailto:carl.eklind@myport.ac.uk" target="_blank">carl.eklind@myport.ac.uk</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">Hello everyone!<br>
<br>
I've recently had a chance to familiarize myself with the nitty-gritty details of LLVM IR. It has been a great learning experience, sometimes frustrating or confusing but mostly rewarding.<br>
<br>
There are a few cases I've come across which seem odd to me. I've tried to cross-reference them with the language specification and the source code to the best of my abilities, but would like to reach out to an experienced crowd with a few questions.<br>
<br>
Could you help me out by taking a look at these examples? To my novice eyes they seem to highlight inconsistencies in LLVM IR (or the reference implementation), but it is quite likely that I've overlooked something. Please help me out.<br>
<br>
Note: the example source files have been attached and a copy is made available at <a href="https://github.com/mewplay/ll" target="_blank">https://github.com/mewplay/ll</a><br>
<br>
* Item 1 - named pointer types<br>
<br>
It is possible to create a named array pointer type (and many others), but not a named structure pointer type. E.g.<br>
<br>
%x = type [1 x i32]* ; valid.<br>
%x = type {i32}* ; invalid.<br>
<br>
Is this the intended behaviour? Attaching a.ll, b.ll, c.ll and d.ll for reference. All files except d.ll compile without error using clang version 3.5.1 (tags/RELEASE_351/final).<br></blockquote><div><br></div><div>Only struct types may be named. What you're seeing is an artifact of the .ll parser's compatibility support for the old (LLVM 2.x) syntax. In the array case, the resulting llvm::Module does not have any type named %x. In the struct case, it's a hard error, as you noticed. LLVM 2.x used to permit all types to have names.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">> $ clang d.ll<br>
> d.ll:3:16: error: expected top-level entity<br>
> %x = type {i32}*<br>
> ^<br>
> 1 error generated.<br>
<br>
Does it have anything to do with type equality? (just a hunch)<br>
<br>
* Item 2 - equality of named types<br>
<br>
A named integer type is equivalent to its literal type counterpart, but the same is not true for named and literal structures.</blockquote><div><br></div><div>Right. Since named non-struct types don't exist, what's really going on is that the .ll parser remembers the mapping from %name to Type* and substitutes it everywhere the name is used. Hence they're pointer-equal. For structs this is not so: structs with identical contents but different names are different types.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">I am certain that I've read about this before, but can't seem to locate the right section of the language specification; could anyone point me in the right direction? Also, what is the motivation behind this decision? I've skimmed over the code which handles named structure types (in lib/IR/core.cpp), but would love to hear the high-level idea.<br>
<br>
Attaching e.ll, f.ll, g.ll and h.ll for reference. All compile just fine except h.ll, which produces the following error message (using the same version of clang as above):<br>
<br>
> $ clang h.ll<br>
> h.ll:10:23: error: argument is not of expected type '%x = type { i32 }'<br>
> call void (%x)* @foo({i32} {i32 0})<br>
> ^<br>
> 1 error generated.<br>
<br>
* Item 3 - zero initialized common linkage variables<br>
<br>
According to the language specification, common linkage variables are required to have a zero initializer [1]. If so, why must they also provide an explicit initial value?<br></blockquote><div><br></div><div>I don't know, but I can guess. We want code that checks for an initial value (via the C++ API) to only look in one place, GV->getInitializer(), instead of adding a check for isCommon() at each call site.</div><div><br></div><div>Of course, we could make the .ll text for this whatever we want, but having a zero initializer requirement more closely matches what's going on with the objects in memory.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">Attaching i.ll and j.ll for reference. Both compile just fine; when executed, i.ll returns 37 and j.ll returns 0. If the common linkage variable @x were not initialized to 0, j.ll would have returned 42.<br>
<br>
* Item 4 - constant common linkage variables<br>
<br>
The language specification states that common linkage variables may not be marked as constant [1]. The parser doesn't seem to enforce this restriction. Would doing so cause any problems?<br></blockquote><div><br></div><div>In general, restrictions are enforced by the verifier, not the .ll parser. The verifier operates on the in-memory model and is the source of truth for the validity of IR.</div><div><br></div><div><div>$ cat a.ll</div><div>@x = common global i32 1</div></div><div><div>$ llvm-as a.ll</div><div>llvm-as: assembly parsed, but does not verify as correct!</div><div>'common' global must have a zero initializer!</div><div>i32* @x</div></div><div><br></div><div>All passes are expected to assume that their inputs pass the verifier, and are permitted to execute undefined behaviour if they do not. All passes are expected to leave the IR in a state where the verifier passes (on the assumption that the input did). The same goes for the bitcode reader and writer. There are some utility functions used during the execution of a pass which cannot make this assumption, since the IR may be invalid in the middle of a larger transformation.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">Attaching k.ll and l.ll for reference. Both compile just fine, but when executed, k.ll returns 37 (i.e. the constant variable was overwritten), while l.ll segfaults, as expected, when it tries to write to a read-only memory location.<br>
<br>
* Item 5 - appending linkage restrictions<br>
<br>
An extract from the language specification [1]:<br>
<br>
> "appending" linkage may only be applied to global variables of pointer to array type.<br>
<br>
As with item 4, this restriction isn't enforced by the parser. Would it make sense to do so, or is there any problem with such an approach?<br></blockquote><div><br></div><div>Same as above: it's in the verifier.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">* Item 6 - hash token<br>
<br>
The hash token (#) is defined in lib/AsmParser/LLToken.h (release version 3.5.0 of the LLVM source code) but doesn't seem to be used anywhere else in the source tree. Is this token a historical artefact or does it serve a purpose?<br></blockquote><div><br></div><div>It's gone! This was removed in r227442.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">* Item 7 - backslash token<br>
<br>
As with item 6, the backslash token doesn't seem to serve a purpose (as of release version 3.5.0 of the LLVM source code). Is it used somewhere?<br></blockquote><div><br></div><div>Yep, again.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">* Item 8 - quoted labels<br>
<br>
A comment in lib/AsmParser/LLLexer.cpp (once again, release version 3.5.0 of the LLVM source code) describes quoted labels using the following regexp (i.e. at least one character between the double quotes):<br>
<br>
> /// QuoteLabel "[^"]+":<br>
<br>
In contrast, the reference implementation accepts quoted labels with zero or more characters between the double quotes. Which one is to be trusted? The comment makes more sense, as the variable name would otherwise effectively be blank.<br></blockquote><div><br></div><div>I think this is a bug. Well, two bugs:</div><div><br></div><div><div>$ cat a.ll</div><div>@"" = internal constant i32 0</div><div>@0 = internal constant i32 0</div></div><div><div>$ llvm-as a.ll</div><div>llvm-as: a.ll:2:1: error: variable expected to be numbered '%1'</div><div>@0 = internal constant i32 0</div><div>^</div></div><div><br></div><div>Anonymous values are numbered, with one numbering sequence for local variables (including arguments) and another for globals. I think that @"" should not be anonymous, but llvm-as clearly thinks it is. As llvm::Value's getValueName() method shows, we do support a distinction between an empty string and no string.</div><div><br></div><div>The other bug is in the error message. The variable should be numbered '@1', not '%1'.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">* Item 9 - undocumented calling conventions<br>
<br>
The following calling conventions are valid tokens but are not described in the language reference as of revision 223189:<br>
<br>
intel_ocl_bicc, x86_stdcallcc, x86_fastcallcc, x86_thiscallcc, x86_vectorcallcc, arm_apcscc, arm_aapcscc, arm_aapcs_vfpcc, msp430_intrcc, ptx_kernel, ptx_device, spir_kernel, spir_func, x86_64_sysvcc, x86_64_win64cc, ghccc<br></blockquote><div><br></div><div>Ooh. Yes, these should be documented!</div><div><br></div><div>Nick</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">Lastly, I'd just like to thank the LLVM developers for all the time and hard work they've put into this project. I'd especially like to thank you for providing a language specification alongside the reference implementation! Keeping it up to date is a huge task, but also hugely important. Thank you!<br></blockquote><div><br></div></div></div></div>