<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Tue, Oct 18, 2016 at 12:46 PM, Eric Christopher <span dir="ltr"><<a href="mailto:echristo@gmail.com" target="_blank">echristo@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">To the right list this time.<div><div class="gmail-m_2552952784852801906h5"><br><br><div class="gmail_quote"><div dir="ltr">On Tue, Oct 18, 2016 at 12:43 PM Eric Christopher <<a href="mailto:echristo@gmail.com" target="_blank">echristo@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr" class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">Hi Peter,<div class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg"><br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg"></div><div class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">Coming back to this now.</div><div class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg"><br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg"><div class="gmail_quote gmail-m_2552952784852801906m_-3594817603051613428gmail_msg"></div></div></div><div dir="ltr" class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg"><div class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg"><div class="gmail_quote gmail-m_2552952784852801906m_-3594817603051613428gmail_msg"><blockquote class="gmail_quote gmail-m_2552952784852801906m_-3594817603051613428gmail_msg" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
IFCC, the previous attempt to teach LLVM to emit jump tables, was removed<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
for complicating how functions are emitted, in particular requiring a<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
subtarget-specific instruction emitter available in subtarget-independent<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
code. However, the form of a jump table entry is generally well known to<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg"></blockquote><div class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg"><br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg"></div></div></div></div><div dir="ltr" class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg"><div class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg"><div class="gmail_quote gmail-m_2552952784852801906m_-3594817603051613428gmail_msg"><div class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">In general I think we can handle the subtarget-specific aspect in the same way that we handle module-level inline assembly. Anything at that object file level needs to be generic enough for the STI we create there anyhow and should work for your needs in creating a jump table.</div><div class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg"><br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg"></div><div class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">How would you create your jump tables if you were able to generate code in this fashion?</div></div></div></div></blockquote></div></div></div></div></blockquote><div><br></div><div>Under that approach we could in principle imagine a new type of GlobalObject that would represent a jump table and that would hold references to its entries as "operands". The asm printer could then use some target-specific callback to turn those operand references into jump table entries with EmitInstruction and an STI created like the inline asm STI.</div><div><br></div><div>However, I'm not sure if this would be the best way of doing things. 
It would require more backend machinery than strictly necessary, and I have other reservations as well (see below).<br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div><div class="gmail-m_2552952784852801906h5"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr" class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg"><div class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg"><div class="gmail_quote gmail-m_2552952784852801906m_-3594817603051613428gmail_msg"><div class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg"><br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg"></div><div class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">Alternatively (though I'm not a huge fan), we could create them using inline assembly as a workaround to get this aspect of your code moving forward.</div></div></div></div></blockquote></div></div></div></div></blockquote><div><br></div><div>Agreed: if we can't reach consensus here, this would be an uncontroversial first step.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div><div class="gmail-m_2552952784852801906h5"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr" class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg"><div class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg"><div class="gmail_quote gmail-m_2552952784852801906m_-3594817603051613428gmail_msg"><div class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg"><br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg"></div><div 
class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">I would very much like to avoid doing things like encoding relocation entries into the IR - it seems to be the wrong level to handle that type of target specific information. I worry that it will create issues with the folk that are trying to move us to a level where we can delete the IR at code generation time as well. I've added Jim since I think his team is looking into that. We might want an MIR level ability to encode jump tables/constants.</div></div></div></div></blockquote></div></div></div></div></blockquote><div><br></div><div>I'd argue that target specific information at this level is to a certain extent reasonable because it is not much different to other target/object-specific constructs such as intrinsics, linkage (to a certain extent) and visibility. i.e. your frontend needs to choose an appropriate linkage/visibility for a global so at some level it needs to be aware of the object format.</div><div><br></div><div>Although I am sceptical that emitting global value initialisers via MI would be beneficial, I think that if we do do that, there's already a substantial representational surface area (e.g. the different types of ConstantExpr that already exist) that would need an MI representation. As far as reloc goes it seems like it would be similar to "just another" sort of ConstantExpr that would need an MI representation; I don't really see how it would be less or more difficult to handle than other kinds of ConstantExpr. In fact, I suspect the lowering to MC would be trivial if we implement something like the ".reloc" directive.</div><div><br></div><div>Of course if we went with something like a new type of GlobalObject we would need an MI-level modelling for that as well. If we wanted to fix PR17633 or do something else that would otherwise require reloc, we'd need some separate way of modelling that. 
It just seems like more code and more burden overall.</div><div><br></div><div>Peter</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div><div class="gmail-m_2552952784852801906h5"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr" class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg"><div class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg"><div class="gmail_quote gmail-m_2552952784852801906m_-3594817603051613428gmail_msg"><div class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg"><br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg"></div><div class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">Thoughts?</div></div></div></div><div dir="ltr" class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg"><div class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg"><div class="gmail_quote gmail-m_2552952784852801906m_-3594817603051613428gmail_msg"><div class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg"><br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg"></div><div class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">-eric</div></div></div></div><div dir="ltr" class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg"><div class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg"><div class="gmail_quote gmail-m_2552952784852801906m_-3594817603051613428gmail_msg"><div class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg"> <br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg"></div><blockquote class="gmail_quote gmail-m_2552952784852801906m_-3594817603051613428gmail_msg" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
whichever component of the compiler is creating the jump table (for example, it<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
needs to know the size of each entry, and therefore the specific instructions<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
used), and we can therefore simplify things greatly by not considering jump<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
tables as consisting of instructions, but rather known strings of bytes in<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
the .text section with a relocation pointing to the function address. For<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
example, on x86:<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
$ cat tc.ll<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
declare void @foo()<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
define void @bar() {<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
tail call void @foo()<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
ret void<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
}<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
$ ~/src/llvm-build-rel/bin/llc -filetype=obj -o - tc.ll -O3 |~/src/llvm-build-rel/bin/llvm<wbr>-objdump -d -r -<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
&lt;stdin&gt;: file format ELF64-x86-64<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
Disassembly of section .text:<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
bar:<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
0: e9 00 00 00 00 jmp 0 &lt;bar+5&gt;<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
0000000000000001: R_X86_64_PC32 foo-4-P<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
Or on ARM:<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
$ ~/src/llvm-build-rel/bin/llc -filetype=obj -o - tc.ll -O3 -mtriple=armv7-unknown-linux |~/src/llvm-build-rel/bin/llvm<wbr>-objdump -d -r -<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
&lt;stdin&gt;: file format ELF32-arm-little<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
Disassembly of section .text:<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
bar:<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
0: fe ff ff ea b #-8 &lt;bar&gt;<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
00000000: R_ARM_JUMP24 foo<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
How can we represent such jump table entries in IR? One way that almost<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
works on x86 is to attach a constant to a function using either prefix data<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
or prologue data, or to place a GlobalVariable in the .text section using<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
the section attribute. The constant would use ConstantExpr arithmetic to<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
produce the required PC32 relocation:<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
define void @bar() prefix <{ i8, i32, i8, i8, i8 }> <{ i8 -23, i32 trunc (i64 add (i64 sub (i64 ptrtoint (void ()* @foo to i64), i64 ptrtoint (void ()* @bar to i64)), i64 3) to i32), i8 -52, i8 -52, i8 -52 }> {<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
...<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
}<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
However, this is awkward, and can’t be used to represent an ARM jump table<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
entry. (It also isn’t quite right; PC32 can trigger the creation of a<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
PLT entry, which doesn’t entirely match what the ConstantExpr arithmetic<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
is doing.)<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
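<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
To make the arithmetic in the prefix data example concrete, here is a minimal Python sketch<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
that builds the same 8-byte entry: a jmp rel32 (opcode 0xe9, i.e. i8 -23) followed by three<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
int3 padding bytes (0xcc, i.e. i8 -52). The function name and the addresses are made up for<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
illustration, and the entry is assumed to sit in the 8 bytes immediately before bar.<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
<pre>
```python
def x86_jump_table_entry(entry_addr, target_addr):
    """Return the 8 bytes of a `jmp rel32` entry padded with three int3 bytes."""
    # rel32 is relative to the end of the 5-byte jmp instruction.
    rel32 = target_addr - (entry_addr + 5)
    return bytes([0xE9]) + rel32.to_bytes(4, "little", signed=True) + bytes([0xCC] * 3)


# Hypothetical addresses, chosen only for illustration.
bar = 0x1010
foo = 0x2000
entry = x86_jump_table_entry(bar - 8, foo)  # prefix data sits just before bar
print(entry.hex())  # e9f30f0000cccccc
```
</pre>
With the entry at bar - 8, the displacement works out to foo - bar + 3, which is the value the<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
ConstantExpr above computes.<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">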
<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
Design<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
A relocation can be seen as having three inputs: the relocation type (on<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
Mach-O this also includes a pcrel flag), the target, and the addend. So<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
let’s define a relocation constant like this:<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
iNN reloc relocation_type (ptr target, iNN addend)<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
where iNN is some integer type, and ptr is some pointer type. For example,<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
an ARM jump table entry might look like this:<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
i32 reloc 0x1d (void ()* @foo, i32 0xeafffffe) ; R_ARM_JUMP24 = 0x1d<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
There is no error checking for this; if you use the wrong integer type for<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
a particular relocation, things will break and you get to keep both pieces.<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
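<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
The addend 0xeafffffe in the example is the encoding of an ARM b instruction whose 24-bit<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
immediate is -2. As a rough sketch (the function name is hypothetical), the following Python<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
splits such a word into its fields and shows why objdump prints it as b #-8:<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
<pre>
```python
def decode_arm_branch(word):
    """Split a 32-bit ARM B instruction word into (cond, opcode, byte_offset)."""
    cond = word // 2**28               # bits 31..28, 0xe means "always"
    opcode = (word // 2**24) % 16      # bits 27..24, 0xa for B
    imm24 = word % 2**24               # bits 23..0, a signed word offset
    imm24 -= 2**24 * (imm24 // 2**23)  # sign-extend from 24 bits
    # The byte offset is relative to the instruction address plus 8.
    return cond, opcode, imm24 * 4


cond, opcode, offset = decode_arm_branch(0xEAFFFFFE)
print(hex(cond), hex(opcode), offset)  # 0xe 0xa -8
```
</pre>
An offset of -8 relative to the instruction address plus 8 points back at the instruction<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
itself, which is the placeholder the R_ARM_JUMP24 relocation then adjusts.<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">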
<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
At the asm level, we would add a single directive, ".reloc", whose syntax<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
would look like this when targeting ELF and COFF:<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
.reloc size relocation_type target addend<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
or this when targeting Mach-O:<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
.reloc size relocation_type pcrel target addend<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
The code generator would emit this directive when emitting a reloc in a<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
constant initializer. (Note that this means that reloc constants would only<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
be supported with the integrated assembler.)<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
For example, the ARM JUMP24 relocation would look like this:<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
.reloc 4 0x1d foo 0xeafffffe<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
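<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
As an illustration of how a code generator might print the directive proposed above, here is<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
a small Python sketch. The helper name is hypothetical, and printing the Mach-O pcrel flag as<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
0 or 1 is an assumption; the proposal does not say how that field is spelled.<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
<pre>
```python
def format_reloc(size, reloc_type, target, addend, pcrel=None):
    """Format the proposed .reloc directive; pass pcrel only for Mach-O."""
    fields = [str(size), hex(reloc_type)]
    if pcrel is not None:
        # Assumption: the Mach-O pcrel flag is printed as 0 or 1.
        fields.append(str(int(pcrel)))
    fields += [target, hex(addend)]
    return ".reloc " + " ".join(fields)


# ELF/COFF form, reproducing the ARM JUMP24 example.
print(format_reloc(4, 0x1D, "foo", 0xEAFFFFFE))  # .reloc 4 0x1d foo 0xeafffffe
```
</pre>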
<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
We would need to add some mechanism for the assembler to evaluate relocations<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
in case the symbol is locally defined and not exported. For that reason,<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
we can start with a small set of supported "internal" relocations and expand<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
as needed.<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
What about constant propagation?<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
We do not want reloc constants to appear in functions' IR, or to be propagated<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
out of global initializers that use them. The simplest solution to this<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
problem is to only allow reloc constants in constant initializers where we<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
cannot/do not currently perform constant propagation, i.e. function prologue<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
data, prefix data and constants with weak linkage. This could be enforced<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
by the verifier. Later we can consider relaxing this constraint as needed.<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
Other uses<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
Relocation constants could be used for other purposes by frontends. For<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
example, a frontend may need to represent some other kind of custom/specific<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
instruction sequence in IR, or to create arbitrary kinds of references between<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
objects where that may be beneficial (for example, -fsanitize=function may<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
use this facility to create GOTOFF relocations in function prologues to<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
avoid creating dynamic relocations in the .text section to fix PR17633).<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
Thanks,<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
--<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
Peter<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
<br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
[1] <a href="http://www.pcc.me.uk/~peter/acad/usenix14.pdf" rel="noreferrer" class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg" target="_blank">http://www.pcc.me.uk/~peter/ac<wbr>ad/usenix14.pdf</a><br class="gmail-m_2552952784852801906m_-3594817603051613428gmail_msg">
</blockquote></div></div></div></blockquote></div></div></div></div>
</blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="gmail-m_2552952784852801906gmail_signature"><div dir="ltr">-- <div>Peter</div></div></div>
</div></div>