[LLVMdev] "Mapping High-Level Constructs to LLVM IR"

Fri Nov 22 21:54:56 PST 2013

On 11/22/2013 9:25 PM, Mikael Lyngvig wrote:
> Hi guys,
>
> I have begun writing a new document, named "Mapping High-Level 
> Constructs to LLVM IR", in which I hope to eventually explain how to 
> map pretty much every contemporary high-level imperative and/or OOP 
> language construct to LLVM IR.
>
> I write it for two reasons:
>
> 1. I need to know this stuff myself to be able to continue on my own 
> language project.
> 2. I feel that this needs to be documented once and for all, to save 
> tons of time for everybody out there, especially for the language 
> inventors who just want to use LLVM as a backend.
>
> So my plan is to write this document and continue to revise and 
> enhance it as I understand more and helpful people on the list and 
> elsewhere explain to me how these things are done.
>
> Basically, I just want to know if there is any interest in such a 
> document or if I should put it on my own website.  If you know of any 
> books or articles that already do this, then please let me know about 
> them.
>
> I've attached the result of 30 minutes' work, just so that you can see 
> what I mean.  Please don't review the document as it is still in its 
> very early infancy.

The document has a strong bias towards C++, which isn't a particularly 
representative slice of higher-level constructs. For example, C++'s 
RTTI constructs serve three distinct purposes: exception handling, 
dynamic casts, and reflection (although C++'s reflection capabilities 
are extremely weak). You'll need to talk about inheritance in three 
cases: single, multiple, and virtual, to use C++'s terminology (note 
that Java's interfaces can be implemented as virtual inheritance). 
Boxing is another important topic. Lambdas, closures, and generators 
(the yield keyword) are becoming increasingly common in modern 
programming languages, and should not be ignored.
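
To make the closure case concrete, here is a minimal sketch of one 
common lowering, written in C only because it maps almost one-to-one 
onto IR struct types and indirect calls. It is not the only possible 
scheme, and every name in it (adder_env, closure, make_adder, 
call_closure) is hypothetical, invented for this illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Captured variables live in an explicit environment record. */
struct adder_env {
    int increment;              /* the one captured variable */
};

/* A closure value pairs a code pointer with its environment. */
struct closure {
    int (*code)(void *env, int arg);
    void *env;
};

/* Body of the lambda `x -> x + increment`, with the environment
   passed as an explicit first parameter. */
int adder_code(void *env, int arg)
{
    const struct adder_env *e = env;
    return arg + e->increment;
}

/* Builds a closure over caller-owned storage for the environment. */
struct closure make_adder(struct adder_env *storage, int increment)
{
    struct closure c;
    storage->increment = increment;
    c.code = adder_code;
    c.env = storage;
    return c;
}

/* Calling a closure is an indirect call that forwards the environment. */
int call_closure(struct closure c, int arg)
{
    assert(c.code != NULL);
    return c.code(c.env, arg);
}
```

With a caller-provided `struct adder_env env;`, the call 
`call_closure(make_adder(&env, 5), 10)` adds the captured 5 to the 
argument 10. A real front end would also have to decide who owns the 
environment (stack, heap, or garbage collector), which this sketch 
leaves to the caller.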

Finally, calling propagated return values "exception handling" does an 
extreme disservice to your readers. LLVM IR explicitly models exception 
handling, and lowering it to propagated return values is not how anyone 
should implement it. If you badly want to describe it in C terms, you 
could at least use C's setjmp/longjmp; the truth is that this is a 
feature which doesn't exist cleanly in C.
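
For illustration only, a rough sketch of the setjmp/longjmp description 
might look like the following. The helper names (throw_error, 
checked_divide, try_divide) and the single global handler pointer are 
assumptions made for this example; a real runtime would be more 
careful, and in IR itself the same control flow is expressed with the 
invoke and landingpad instructions rather than with library calls:

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Innermost active handler; each "try" saves and restores this. */
static jmp_buf *current_handler = NULL;
/* Error code of the exception currently being propagated. */
static int pending_error = 0;

/* "throw code": record the code and unwind to the nearest handler. */
void throw_error(int code)
{
    assert(current_handler != NULL);
    pending_error = code;
    longjmp(*current_handler, 1);
}

int checked_divide(int a, int b)
{
    if (b == 0)
        throw_error(1);         /* does not return */
    return a / b;
}

/* Roughly: try { return a / b; } catch (code) { return -code; } */
int try_divide(int a, int b)
{
    jmp_buf handler;
    jmp_buf *outer = current_handler;   /* remember enclosing handler */
    int result;

    current_handler = &handler;
    if (setjmp(handler) == 0) {
        /* Normal path: runs the body of the "try" block. */
        result = checked_divide(a, b);
        current_handler = outer;
        return result;
    }
    /* Reached via longjmp: this is the "catch" block. */
    current_handler = outer;
    return -pending_error;
}
```

Here `try_divide(10, 2)` takes the normal path, while `try_divide(1, 0)` 
unwinds out of checked_divide and returns the negated error code. Note 
that nothing in this scheme runs cleanup code for the frames being 
skipped, which is one reason it is only an approximation of real 
exception handling.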

Trying to describe mapping higher-level languages to C and then C to IR 
is a poor idea. C is in some ways an extremely limited language (it has 
no native exception handling constructs, for example). If the document 
is meant to be a guide to lowering languages to LLVM IR, it also needs 
to explain how to take advantage of features in the IR to optimize code 
better (e.g., TBAA). Cfront-like C++ compilers are rare to nonexistent 
(in part because it is difficult to map some features, most notably 
exception handling, cleanly and efficiently into C); if your guide 
describes such an approach, it reads like an implicit endorsement. It 
is possible to describe some aspects of the IR in C, but if the goal is 
to lower to IR, then the description should be lowering to IR, not 
lowering to C.
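
As a small example of what TBAA (type-based alias analysis) buys you, 
consider the sketch below. Under C's effective-type rules an int lvalue 
and a float lvalue cannot refer to the same object, and a front end can 
pass that fact to the optimizer by attaching !tbaa metadata to the 
loads and stores it emits (Clang does this when optimizing). The 
function names here are hypothetical:

```c
#include <assert.h>

/* The store through `f` cannot legally modify `*i`, because int and
   float are distinct, non-aliasing types under C's rules. */
int store_both(int *i, float *f)
{
    *i = 1;
    *f = 2.0f;
    return *i;  /* with TBAA, an optimizer may fold this to `return 1` */
}

/* Usage: two distinct objects of different types. */
void demo(void)
{
    int a = 0;
    float b = 0.0f;
    assert(store_both(&a, &b) == 1);
    assert(b == 2.0f);
}
```

Without the type information, the optimizer would have to assume the 
two pointers might overlap and reload `*i` after the second store. A 
guide that goes through C first never gets to show how a front end for 
a different language would describe its own type hierarchy this way.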

-- 
Joshua Cranmer
Thunderbird and DXR developer
Source code archæologist