[llvm-dev] infer correct types from the pattern

Thu Mar 31 12:30:54 PDT 2016

On 3/31/2016 11:53 AM, Rail Shafigulin wrote:
>
> I'm curious how you know LLVM so well? Most of the time your answers
> are exactly what I need. I was recommended to read code (as usual),
> however it is challenging without knowing what the code is trying to
> express. IMHO it is better to have a concept first and then express it
> in code. I've been trying to find books, tutorials, etc, but there
> doesn't seem to be good examples out there. Basically my questions are:
>
> 1. What is your advice on learning LLVM (and compiler design)?
> 2. Is there a way to do it quickly and efficiently or I will just have to
> suffer through several years of painstaking trial and error as well as
> my own research on the topic?

That is kind of hard to answer satisfactorily.  I had done compiler 
development for 8 years before moving on to LLVM, so the understanding 
of how compilers work was not a problem.  The rest was essentially 
reading the code and writing my own.  The beginnings are slow and 
painful, but the more information you absorb, the faster it becomes.

There are some general principles of compiler development, namely that
you start out with a lot of high-level information about the program
structure, and then the "granularity" increases: the level of detail in
the representation increases at the cost of losing the high-level
information.  For example, early on, loops and loop nests may be
structured nicely, making them easy to optimize, but then some branches
may get folded or optimized, and the CFG may no longer be so clear.
So, you perform loop nest optimizations before that happens.  Then you
run passes that are not concerned with the high-level structures, then
you run passes that look into even more details, and so on.

In the case of LLVM, first you have a bunch of passes that do
target-independent things on the LLVM IR, then the influence of
target-dependent information (like TTI) increases, then you have the
selection DAG, then the DAG is legalized, then instructions are
selected.  After that you have MI with SSA, then register allocation
begins and you have MI without SSA, then register allocation ends and
you have physical registers.  Then machine functions get prolog and
epilog, then the instructions are lowered to the MC layer, then that is
printed (in text format, or encoded) into the output stream.

Each of these stages has certain properties, and the passes that run
there utilize (and usually preserve) these properties.  The actual
details are basically only visible in the sources, but if you have a
general idea about what is happening, these details will be fairly
understandable.
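
As a reading aid, the ordering described above can be written down as a
small table.  This is only a sketch: the Stage class, its fields and the
helper functions are made up for illustration and are not part of LLVM,
and assigning each step to one of four representations (LLVM IR,
SelectionDAG, MI, MC) is my own simplification of the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str            # what happens at this point
    representation: str  # the form the program is in while it happens
    note: str = ""       # property mentioned in the text above, if any


# Ordered as in the description above.  Hypothetical names, not LLVM's API.
PIPELINE = [
    Stage("target-independent passes", "LLVM IR"),
    Stage("passes using more target information (e.g. TTI)", "LLVM IR"),
    Stage("selection DAG is built", "SelectionDAG"),
    Stage("DAG is legalized", "SelectionDAG"),
    Stage("instructions are selected", "SelectionDAG"),
    Stage("passes before register allocation", "MI", "SSA"),
    Stage("register allocation in progress", "MI", "no SSA"),
    Stage("register allocation done", "MI", "physical registers"),
    Stage("prolog/epilog insertion", "MI", "physical registers"),
    Stage("lowering to the MC layer", "MC"),
    Stage("printing or encoding to the output stream", "MC"),
]


def representations(pipeline):
    """Return each representation once, in the order it first appears."""
    seen = []
    for stage in pipeline:
        if stage.representation not in seen:
            seen.append(stage.representation)
    return seen


def stages_in(pipeline, representation):
    """Return the names of the stages that work on one representation."""
    return [s.name for s in pipeline if s.representation == representation]


if __name__ == "__main__":
    for rep in representations(PIPELINE):
        print(rep)
        for name in stages_in(PIPELINE, rep):
            print("   ", name)
```

Reading the list top to bottom gives the "increasing granularity" order:
each later representation carries more machine detail and less of the
high-level program structure than the one before it.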

TableGen?  That was painstaking trial and error. :)

-Krzysztof

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, 
hosted by The Linux Foundation