[llvm-dev] LLVM data structures between modules

David Blaikie via llvm-dev llvm-dev at lists.llvm.org
Wed Feb 24 13:46:50 PST 2021


On Wed, Feb 24, 2021 at 1:25 PM Mohannad Ismail <imohannad at vt.edu> wrote:

> Thank you for your reply!
>
> Correct me if I'm wrong, but doesn't LTO kick in after the compilation
> phase?
>

"compilation phase" is a bit vague when it comes to LTO.

In simple full LTO - yes: the compiler runs and does some optimizations,
but keeps the representation in LLVM IR rather than lowering it to machine
code. Then the linker runs, realizes it's been given IR instead of machine
code, and hands all the IR files back to the compiler. The compiler links
all the IR together, does more optimization on that one big IR module, and
eventually runs code generation on it.
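
For concreteness, here's a minimal sketch (assuming a recent LLVM with the
new pass manager - the extension-point name below varies across versions,
and the pass itself is a made-up placeholder) of hooking a module pass into
that merged-IR stage via a pass plugin:

    // Sketch: a plugin that adds a module pass to the full-LTO pipeline,
    // so the pass runs once over the single merged module.
    #include "llvm/IR/Module.h"
    #include "llvm/IR/PassManager.h"
    #include "llvm/Passes/PassBuilder.h"
    #include "llvm/Passes/PassPlugin.h"

    using namespace llvm;

    namespace {
    struct WholeProgramSketch : PassInfoMixin<WholeProgramSketch> {
      PreservedAnalyses run(Module &M, ModuleAnalysisManager &) {
        // By this point M contains the IR of every input file, so a
        // "cross-module" analysis is just a walk over one module.
        for (Function &F : M)
          (void)F; // ... analyze/transform here ...
        return PreservedAnalyses::all();
      }
    };
    } // namespace

    extern "C" PassPluginLibraryInfo llvmGetPassPluginInfo() {
      return {LLVM_PLUGIN_API_VERSION, "WholeProgramSketch", "v0.1",
              [](PassBuilder &PB) {
                PB.registerFullLinkTimeOptimizationEarlyEPCallback(
                    [](ModulePassManager &MPM, OptimizationLevel) {
                      MPM.addPass(WholeProgramSketch());
                    });
              }};
    }

(How the plugin gets loaded at link time depends on the linker and build
setup; the point is just that the pass sees the merged module.)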

This is the reality of the compilation model - the conversion to IR happens
without global knowledge; otherwise that step couldn't be
distributed/parallelized and builds would be very slow. But if you
configure your own optimization pipeline, there's no reason that first
(isolated) stage of compilation has to do much work - it could do no work
at all, ensuring that whatever properties your analyses need to discover
are still intact when LTO runs.

ThinLTO does all this, but its merge step is more custom-built. During the
first stage of compilation, again, some IR transformations/optimizations
are done and the IR is emitted, along with a side-file summary of the
important details. Those summary files are sent to the "thin link" step,
which does the cross-module/whole-program analysis using only the summaries
and produces a report of sorts saying what cross-module compilation should
be done (e.g. "import thing X from file A into file B, so that B can see
details of X (for analysis, inlining, etc.)"). The backend compilations,
running distributed/in parallel, then consume those reports, load the
originally emitted IR, perform further optimization based on the reports,
and eventually emit machine code. That then goes to the traditional linker
for the normal linking step.
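
The summary format itself (the ModuleSummaryIndex) isn't really extensible
from an out-of-tree pass without patching LLVM, as far as I know - so if
you need your own per-module facts to flow through a thin-link-style step,
one workaround is to emit your own side files in the same spirit during the
first compile and merge them yourself. A rough sketch - the file naming and
contents are entirely made up:

    // Sketch: emit a homegrown "summary" next to each module during the
    // first (per-TU) compile; a later whole-program step can read and
    // merge these files. This is NOT LLVM's own summary format.
    #include "llvm/IR/Module.h"
    #include "llvm/IR/PassManager.h"
    #include "llvm/Support/raw_ostream.h"

    using namespace llvm;

    struct EmitSideSummary : PassInfoMixin<EmitSideSummary> {
      PreservedAnalyses run(Module &M, ModuleAnalysisManager &) {
        std::error_code EC;
        // One line per defined function: "<name> <arg count>".
        raw_fd_ostream OS((M.getName() + ".mysummary").str(), EC);
        if (!EC)
          for (Function &F : M)
            if (!F.isDeclaration())
              OS << F.getName() << ' ' << F.arg_size() << '\n';
        return PreservedAnalyses::all(); // analysis only; IR untouched
      }
    };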


> My current pass does instrumentation, and the pass I want to run before
> it won't do any instrumentation. So what I am thinking is that one would
> be an analysis pass and the other would be a transformation pass. From
> what I understand, instrumentation passes run before optimization and
> linking, so LTO might not help with that.
>

Instrumentation passes aren't "special" in any way I know of - they're just
another kind of transformation. But yes, if they need to run early because
later optimizations would destroy the properties they want to discover,
then you'd probably need a custom optimization pipeline that does no
optimization (or none that would destroy the invariants you care about)
before the whole-program analysis you want to do. That is: don't destroy
the invariants in the first optimization pipeline, so they're preserved
until LTO or ThinLTO time where they can be discovered, and the backend
compilation(s) can then act on them.
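
One concrete way to carry such invariants through to LTO time, sketched
below: record them as named module metadata in the early compile. When full
LTO links the modules together, named metadata from every input file is
concatenated into the merged module, so an LTO-time pass sees all of it.
The "my.facts" key and the facts recorded here are made up:

    // Sketch: Pass 1 runs early (per TU) and records facts as named
    // module metadata; Pass 2 runs on the merged module at LTO time and
    // reads the concatenated records from every input file.
    #include "llvm/IR/LLVMContext.h"
    #include "llvm/IR/Metadata.h"
    #include "llvm/IR/Module.h"
    #include "llvm/IR/PassManager.h"

    using namespace llvm;

    struct RecordFacts : PassInfoMixin<RecordFacts> {
      PreservedAnalyses run(Module &M, ModuleAnalysisManager &) {
        LLVMContext &Ctx = M.getContext();
        NamedMDNode *NMD = M.getOrInsertNamedMetadata("my.facts");
        for (Function &F : M)
          if (!F.isDeclaration()) // record, say, each defined function
            NMD->addOperand(
                MDNode::get(Ctx, MDString::get(Ctx, F.getName())));
        return PreservedAnalyses::all();
      }
    };

    struct UseFacts : PassInfoMixin<UseFacts> {
      PreservedAnalyses run(Module &M, ModuleAnalysisManager &) {
        if (NamedMDNode *NMD = M.getNamedMetadata("my.facts"))
          for (MDNode *Op : NMD->operands())
            (void)Op; // ... drive the transformation from the records ...
        return PreservedAnalyses::all();
      }
    };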


> Do I need to convert my passes to be optimization passes to take advantage
> of LTO? How do I do that? Thank you for your help and support!
>
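
(On the getAnalysisUsage part of your original mail: the passes don't need
to become "optimization passes". With the legacy pass manager, Pass 2
declares that it requires Pass 1 and then queries its result. A minimal
sketch with made-up names - note it still sees one module at a time, so it
only becomes whole-program if it runs at LTO time:)

    // Sketch, legacy pass manager: an analysis-style pass that collects
    // data, and a transform pass that requires it via getAnalysisUsage.
    #include "llvm/IR/Module.h"
    #include "llvm/Pass.h"
    #include <set>
    #include <string>

    using namespace llvm;

    namespace {
    struct CollectInfo : ModulePass {   // "Pass 1"
      static char ID;
      std::set<std::string> Names;      // the shared data structure
      CollectInfo() : ModulePass(ID) {}
      bool runOnModule(Module &M) override {
        for (Function &F : M)
          Names.insert(F.getName().str());
        return false;                   // no IR changes
      }
      void getAnalysisUsage(AnalysisUsage &AU) const override {
        AU.setPreservesAll();
      }
    };
    char CollectInfo::ID = 0;
    static RegisterPass<CollectInfo>
        X("collect-info", "Collect info", false, /*is_analysis=*/true);

    struct UseInfo : ModulePass {       // "Pass 2"
      static char ID;
      UseInfo() : ModulePass(ID) {}
      void getAnalysisUsage(AnalysisUsage &AU) const override {
        AU.addRequired<CollectInfo>();  // makes Pass 1 run first
      }
      bool runOnModule(Module &M) override {
        CollectInfo &Info = getAnalysis<CollectInfo>();
        bool Changed = false;
        // ... transform M using Info.Names here ...
        (void)Info;
        return Changed;
      }
    };
    char UseInfo::ID = 0;
    static RegisterPass<UseInfo> Y("use-info", "Use info", false, false);
    } // namespace

(The new pass manager has an equivalent scheme - an analysis registered
with, and cached by, an analysis manager - but the above matches the
getAnalysisUsage mechanism you asked about.)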
> On Wed, Feb 24, 2021 at 4:01 PM David Blaikie <dblaikie at gmail.com> wrote:
>
>> If you want to do cross-file optimization, you're looking for something
>> like LTO or ThinLTO (see, for instance, the whole-program
>> devirtualization work done with ThinLTO recently).
>>
>> On Wed, Feb 24, 2021 at 12:28 PM Mohannad Ismail via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>> Greetings everyone,
>>>
>>> I currently have a pass (Pass 2) that does some transformations. What I
>>> want to do is to have a pass (Pass 1) that runs before Pass 2, collects
>>> some IR information, stores it in a data structure and passes the data
>>> structure to Pass 2 so that I can use it for specific transformations. I
>>> think this can be done with getAnalysisUsage, but I'm not sure how. I would
>>> like to know how to do that exactly, if it's possible.
>>>
>>> Another thing, and this is the tricky part, is that I want Pass 1 to run
>>> on all the source files I have first before Pass 2 runs and pass a
>>> collective data structure to Pass 2. In other words, I want Pass 1 to run
>>> across all the modules and source files first, collect information, pass it
>>> to Pass 2 then Pass 2 runs. Is there a way to tell LLVM to do this type of
>>> "double compilation"?
>>>
>>> Hope I was able to explain this well enough. Please let me know if I
>>> wasn't clear or if you have any questions. Thank you very much!
>>>
>>> Best regards,
>>> Mohannad Ismail
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
>>