[llvm-dev] Replicate Individual O3 optimizations

David Blaikie via llvm-dev llvm-dev at lists.llvm.org
Thu Oct 24 21:29:40 PDT 2019


It's 'known' (by some number of LLVM developers) that opt -O3 isn't the
same as clang -O3. It'd be nice if they were closer - patches welcome, etc,
but it hasn't been a priority for anyone. opt -O3 is rarely used - usually
opt is used for testing specific optimizations.

Clang's IR output will differ between -O0 and -O3 (even before running any
LLVM optimizations) - things like lifetime intrinsics, etc, are emitted
only with optimizations enabled, for instance.
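
A rough way to see that difference for yourself (a sketch, not something taken from the runs in this thread): stop clang before any LLVM pass at both levels and count a couple of markers in the output. The temporary directory, the output file names and the grep patterns are my own choices; the source is the is_sorted example quoted below. As far as I know, the -O0 output also marks functions optnone, which by itself makes a later opt run skip most optimizations.

```shell
#!/bin/sh
# Sketch: compare the IR clang's front end emits at -O0 and at -O3,
# before any LLVM pass has run. Paths and file names are illustrative.
workdir=$(mktemp -d)
cat > "$workdir/is_sorted.cpp" <<'EOF'
bool is_sorted(int *a, int n) {
  for (int i = 0; i < n - 1; i++)
    if (a[i] > a[i + 1])
      return false;
  return true;
}
EOF

if command -v clang >/dev/null 2>&1; then
  for level in O0 O3; do
    # -disable-llvm-passes stops before the LLVM pipeline, so the output
    # is what the front end alone produced at this optimization level.
    clang "-$level" -Xclang -disable-llvm-passes -S -emit-llvm \
      "$workdir/is_sorted.cpp" -o "$workdir/frontend_$level.ll"
    printf '%s: lines mentioning llvm.lifetime: ' "$level"
    grep -c 'llvm.lifetime' "$workdir/frontend_$level.ll" || true
    printf '%s: lines mentioning optnone: ' "$level"
    grep -c 'optnone' "$workdir/frontend_$level.ll" || true
  done
else
  echo "clang not found on PATH; skipping the comparison"
fi
```

If the -O0 file shows optnone and no lifetime markers while the -O3 file shows the reverse, feeding the -O0 file to opt cannot be expected to match clang -O3.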

If you want to reproduce clang's -O3, it's best to use clang -O3 itself,
either on the source code or on LLVM IR that was generated by clang at -O3
(so it has lifetime intrinsics, etc).
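
As a sketch of the second option (the file names and temporary directory are made up for illustration, and I have not checked how closely the two results match on every version): generate the IR at -O3 with LLVM passes disabled, hand that IR back to clang -O3, and compare against compiling the source directly. The ModuleID comment is filtered out because it records the input file name.

```shell
#!/bin/sh
# Sketch: run clang's own -O3 pipeline over IR that clang generated at -O3.
# Paths and file names are illustrative.
workdir=$(mktemp -d)
src="$workdir/is_sorted.cpp"
cat > "$src" <<'EOF'
bool is_sorted(int *a, int n) {
  for (int i = 0; i < n - 1; i++)
    if (a[i] > a[i + 1])
      return false;
  return true;
}
EOF

if command -v clang >/dev/null 2>&1; then
  # 1. Front end only, but at -O3, so the IR is generated for optimization.
  clang -O3 -Xclang -disable-llvm-passes -S -emit-llvm "$src" \
    -o "$workdir/frontend.ll"
  # 2. Let clang (not opt) run its -O3 pipeline on that IR.
  clang -O3 -S -emit-llvm "$workdir/frontend.ll" -o "$workdir/from_ir.ll"
  # 3. Reference: source straight to optimized IR.
  clang -O3 -S -emit-llvm "$src" -o "$workdir/from_src.ll"

  # Ignore the ModuleID comment, which names the input file.
  grep -v '^; ModuleID' "$workdir/from_ir.ll"  > "$workdir/from_ir.txt"
  grep -v '^; ModuleID' "$workdir/from_src.ll" > "$workdir/from_src.txt"
  if diff "$workdir/from_ir.txt" "$workdir/from_src.txt" >/dev/null; then
    echo "optimized IR matches"
  else
    echo "optimized IR differs; inspect with diff"
  fi
else
  echo "clang not found on PATH; skipping the comparison"
fi
```

The point of step 2 is that clang sets up the pipeline the same way it does for source input, which is the part opt does not replicate.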

On Thu, Oct 24, 2019 at 9:22 PM Neil Nelson via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> Yes, this is another indication that there is some processing or bridge in
> the clang -O3 compile that is not in evidence when compiling with clang to
> its IR before the optimization passes.
>
> This may be an issue explained in a yet to be known documentation page. Or
> it may be a point at the moment overlooked by the well informed.
>
> One issue noted here but not well addressed is that the stated design of
> LLVM, with its front ends and back ends, has the front ends compile to an
> unoptimized IR, which LLVM then optimizes and prepares for the various
> back ends. With clang -O3, given this evidence, it is not easy to see how
> the division between the clang front end and LLVM works, though the
> assumed design suggests it should be.
>
> We should be able to compile with clang to the IR before optimization and
> then apply the LLVM optimization separately to obtain the same final IR as
> a clang -O3 compile doing both of those. But we are not seeing that.
>
> This also bears on the e2e thread in that this assumed division posits
> that the separate clang and LLVM debug sequences can provide a high
> reliability since the IR intermediate between the two is not expected to be
> that error prone. The errors are expected to be primarily either in clang
> in obtaining a correct IR or in opt (LLVM) in optimizing that IR for the
> back-end. But since we are not able to identify the IR between the two
> under clang -O3 it is a question as to what debug sequence would handle
> what we could not identify.
>
> Neil Nelson
> On 10/24/19 5:04 AM, hameeza ahmed wrote:
>
> I ran matrix multiplication code with both approaches, -O3 in clang and
> -O3 in opt. The clang -O3 build is about 2.97x faster than the opt -O3 one.
>
>
>
> On Mon, Oct 21, 2019 at 8:24 AM Neil Nelson <nnelson at infowest.com> wrote:
>
>> is_sorted.cpp
>> bool is_sorted(int *a, int n) {
>>   for (int i = 0; i < n - 1; i++)
>>     if (a[i] > a[i + 1])
>>       return false;
>>   return true;
>> }
>>
>> https://blog.regehr.org/archives/1605 How Clang Compiles a Function
>> https://blog.regehr.org/archives/1603 How LLVM Optimizes a Function
>> clang version 10.0.0, Xubuntu 19.04
>>
>> clang is_sorted.cpp -S -emit-llvm -o is_sorted_.ll
>> clang is_sorted.cpp -O0 -S -emit-llvm -o is_sorted_O0.ll
>> clang is_sorted.cpp -O0 -Xclang -disable-llvm-passes -S -emit-llvm -o is_sorted_disable.ll
>>
>> No difference in the prior three ll files.
>>
>> clang is_sorted.cpp -O1 -S -emit-llvm -o is_sorted_O1.ll
>>
>> Many differences between is_sorted_O1.ll and is_sorted_.ll.
>>
>> opt -O3 -S is_sorted_.ll -o is_sorted_optO3.ll
>>
>> clang is_sorted.cpp -mllvm -debug-pass=Arguments -O3 -S -emit-llvm -o is_sorted_O3arg.ll
>> opt <optimization sequence obtained in prior step> -S is_sorted_.ll -o is_sorted_opt_parms.ll
>>
>> No difference between is_sorted_optO3.ll and is_sorted_opt_parms.ll, the last two opt runs.
>> Many differences between is_sorted_O3arg.ll and is_sorted_opt_parms.ll, the last two runs,
>> clang and opt.
>>
>> Conclusions:
>>
>> Given my current understanding, the ll files from the first three clang runs
>> are before any optimizations. Those ll files are from the front-end phase (CFE).
>> But this is a simple program, and for a more complex program the ll files
>> could be different.
>>
>> Whether we use -O3 directly or use the parameters provided by clang for a
>> -O3 optimization, we obtain the same result.
>>
>> The difference in question is why an opt run using the CFE ll before optimization
>> obtains a different ll than a CFE run that includes optimization. That is, for this case,
>> it is not the expansion of the -O3 parameters that is the difference.
>>
>> Initially, it would be interesting to have an ll listing before optimization from the
>> clang run that includes optimization to compare with the ll from the clang run without
>> optimization.
>>
>> Neil Nelson
>>
>> On 10/19/19 11:48 AM, Mehdi AMINI via llvm-dev wrote:
>>
>>
>>
>> On Thu, Oct 17, 2019 at 11:22 AM David Greene via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>> hameeza ahmed via llvm-dev <llvm-dev at lists.llvm.org> writes:
>>>
>>> > Hello,
>>> > I want to study the individual O3 optimizations. For this I am using
>>> > following commands, but unable to replicate O3 behavior.
>>> >
>>> > 1. Documents/clang+llvm-9.0.0-x86_64-linux-gnu-ubuntu-18.04/bin/clang -O1
>>> > -Xclang -disable-llvm-passes -emit-llvm -S vecsum.c -o vecsum-noopt.ll
>>> >
>>> > 2. Documents/clang+llvm-9.0.0-x86_64-linux-gnu-ubuntu-18.04/bin/clang -O3
>>> > -mllvm -debug-pass=Arguments -emit-llvm -S vecsum.c
>>> >
>>> > 3. Documents/clang+llvm-9.0.0-x86_64-linux-gnu-ubuntu-18.04/bin/opt
>>> > <optimization sequence obtained in step 2> -S vecsum-noopt.ll -S -o
>>> > o3-chk.ll
>>> >
>>> > Why is the IR obtained by the above steps, i.e. the individual O3
>>> > sequence, not the same as when -O3 is passed?
>>> >
>>> > Where am I making a mistake?
>>>
>>
>> If you could provide the full reproducer, it could help to debug this.
>>
>>
>>>
>>> I think you need to turn off LLVM optimizations when doing the
>>> -emit-llvm dump.  Something like this:
>>>
>>> Documents/clang+llvm-9.0.0-x86_64-linux-gnu-ubuntu-18.04/bin/clang -O3 \
>>>   -mllvm -debug-pass=Arguments -Xclang -disable-llvm-optzns -emit-llvm \
>>>   -S vecsum.c
>>>
>>> Otherwise you are effectively running the O3 pipeline twice, as clang
>>> will emit LLVM IR after optimization, not before (this confused me too
>>> when I first tried it).
>>>
>>
>> This is the common pitfall indeed!
>> I think they are doing it correctly in step 1 though by including:
>> `-Xclang -disable-llvm-passes`.
>>
>>
>>> That said, I'm not sure you will get the same IR out of opt as with
>>> clang -O3 even with the above.  For example, clang sets
>>> TargetTransformInfo for the pass pipeline and the detailed information
>>> it uses may or may not be transmitted via the IR it dumps out.  I have
>>> not personally tried to do this kind of thing in a while.
>>
>>
>> I struggled as well to set up TTI and TLI the same way clang does :(
>> It'd be nice to revisit our PassManagerBuilder setup and the opt
>> integration to provide reproducibility (maybe could be a starter project
>> for someone?).
>>
>> --
>> Mehdi
>>
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>