[llvm-dev] A 4x slower initialization loop in LLVM vs GCC and MSVC

Thu Oct 1 12:45:01 PDT 2020

Hi,

> On Sep 27, 2020, at 12:52, Stefanos Baziotis via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> Hi everyone,
> 
> I was watching this video [1]. There's an example of an initialization loop for which
> Clang unfortunately generates really bad code [2]. On my machine, the Clang version
> is 4x slower than the GCC version. I have not tested the MSVC version, but it should
> be around the same.
> 
> In case anyone's interested, in the video [1] Casey explains why this code is bad (around 59:39).
> 
> So, I tried to run -print-after-all [3]. There are a lot of passes that interact here, so I was
> wondering if anyone knows more about that. It seems to me that the problem starts
> with SROA. Also, I'm not familiar with how these llvm.memcpy / memset are handled down
> the pipeline. Finally, the regalloc probably did not go very well.

I filed https://bugs.llvm.org/show_bug.cgi?id=47705 to keep track of the issue.

While the initialization code is not ideal, the main cause of the slowdown appears to be that GCC interchanges the main loops and LLVM does not. After interchange, the memory access patterns are completely different (which probably also defeats the purpose of the benchmark somewhat).
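To illustrate why interchange changes the access pattern so much, here is a minimal sketch. It uses a hypothetical 2D array (`ROWS`, `COLS` and the function names are made up for illustration, not taken from the benchmark). With the column index in the outer loop, the inner loop strides through memory `COLS` elements at a time; after interchange, the inner loop walks consecutive addresses. Both orders produce the same array contents, which is what makes the interchange legal here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sizes, for illustration only. */
#define ROWS 4
#define COLS 5

/* j outer, i inner: each inner iteration jumps COLS ints ahead in memory. */
static void init_j_outer(int a[ROWS][COLS]) {
    for (size_t j = 0; j < COLS; ++j)
        for (size_t i = 0; i < ROWS; ++i)
            a[i][j] = (int)(i * COLS + j);
}

/* Interchanged: i outer, j inner, so the inner loop touches adjacent ints. */
static void init_i_outer(int a[ROWS][COLS]) {
    for (size_t i = 0; i < ROWS; ++i)
        for (size_t j = 0; j < COLS; ++j)
            a[i][j] = (int)(i * COLS + j);
}

/* Returns 1 if both loop orders leave identical array contents. */
static int same_contents(void) {
    int x[ROWS][COLS], y[ROWS][COLS];
    init_j_outer(x);
    init_i_outer(y);
    return memcmp(x, y, sizeof x) == 0;
}
```

On arrays much larger than the cache, the strided version tends to miss far more often, which is the kind of difference an interchange removes.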

There’s also an issue with SROA, which splits a single contiguous llvm.memcpy into 3 separate ones. With SROA disabled, there’s another ~2x speedup (on top of the ~3x speedup from manually interchanging the loops).
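As a rough source-level analogy only (SROA works on LLVM IR allocas, and where it splits depends on how the aggregate is used), the sketch below contrasts one copy of a whole aggregate with three per-piece copies that move the same bytes. The `Vec3` struct and both function names are hypothetical and not from the benchmark.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical aggregate with three adjacent fields. */
typedef struct {
    float x, y, z;
} Vec3;

/* One contiguous copy of the whole object: a single memcpy. */
static void copy_whole(Vec3 *dst, const Vec3 *src) {
    memcpy(dst, src, sizeof *dst);
}

/* The same data moved as three separate memcpy calls, one per field. */
static void copy_split(Vec3 *dst, const Vec3 *src) {
    memcpy(&dst->x, &src->x, sizeof dst->x);
    memcpy(&dst->y, &src->y, sizeof dst->y);
    memcpy(&dst->z, &src->z, sizeof dst->z);
}
```

Both functions give the same result; the concern is that the backend may lower three small copies less efficiently than one larger contiguous copy.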

Cheers,
Florian