[llvm-dev] llvm emits unoptimized code
kamlesh kumar via llvm-dev
llvm-dev at lists.llvm.org
Fri Nov 1 02:11:44 PDT 2019
CodeGenPrepare::optimizeMemoryInst is sinking the address computation into
the users' basic blocks.
If we disable CodeGenPrepare (-mllvm -disable-cgp), we get the same code as GCC.
See https://godbolt.org/z/bMvIsx
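
For readers without the godbolt links at hand, here is a minimal sketch of the kind of loop being discussed. It is not the testcase from this thread; the names tls_total, accumulate_naive and accumulate_local are hypothetical. The first function names the thread-local variable on every iteration, so under -fPIC the compiler has to decide whether the __tls_get_addr lookup stays in the loop. The second keeps the running sum in a local and touches the thread-local only once, which sidesteps the question at the source level.

```cpp
#include <cstddef>

// Hypothetical thread-local accumulator (an assumption for illustration,
// not the variable used in the original testcase).
thread_local int tls_total = 0;

// Naive form: every iteration reads and writes the thread-local variable.
// Whether the TLS address lookup is hoisted out of the loop is up to the
// compiler and the flags in use.
void accumulate_naive(const int *data, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        tls_total += data[i];
}

// Local-accumulator form: sum into a local, then update the thread-local
// once after the loop, so at most one TLS access is needed per call.
// Note: this is only equivalent when data does not alias tls_total.
void accumulate_local(const int *data, std::size_t n) {
    int sum = 0;
    for (std::size_t i = 0; i < n; ++i)
        sum += data[i];
    tls_total += sum;
}
```

Comparing the assembly of both functions at -O2 -fPIC, with and without -mllvm -disable-cgp, should show whether the lookup ends up inside the loop for a given compiler version.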
On Fri, Nov 1, 2019 at 12:06 AM Jorg Brown <jorg.brown at gmail.com> wrote:
> On Thu, Oct 31, 2019 at 11:26 AM David Blaikie <dblaikie at gmail.com> wrote:
>> On Thu, Oct 31, 2019 at 11:17 AM Jorg Brown via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>> On Thu, Oct 31, 2019 at 8:50 AM kamlesh kumar via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>>> Hi Devs,
>>>> Consider testcase here
>>>> When the optimization level is -O1 or above, it produces unoptimized code
>>>> because it calls __tls_get_addr inside the loop,
>>>> while with optimization disabled
>>>> it produces a single call to __tls_get_addr outside of the loop.
>>>> Is this a missed optimization by LLVM?
>>> It's interesting to me that there's a big difference between -fpie and -fpic.
>>> In particular, with -fpie, no call to __tls_get_addr is needed, so the underlying considerations for optimization change. This feels like the optimizer isn't taking into account the overhead of -fpic when determining whether to hoist the address calculation out of the loop.
>>> On Thu, Oct 31, 2019 at 10:36 AM David Blaikie via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>>> Looks pretty similar to the GCC generated code
>>> Challenge accepted => https://godbolt.org/z/8PX2La
>> Which challenge? Sorry, could've linked to the godbolt I was looking at when I said that: https://godbolt.org/z/_07tOk - comparing GCC and Clang trunk on the code linked in the original post.
> Right, your example showed where gcc and clang were similar.
> My example https://godbolt.org/z/8PX2La showed where gcc produced code that was possibly twice as fast as clang's code.
> -- Jorg