[llvm-dev] llvm emits unoptimized code

Fri Nov 1 02:11:44 PDT 2019

It looks like CodeGenPrepare::optimizeMemoryInst is sinking the address
computation into the users' basic blocks.
If we disable this (-mllvm -disable-cgp), we get the same code as GCC.
See https://godbolt.org/z/bMvIsx
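
For readers without the godbolt links at hand, here is a minimal sketch of
the kind of code under discussion. It is not the original test case: the
names tls_buf, sum_direct and sum_hoisted are hypothetical, and whether a
given compiler version really calls __tls_get_addr inside the loop for
sum_direct depends on the flags and target. sum_hoisted takes the
thread-local address once before the loop, which is the shape the hoisted
(GCC-like) output corresponds to; the optimizer remains free to move that
computation again.

```c
#include <stddef.h>

/* Hypothetical thread-local object; with -fPIC its address is typically
   obtained through a call to __tls_get_addr. */
_Thread_local int tls_buf[4];

/* Direct form: the loop body names the TLS object on every iteration. */
int sum_direct(const int *src, size_t n) {
    for (size_t i = 0; i < 4; ++i)
        tls_buf[i] = 0;
    for (size_t i = 0; i < n; ++i)
        tls_buf[i % 4] += src[i];
    return tls_buf[0] + tls_buf[1] + tls_buf[2] + tls_buf[3];
}

/* Hand-hoisted form: take the TLS address once, then use the pointer. */
int sum_hoisted(const int *src, size_t n) {
    int *buf = tls_buf; /* address computed once, before the loops */
    for (size_t i = 0; i < 4; ++i)
        buf[i] = 0;
    for (size_t i = 0; i < n; ++i)
        buf[i % 4] += src[i];
    return buf[0] + buf[1] + buf[2] + buf[3];
}
```

Compiling this with something like "clang -O2 -fPIC -S", with and without
-mllvm -disable-cgp, should let you compare where the __tls_get_addr call
ends up in each function.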

On Fri, Nov 1, 2019 at 12:06 AM Jorg Brown <jorg.brown at gmail.com> wrote:
>
> On Thu, Oct 31, 2019 at 11:26 AM David Blaikie <dblaikie at gmail.com> wrote:
>>
>> On Thu, Oct 31, 2019 at 11:17 AM Jorg Brown via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>>
>>> On Thu, Oct 31, 2019 at 8:50 AM kamlesh kumar via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>>>
>>>> Hi Devs,
>>>> Consider the test case here:
>>>> https://godbolt.org/z/qHZzqw
>>>> At -O1 and above, LLVM produces unoptimized code
>>>> because it calls __tls_get_addr inside the loops,
>>>> while with optimization disabled
>>>> it produces a single call to __tls_get_addr outside of the loop.
>>>> Is this a missed optimization in LLVM?
>>>
>>>
>>> It's interesting to me that there's a big difference between -fpie and -fpic.
>>>
>>> https://godbolt.org/z/klX3q3
>>>
>>> In particular, with -fpie, no call to __tls_get_addr is needed, so the underlying considerations for optimization change.  This feels like the optimizer isn't taking into account the overhead of -fpic when determining whether to hoist the address calculation out of the loop.
>>>
>>> On Thu, Oct 31, 2019 at 10:36 AM David Blaikie via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>>>
>>>> Looks pretty similar to the GCC generated code
>>>
>>>
>>> Challenge accepted => https://godbolt.org/z/8PX2La
>>
>>
>> Which challenge? Sorry, could've linked to the godbolt I was looking at when I said that: https://godbolt.org/z/_07tOk - comparing GCC and Clang trunk on the code linked in the original post.
>
>
> Right, your example showed where gcc and clang were similar.
>
> My example https://godbolt.org/z/8PX2La showed where gcc produced code that was possibly twice as fast as clang's code.
>
> -- Jorg