[PATCH] D84108: [SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by default, enable late in pipeline

Thu Aug 20 08:40:50 PDT 2020

lebedev.ri added a comment.

In D84108#2228500 <https://reviews.llvm.org/D84108#2228500>, @echristo wrote:

> In D84108#2227701 <https://reviews.llvm.org/D84108#2227701>, @lebedev.ri wrote:
>
>> In D84108#2227371 <https://reviews.llvm.org/D84108#2227371>, @echristo wrote:
>>
>>> One more follow up here:
>>>
>>> One set of benchmarks we're seeing this in are also in the llvm test harness:
>>>
>>> llvm_multisource_tsvc.TSVC-ControlFlow-flt and llvm_multisource_misc.Ptrdist-yacr2
>>>
>>> in an FDO mode (so we have some decent branch hints for the code).
>>>
>>> You should be able to duplicate at least some of the slowdown there.
>>
>> That sadly doesn't tell me much.
>> Can you please provide the reproduction steps, and ideally the IR with/without this change?
>
> It should be as straightforward as -O3 -fexperimental-new-pass-manager, if that isn't then we'll look a bit more. Those tests are in the llvm test-suite so they should be easy to get as part of your checkout.

No i mean FDO. Can you either point me at the docs where said incantation for test-suite is spelled out, or at provide commands nessesary to reproduce?

> Ideally for a change as large as this you'd have run these benchmarks yourself - they're fairly useful here and I think I'd have expected them if I were reviewing.
>
>> In D84108#2227365 <https://reviews.llvm.org/D84108#2227365>, @echristo wrote:
>>
>>> So, we're seeing several significant (20-30%) regressions due to this in various different library benchmarks. Usually around things that are compression/decompression loops, but also other places.
>>>
>>> I'm uncertain what we can do, but I think this might need a wider range of discussion rather than the phabricator review.
>>>
>>> Would you mind terribly reverting and starting a thread on llvm-dev about this? I think that'll give us an opportunity to weigh some of the tradeoffs in a wider space.
>
> I think I'm still here. I'll comment a bit more in reply to David below.

In D84108#2228508 <https://reviews.llvm.org/D84108#2228508>, @echristo wrote:

> In D84108#2227709 <https://reviews.llvm.org/D84108#2227709>, @dmgreen wrote:
>
>> We saw performance improvements from this patch. They were a little bit up-and-down, but definitely an improvement overall.
>
> Right. I think you're still in scientific computing yes? The internal and external benchmarks are fairly straightforward scalar code. I.e. if they were vectorizable we'd probably have done so ages ago for the internal and the external are things like bytecode interpreters etc. So the patch is penalizing things that just aren't vectorizable for things that might be - and I'm not sure that's the right tradeoff to make here. Let's start an llvm-dev discussion and bring in a few more people as well. I'd be very interested in Kit's perspective here too.

"hey let's penalize some code by +30% runtime because that improves some code by -10%" obviously won't fly,
so unless you want to start the discussion, i want to first understand what is actually going on.
I'm going to guess Such huge regressions likely mean lack of inlining.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D84108/new/

https://reviews.llvm.org/D84108