[PATCH] D38196: [AArch64] Avoid interleaved SIMD store instructions for Exynos

Thu Oct 26 14:30:20 PDT 2017

kristof.beyls added a comment.

In https://reviews.llvm.org/D38196#906795, @az wrote:

> > I thought the "shouldExitEarly" method aims to check if any of the transformations the pass can do can be profitable for the target at all.
> >  My understanding is that this can be checked by just examining the instruction latencies, not having to look at the actual code being compiled at all. I assume that the saving of compile time comes from the fact that "shouldExitEarly" needs to do an amount of work proportional to the number of rewriting rules implemented, not proportional to the amount of code being compiled.
> >  So, for the pass to work as expected, i.e. always run when for the target at least one of the transformations is profitable, I think shouldExitEarly should check all rewriting rules implemented.
> > 
> > Does that make sense, or am I misunderstanding something here?
>
> Your understanding of ShouldExitEarly is accurate and I agree that ideally we should put down all rewriting rules. However, ShouldExitEarly is not very cheap as it calls shouldReplaceInstruction. The question is should we put some heuristics into ShouldExitEarly so that we do not check for all rules even though the number of rules is static and does not depend on code? I currently check for one rule and make it a representative of all (st2 in this case) and this may be over-simplification. Maybe, I can check for one representative of st2 and one for st4 (which both have many variants).

Hmmmm.... this seems to be a tough one to find the right tradeoffs....
My personal opinions/observations:

1. Making ShouldExitEarly stop the whole pass even if some of the optimizations would trigger is a hack that makes this pass operate in a non-obvious way and that will bite us in the future.
2. I guess there's a chance even more peephole-y patterns will be added to this pass in the future, which would make ShouldExitEarly even more expensive to check for all rewriting rules implemented. But also, if that function would take short cuts and heuristics, the pass would operate in more surprising and non-obvious ways.

The above made me wonder if to keep compile-time under control, it wouldn't be better to integrate this in one of the existing frameworks for peepholes rather than in a pass of its own. And then I looked at https://reviews.llvm.org/D21571 where this pass got introduced and saw such approaches were probably tried and rejected in early version of that review.
Right now, my best guess of the easiest way forward is to make ShouldExitEarly compute the profitability of all rewrite rules implemented, and let it cache the results of that analysis for each subtarget it encounters. That way, the cost of this computation should be amortized out over all functions the pass will handle, resulting in not having too much impact on compile time?
Having said that, I don't know of another pass that has a caching mechanism like this. But I also haven't tried looking for such an example. I'm not sure if there's going to be a gotcha in trying to cache those results somehow.
What do you think?

Thanks,

Kristof

https://reviews.llvm.org/D38196