[PATCH] D39976: [AArch64] Query the target when folding loads and stores

Tue Mar 27 15:10:54 PDT 2018

evandro added a comment.

In https://reviews.llvm.org/D39976#1040345, @gberry wrote:

> In https://reviews.llvm.org/D39976#1027478, @evandro wrote:
>
> > `FeatureSlowPaired128` was just too coarse.  The alternative would be to change it to something more specific, like `FeatureSlowSomePaired128Sometimes`, and then create yet another one for the next generation to specialize it further.  Instead, querying the scheduling model seems to be a much more reasonable approach.
>
>
> I'm more confused now.  `FeatureSlowPaired128` controls whether certain load/store opcodes are combined to form paired loads/stores.  But this change prevents some load/store opcodes from having their base register increment folded in.  The two seem unrelated.

This change is more generic and flexible than `FeatureSlowPaired128`.  It controls not only when loads and stores are paired, but also the other foldings that this pass performs, including the pre- or post-indexing of the offset register.

> I'm also concerned that this change is introducing a very specific target hook and is recomputing the same "slowness" of opcodes over and over even though it doesn't depend on the context.  Perhaps a more general subtarget array of "slow" opcodes would be a better choice, which Exynos could initialize based on its scheduling model for these opcodes if you think there are going to be differences in future CPUs.

AFAIK, the code performs table lookups, which should be fairly efficient.  And, yes, just as there are differences in how well some loads and stores perform in M1 and M3, it's likely that more differences will come in their successors.
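
For illustration only, the two options discussed above can be sketched in standalone C++.  This is not code from the patch and it uses no LLVM APIs: the `Opcode` enumeration, the `SchedModel` struct, its per-opcode latency table and the threshold are all hypothetical stand-ins.  The first function queries the model on every call; the second structure precomputes a per-opcode "slow" array once, along the lines of the suggested subtarget array.

```cpp
#include <array>
#include <cstddef>

// Hypothetical opcode list; the real pass deals with many more AArch64 opcodes.
enum Opcode : std::size_t { LDRQui, STRQui, LDPQi, STPQi, NumOpcodes };

// Hypothetical stand-in for a scheduling model: one latency (in cycles) per opcode.
struct SchedModel {
  std::array<unsigned, NumOpcodes> Latency;
};

// Option 1: look the opcode up in the model every time the pass asks.
bool isSlowByQuery(const SchedModel &SM, Opcode Op, unsigned Threshold) {
  return SM.Latency[Op] > Threshold;
}

// Option 2: compute the answer once per subtarget and cache it in an array.
struct SlowOpcodeTable {
  std::array<bool, NumOpcodes> Slow{};

  SlowOpcodeTable(const SchedModel &SM, unsigned Threshold) {
    for (std::size_t I = 0; I < NumOpcodes; ++I)
      Slow[I] = SM.Latency[I] > Threshold;
  }

  bool isSlow(Opcode Op) const { return Slow[Op]; }
};
```

Both variants give the same answer for a given model and threshold; they differ only in whether the comparison is redone on each query or done once up front.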

https://reviews.llvm.org/D39976