[PATCH] D24451: [LoopUnroller] Replace UnrollingPreferences::Force with ForceMaxCount + SystemZ getUnrollingPreferences().

Tue Sep 13 01:50:00 PDT 2016

jonpa added a comment.

Hi Evgeny,

thanks for review!

> Overall, I don't like that Force unroll has its own MAX counter, but Partial and Runtime do not.

There is MaxCount for partial/runtime  and also FullUnrollMaxCount. So to me ForceMaxCount seemed logical, since forced unrolling is certainly a bit separate from the rest, to me.

> What is your case where you need to bound Force unroll counter, but Runtime is ok?

On z13, max 6 instructions can be decoded every cycle, but the SystemZ branch predictor can only handle a taken branch every other cycle. This means that tiny loops (<6 instructions or so) is really bad, and the main purpose of this patch is to get rid of them.

Before enabling forced unrolling, there were still a very significant amount of such small loops left in benchmarks (SPEC). On SystemZ, a conditional branch is statically predicted as not taken, which is why forced unrolling should work (by inserting conditional exit-branches after each cloned iteration). Experiments showed that performance was best if just force-unrolling two or three times, and not more. So the idea here is to force-unroll just enough to make the loop bigger than 6 instructions at least, but not more.

There is also the issue of store tags depletion on z13. Simply put, the resulting loop should not have too many stores, because that may cause very expensive flushes. This is implemented in the patch by counting all the stores in the loop, and never unrolling past 12 stores in the final loop. This limit must be used for all unrolling, including the forced unrolling (and runtime).

> If you want to introduce a bound for Forced unroll, Threshold bound looks more general.

Do you mean I should Threshold as a way to limit the forced unrolling? Could this be done with exact control over resulting iterations?

> Do you have a test case to add?

Sorry, but I can only say that this patch finally after a lot of experimentation passed extensive performance tests and is really needed without any functional change.

/Jonas

https://reviews.llvm.org/D24451