[PATCH] D43256: [MBP] Move a latch block with conditional exit and multi predecessors to top of loop

Thu Aug 8 10:25:10 PDT 2019

hjagasia added a comment.

We are also seeing this patch slow down one of our internal benchmarks and speed up another on the Qualcomm Hexagon target.
In both cases the statically estimated profile is used, and that profile is representative. In both cases D43256 <https://reviews.llvm.org/D43256> lays out the executed hot code closer together, which improves cache utilization. However, in both cases we also see the critical path length and the number of jumps on the critical path increase, so a precise cost model is a good idea. We spent some time analyzing why one benchmark got worse: we can see more mispredicts, but there may be more going on under the hood.
For the benchmark that speeds up, we see that the new layout lowers pressure on an internal branch target hardware resource; the critical loop has a lot of calls, which had already increased pressure on that resource.
We don't have sources for these benchmarks.

We have verified that D65673 <https://reviews.llvm.org/D65673> restores the old behavior on the benchmark that got worse, and that passing the flag -force-precise-rotation-cost=true helps us keep the improvement on the other one.

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D43256/new/

https://reviews.llvm.org/D43256