[PATCH] D89693: [AArch64] Favor post-increments

Thu Oct 22 12:26:58 PDT 2020

SjoerdMeijer added a comment.

In D89693#2346568 <https://reviews.llvm.org/D89693#2346568>, @dmgreen wrote:

> Those numbers don't look too bad, but like you say it's probably worth looking into what x264_r is doing, just to see what is going on. Sanne ran some other numbers from the burst compiler and they were about the same - some small improvements, a couple of small losses but overall OK. That gives us confidence that big out of order cores are not going to hate this.
>
> The original tests were on an in-order core I believe? Which from the optimization guide looks like it should be sensible to use. And the option doesn't seem to be messing anything up especially.

I analysed x264 and couldn't find any concerning codegen changes in the top 6 hottest functions in the profile. I then did more runs and concluded I must have been looking at noise (again), as I no longer see that 1.38% regression; it is more like 0.5%, if it happens at all. Overall, my conclusion is in line with yours: in the worst case this change is neutral on bigger cores, but it is probably a small gain, and in my first experiment on an in-order core I do see decent speed-ups. Intuitively this makes sense: the bigger out-of-order cores can probably cope better with inefficient code, while the smaller ones are more sensitive to it, so an optimisation has more effect there. While looking at x264, I did observe this in some cases:

> The option tends to make loops use more registers,

I saw some more registers being used in preheaders to set up pointers, but this didn't seem to affect the loop itself.
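
To illustrate the trade-off, here is a minimal sketch in C. It is a hypothetical example, not code from x264, and the function names are made up. The second form sets up an extra pointer before the loop (the preheader cost), while the loop body only bumps one pointer per iteration, which is the shape a post-indexed load can serve. Whether the compiler actually emits a post-indexed load depends on the target and options.

```c
#include <stddef.h>
#include <stdint.h>

/* Indexed form: each access is expressed as base + index. */
int32_t sum_indexed(const int32_t *a, size_t n) {
    int32_t sum = 0;
    for (size_t i = 0; i < n; ++i)
        sum += a[i];
    return sum;
}

/* Pointer-bump form: 'end' is computed once before the loop, which costs
 * an extra register in the preheader. Inside the loop the load and the
 * pointer update can be folded into one post-indexed access on AArch64,
 * e.g. something like: ldr w8, [x0], #4 (illustrative only). */
int32_t sum_postinc(const int32_t *a, size_t n) {
    const int32_t *p = a;
    const int32_t *end = a + n; /* preheader setup */
    int32_t sum = 0;
    while (p != end)
        sum += *p++;            /* load, then advance the pointer */
    return sum;
}
```

Both functions compute the same result; only the addressing pattern differs.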

> Can you add a test for vector postincs, the kind of thing that you would get in a loop? I only see changes for scalars here.

Cheers, will look at this now.

https://reviews.llvm.org/D89693