[llvm] [AArch64] Fix heuristics for folding "lsl" into load/store ops. (PR #86894)

Fri Mar 29 00:53:26 PDT 2024

https://github.com/davemgreen commented:

Thanks for looking into this; it is one thing off my todo list.

> It turns out the current commit message isn't precisely right. Current Cortex cores (X2 and later) have free shift by 1 for integer loads... but no free shift by 1/4 for floating-point loads. Not sure if it's worth explicitly modeling int-shift vs. float-shift.

That might be an inaccuracy in the optimization guide rather than a real difference between int and fp. An lsl #4 shift is still an extra operation, but I do not believe it comes up very often.

> https://reviews.llvm.org/D155470#4527270 suggests that we should default to AddrLSLSlow14... I'm not sure if that's the right choice. An explicit shift is guaranteed to increase latency, but an extra integer micro-op generated by a folded shift might not matter in a lot of cases.

Yeah, I agree. With more new cores having faster lsl #1 shifts, I think it should be OK as the default.

https://github.com/llvm/llvm-project/pull/86894