[llvm] [AArch64][SLP] Add NFC test cases for floating point reductions (PR #106507)

Wed Sep 4 01:01:46 PDT 2024

davemgreen wrote:

Hi again. It looks like we should be able to treat a faddp the same as a fadd cost-wise on most modern CPUs (and by default). Some older CPUs prior to Cortex-A73 (but not the little cores) had slightly higher costs for them. We might want to add a target feature if needed, but I think this would make a good default cost model.
The Throughput=4 isn't really meaningful given how we model costs at the moment, and the Latency=2 would only be used for TCK_Latency (which we don't currently handle very thoroughly). The default, TCK_RecipThroughput, just adds together reciprocal-throughput estimates that are relative to one another. The cost should either be similar to a fadd for each step (which I believe is 1 now), or doubling it is probably fine if that produces better results (and would then probably be OK for any CPU).

It might be easier in this case to add any extra CostModel/AArch64 tests in the same PR as the cost-model adjustments, as that will show which tests we really need. The SLP ones look like good additions if we remove the -mcpu option, but it might be good to have at least some tests for both fast and non-fast reductions. FP16 costs usually depend on whether +fullfp16 is present (they should ideally be promoted to fp32 otherwise), so it might be worth having an extra RUN line for those if that turns out to be relevant.

https://github.com/llvm/llvm-project/pull/106507