[llvm] [RISCV] Account for factor in interleave memory op costs (PR #111511)
Philip Reames via llvm-commits
llvm-commits at lists.llvm.org
Fri Oct 11 13:00:12 PDT 2024
preames wrote:
I got curious and ran a full set of Factor vs. SEW vs. LMUL sweeps for the segmented loads on the BP3. You can find all the data here: https://github.com/preames/bp3-microarch/#vlseg_lmul_x_sew_throughput
Overall, my data confirms the snippets that Luke has posted above. Let me summarize what I think is going on here.
BP3 appears to have two implementations: one used for factors 2, 3, and 4, and the other for factors 5, 6, 7, and 8. The first appears to be a wide load followed by some kind of shuffle operation, whereas the second appears to scale with the number of 128-bit loads required plus some kind of adjustment term. To be honest, I'm having trouble fitting a good formula to either.
Given these results, and what Craig has noted above about the x280, I think we need to have distinct cost models for different factors. Annoyingly, it looks like the factor threshold between those models may need to differ by processor as well.
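To make the shape of that idea concrete, here is a rough sketch of a factor-dependent cost function with a per-processor threshold. Everything in it is hypothetical: `SegmentCostParams`, `segmentLoadCost`, and all of the constants are made up for illustration and are not existing LLVM APIs or measured values. The two branches only mirror the two regimes described above (wide load plus shuffle for small factors, a count of 128-bit loads plus an adjustment term for large ones); as noted, neither has a well-fitted formula yet.

```cpp
// Hypothetical per-processor tuning knobs. None of these names exist in LLVM.
struct SegmentCostParams {
  unsigned SmallFactorLimit;  // largest factor using the "wide load + shuffle" path
  unsigned VLenBits;          // vector register width in bits (assumed, per processor)
  unsigned ChunkBits;         // width of one underlying load, e.g. 128
  unsigned ShuffleCostPerReg; // placeholder shuffle cost per register written
  unsigned FixedAdjust;       // placeholder adjustment term for the large-factor path
};

// Returns an abstract cost for a segmented load with the given factor (NF)
// and integer LMUL. Fractional LMUL is ignored to keep the sketch short.
constexpr unsigned segmentLoadCost(const SegmentCostParams &P, unsigned Factor,
                                   unsigned LMUL) {
  const unsigned TotalRegs = Factor * LMUL;

  // Small factors: one wide load of all registers, then a shuffle pass.
  if (Factor <= P.SmallFactorLimit)
    return TotalRegs + TotalRegs * P.ShuffleCostPerReg;

  // Large factors: cost tracks the number of ChunkBits-wide loads required,
  // rounded up, plus a fixed adjustment.
  const unsigned TotalBits = TotalRegs * P.VLenBits;
  const unsigned Chunks = (TotalBits + P.ChunkBits - 1) / P.ChunkBits;
  return Chunks + P.FixedAdjust;
}
```

A different processor would supply its own `SegmentCostParams`, including a different `SmallFactorLimit`, which is the part that makes the threshold per-processor rather than hard-coded.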
https://github.com/llvm/llvm-project/pull/111511