[llvm] [AArch64][CostModel] Consider the cost of const vector (PR #117539)
Sushant Gokhale via llvm-commits
llvm-commits at lists.llvm.org
Sun Dec 15 22:42:45 PST 2024
sushgokh wrote:
A few thoughts here, @david-green, based on your previous comments:
1. I am revising this patch to consider the cost only for const-vectors of integer types. Currently, as you said, we don't have a consistent method for fetching the cost of floating-point constants (I will try to come up with one in the future).
2. Cost of const-vectors in loops and other places
The revised patch checks whether the const-vector has already been materialized in the function. If so, we treat the vector cost of that const-vector as 0. Checking whether the scalar has already been materialized is difficult because of the way SLP operates.
(Checking this for scalars is also important once support for floats is added. As you said, in some cases there can be loads for doubles, and this would considerably affect the cost.)
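To illustrate the idea in point 2, here is a minimal standalone sketch. It does not use the LLVM APIs and is not the code in the patch; `ConstVec`, `ConstVecCostTracker`, `getCost` and `MaterializeCost` are hypothetical names, and the real check may well be implemented differently (for example by inspecting existing uses of the constant).

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical stand-in for a constant integer vector such as <i32 0, 1, 2, 3>.
using ConstVec = std::vector<int64_t>;

// Illustrative cost only; 4 is the figure discussed in this thread.
constexpr unsigned MaterializeCost = 4;

// Hypothetical helper: remembers which constant vectors have already been
// materialized within one function.
class ConstVecCostTracker {
  std::set<ConstVec> Materialized;

public:
  // Returns MaterializeCost the first time CV is seen in this function and
  // 0 on every later use, since the value is assumed to be available already.
  unsigned getCost(const ConstVec &CV) {
    assert(!CV.empty() && "expected a non-empty constant vector");
    bool Inserted = Materialized.insert(CV).second;
    return Inserted ? MaterializeCost : 0;
  }
};
```

With one tracker per function, the first query for a given constant vector returns 4 and any repeated query for the same vector returns 0, while a different constant vector is charged again.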
3. Why the cost of 4 for such const-vectors?
Materializing such const-vectors results in `index` or `ldr` instruction.
a) `index` with both arguments constant is relatively cheap and is 1 uop. `index` with one argument in a register has high latency and is 2 uops.
b) `ldr` has high latency but is 1 uop.
My reasoning behind the specific cost number is that we are not just comparing `index`/`ldr` from the constant pool with simple loads (which cost about 1); we are comparing them with `mov`/`fmov`. Given their latencies, I think a cost of 4 is not unfair. Correct me if I am wrong.
Also, I didn't quite get how the number of uops affects the cost besides the latency. Could you please explain this aspect?
4.
> Can you explain more details about what the gain is caused by, and what code it comes from? Is there some other place that the cost is inaccurate in it?
I analyzed this. The perf uplift comes from the function `randint`. The `index` instruction is not on the critical path, so I won't say this patch is causing the uplift. What causes the delay in the vector case is the `ldr d0, [x0]` instruction (as opposed to `ldp w8, w9, [x0]` in the scalar version). The costing is inaccurate here.
https://github.com/llvm/llvm-project/pull/117539