[llvm] [AArch64][CostModel] Consider the cost of const vector (PR #117539)

Sushant Gokhale via llvm-commits llvm-commits at lists.llvm.org
Tue Dec 3 00:40:35 PST 2024


sushgokh wrote:

> * The cost of constants is often (but not always) 0.
>   -- If the constant is shared then the cost is 1/number-of-uses. There was a recent regression from this in [Regression from clang 16: missed vectorization in simple array initialization #111126](https://github.com/llvm/llvm-project/issues/111126).
>   -- If the constant is in a loop it can usually be hoisted. So cost is 1/num-of-iterations.

I have considered two different cases in the test case: one where the same const vector is part of the same SLP tree, and one where it spans two different SLP trees. While the latter can't be addressed right now because of the way SLP works, I won't worry about it much unless some regression is reported. A sketch of the second case is below.
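To make the second case concrete, here is a hypothetical sketch (function and pointer names are mine, not taken from the test case) of the same constant vector feeding two separate SLP trees:

```cpp
// Each group of four contiguous stores is typically vectorized as its own
// SLP tree, yet both trees need the same <4 x float> constant vector.
// Codegen can presumably materialize the constant once, but the cost model
// sees each tree in isolation.
void init(float *a, float *b) {
  a[0] = 1.0f; a[1] = 2.0f; a[2] = 3.0f; a[3] = 4.0f; // SLP tree #1
  b[0] = 1.0f; b[1] = 2.0f; b[2] = 3.0f; b[3] = 4.0f; // SLP tree #2
}
```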


>   -- If the constant can be folded into the instruction, especially for scalars the cost can be free. In the places where this is used it goes via getIntImmCost.

1. Can you provide an example where the scalar can be free?
2. Regarding `getIntImmCost`, yes, thanks. I was searching for this API to get the scalar cost (see the sketch after this list).
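For reference, a minimal sketch of the query as I understand it (the helper name is mine; it assumes a `TargetTransformInfo` and an `LLVMContext` are in scope):

```cpp
#include "llvm/ADT/APInt.h"
#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Type.h"
using namespace llvm;

// Ask the target what materializing the scalar immediate 42 as an i32 costs.
// Immediates that fit a single mov, or that fold into the using instruction,
// should come back cheap or free.
InstructionCost scalarImmCost(const TargetTransformInfo &TTI,
                              LLVMContext &Ctx) {
  APInt Imm(/*numBits=*/32, /*val=*/42);
  return TTI.getIntImmCost(Imm, Type::getInt32Ty(Ctx),
                           TargetTransformInfo::TCK_RecipThroughput);
}
```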

> This function looks like it is not the cost of a vector constant, but the relative cost of vector vs the scalar equivalents

The initial patch version split this into a scalar cost and a vector cost, but @alexey-bataev suggested masking the calculation under the target TTI, and hence it came under a single API. So there are two issues here:
1. The API name does not convey the intention particularly well.
2. Calling this API without the scalar cost won't be possible.

But maybe I have misinterpreted, or @alexey-bataev can express his thoughts on keeping the entire code in TTI. The two alternatives are sketched below.
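To illustrate the two shapes being discussed (every name below is invented for illustration; neither declaration is the actual patch API):

```cpp
#include "llvm/Support/InstructionCost.h"

namespace llvm {
class APInt;
class Constant;
class Type;
class VectorType;

// (a) Split hooks: SLP queries both sides and compares the costs itself.
InstructionCost getScalarConstCost(const APInt &Imm, Type *Ty);
InstructionCost getVectorConstCost(const Constant *C, VectorType *VecTy);

// (b) Single hook: the target hides the scalar-vs-vector comparison, so the
//     result is a relative cost. This is why the name reads oddly and why
//     the hook cannot be meaningfully called without the scalar side.
InstructionCost getRelativeConstVectorCost(const Constant *C,
                                           VectorType *VecTy);
} // namespace llvm
```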

> Some of the scalars could be loaded from memory too.

Can you provide an example?
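My guess at the kind of case meant here is something like this hypothetical snippet, where the scalar lanes are loads rather than immediates:

```cpp
// None of the four scalar operands is a constant; each lane comes from
// memory, so no immediate-materialization cost enters the comparison.
void copy4(float *dst, const float *src) {
  dst[0] = src[0];
  dst[1] = src[1];
  dst[2] = src[2];
  dst[3] = src[3];
}
```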

> 4 seems like a very high cost for a constant-pool load, when measured as a throughput. I would expect it to be 1ish.

The `ld1` instruction already has cost=4, even though we split it into `load + insert`. For this specific case there is no such split, and hence the entire cost has been attributed to the `ldr`. In fact, if the cost is reduced to 3, we no longer see the gains on the leela benchmark.
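For context, a constant vector that cannot be built with `movi` typically comes out of the constant pool as an `adrp` + `ldr q` pair; a schematic example (the values are arbitrary):

```cpp
#include <arm_neon.h>

// Expected AArch64 lowering (schematic):
//   adrp x8, .LCPI0_0
//   ldr  q0, [x8, :lo12:.LCPI0_0]
float32x4_t constVec() {
  static const float Vals[4] = {1.5f, 2.25f, 3.125f, 4.0625f};
  return vld1q_f32(Vals);
}
```

So the question is whether that `ldr` should be costed like an ordinary vector load or higher.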

> This looks like it tries to discourage more SLP vectorization, which is probably not a bad thing in some cases.

I think it's fine until some regression is reported.

> Including the costs somehow is probably a good idea, but there are reasons why this has not been done before. We should try and come up with a consistent way of doing it, ideally.

Let me try to make it more consistent. For ScalarCost, `getIntImmCost` is the right fit. For VectorCost, I doubt I can do the same, because I don't think there is a piece of code that returns the cost of a `ldr`-from-memory instruction.
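If reusing the generic load cost were acceptable, a rough sketch might look like this (my assumption, not something the patch does; it also ignores the `adrp` part of the sequence):

```cpp
#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/Instruction.h"
#include "llvm/Support/Alignment.h"
using namespace llvm;

// Stand in for the `ldr` by costing the constant-pool access as an
// ordinary vector load through the target's existing hook.
InstructionCost constPoolLoadCost(const TargetTransformInfo &TTI,
                                  FixedVectorType *VecTy) {
  return TTI.getMemoryOpCost(Instruction::Load, VecTy, Align(16),
                             /*AddressSpace=*/0,
                             TargetTransformInfo::TCK_RecipThroughput);
}
```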

https://github.com/llvm/llvm-project/pull/117539

