[llvm] [AArch64,TTI] Remove RealUse check for vector insert/extract costs. (PR #146526)

Mon Jul 7 04:20:34 PDT 2025

https://github.com/fhahn commented:

> See also #138811, which did less than this patch as it left the insert-into-undef in place. That is part of a splat pattern
> 
> ```
>   %insert = insertelement <4 x i32> poison, i32 %scalar, i32 0
>   %splat = shufflevector <4 x i32> %insert, <4 x i32> poison, <4 x i32> zeroinitializer
> ```
> 
> That should probably have a total cost ~ the same as a gpr->fpr insert (so ~2 in total). Currently it is a little lower I believe, at 1 for the two instructions. Do we go to a cost of 3 now?
> 

Ah yes, that's a great point. The cost will now be 3, which is likely too high. I've updated the patch to return a cost of 0 when the vector operand is poison and the index is 0.
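
To make the arithmetic concrete, here is a rough, standalone sketch of the special case. It is not the actual AArch64 TTI code: `InsertQuery`, `insertCost`, `splatCost` and both constants are hypothetical names, and the 2 + 1 split between the insert and the broadcast shuffle is an assumption inferred from the totals mentioned above.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for a vector-insert cost query.
// This does not mirror LLVM's TargetTransformInfo interface.
struct InsertQuery {
  bool IntoPoison; // the vector operand of the insertelement is poison
  unsigned Index;  // lane being written
};

// Assumed values: a gpr->fpr insert costs ~2 and the broadcast shuffle 1,
// which would give the total of 3 discussed in the thread.
constexpr unsigned GprToFprInsertCost = 2;
constexpr unsigned BroadcastShuffleCost = 1;

// Insert cost with the special case: writing lane 0 of a poison vector is
// treated as free, since it is normally the first half of a splat.
unsigned insertCost(const InsertQuery &Q) {
  if (Q.IntoPoison && Q.Index == 0)
    return 0;
  return GprToFprInsertCost;
}

// Total cost of the insertelement + shufflevector splat pattern, with or
// without the lane-0-into-poison special case applied.
unsigned splatCost(bool FreeLane0IntoPoison) {
  unsigned Insert = FreeLane0IntoPoison ? insertCost({true, 0})
                                        : GprToFprInsertCost;
  return Insert + BroadcastShuffleCost;
}
```

Under these assumed numbers the splat costs 3 without the special case and 1 with it, so the special case alone lands below the ~2 suggested above unless the shuffle side is adjusted too.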

Still waiting for the performance results.

> I mostly didn't push it yet because of the multiple_reduction.ll case. It is a clear regression but you can argue that it was only correct by accident in the past. #55350
> 
> There is also the question of things that are actually in FPR regs already, but I'm not sure how to address that. I'm not sure what to suggest needs fixing exactly, I think this patch is probably fine and a step towards more correct values. Do you have any opinion on the splat case and whether the SLP issue is possible to address?

I still need to double-check what's going on with the SLP reduction case.

For `straight`, there are a large number of extracts in the vectorized case, so it's not immediately clear to me whether the vectorized version would actually be profitable.

For `looped`, no extends are generated, so I think this should be vectorized.

https://github.com/llvm/llvm-project/pull/146526