[llvm] [SLP] Avoid crash in computeExtractCost (PR #93188)

Thu May 23 06:22:55 PDT 2024

alexey-bataev wrote:

> For a downstream target we ended up in a situation with assertion failures in ShuffleCostEstimator::computeExtractCost.
> 
> Input IR looked like this:
> 
> define void @foo(ptr %p0, <64 x i32> %vec) {
>   %p1 = getelementptr i32, ptr %p0, i16 1
>   %p2 = getelementptr i32, ptr %p0, i16 2
>   %p3 = getelementptr i32, ptr %p0, i16 3
>   %p4 = getelementptr i32, ptr %p0, i16 4
>   %p5 = getelementptr i32, ptr %p0, i16 5
>   %p6 = getelementptr i32, ptr %p0, i16 6
>   %p7 = getelementptr i32, ptr %p0, i16 7
>   %elt = extractelement <64 x i32> %vec, i32 0
>   store i32 %elt, ptr %p0
>   store i32 %elt, ptr %p1
>   store i32 %elt, ptr %p2
>   store i32 %elt, ptr %p3
>   store i32 %elt, ptr %p4
>   store i32 %elt, ptr %p5
>   store i32 %elt, ptr %p6
>   store i32 %elt, ptr %p7
>   ret void
> }
> 
> And the scenario was like this:
> 
> * VL and Mask have 8 elements on entry to computeExtractCost.
> * NumParts is 2 (v8i32 is not legal, but v4i32 is legal).
> * NumElts is calculated as 64 (given by extractelement <64 x i32>).
> * NumSrcRegs is calculated and is set to 1 (v64i32 is legal).
> * EltsPerVector is calculated as 64 (given by NumElts/NumSrcRegs).
> * An assertion failure happens when evaluating ArrayRef MaskSlice = Mask.slice(Part * EltsPerVector, (Part == NumParts - 1 && Mask.size() % EltsPerVector != 0) ? Mask.size() % EltsPerVector : EltsPerVector); because EltsPerVector (64) is already larger than Mask.size() (8) for Part == 0.
> 
> This patch resolves the issue by making sure that we slice Mask into pieces of at most EltsPerVector elements until the full Mask has been covered. Once all elements in Mask are covered, we break out of the loop.
> 
> I haven't been able to reproduce this scenario for any in-tree target, so unfortunately no regression test is included in the patch.

This is just an unexpected situation, where <8 x i32> has 2 parts while <64 x i32> has just one. Is this correct at all? Actually, all of this code must be moved to TTI and calculated there for the best cost estimation; this is just a temporary solution.
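
For illustration, the arithmetic from the quoted scenario can be reproduced outside of LLVM. The sketch below is standalone C++ and is not the code from the patch: `originalSliceLength`, `sliceFits`, and `clampedSlices` are hypothetical helpers, and the bounds check assumes that `ArrayRef::slice(N, M)` requires `N + M <= size()`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helpers for illustration only; none of this is LLVM code.
// A slice is described as an (offset, length) pair into the mask.
using Slice = std::pair<std::size_t, std::size_t>;

// Length that the slice expression quoted in the PR description would
// request for the given Part.
std::size_t originalSliceLength(std::size_t MaskSize,
                                std::size_t EltsPerVector,
                                std::size_t NumParts, std::size_t Part) {
  assert(EltsPerVector > 0 && "EltsPerVector must be non-zero");
  const bool IsPartialLastPart =
      Part == NumParts - 1 && MaskSize % EltsPerVector != 0;
  return IsPartialLastPart ? MaskSize % EltsPerVector : EltsPerVector;
}

// Assumed precondition of a slice(Offset, Length) call on a mask of
// MaskSize elements.
bool sliceFits(std::size_t MaskSize, std::size_t Offset, std::size_t Length) {
  return Offset + Length <= MaskSize;
}

// Clamped variant following the wording of the PR description: take at most
// EltsPerVector elements per step and stop once the whole mask is covered.
std::vector<Slice> clampedSlices(std::size_t MaskSize,
                                 std::size_t EltsPerVector) {
  assert(EltsPerVector > 0 && "EltsPerVector must be non-zero");
  std::vector<Slice> Slices;
  for (std::size_t Offset = 0; Offset < MaskSize; Offset += EltsPerVector)
    Slices.emplace_back(Offset, std::min(EltsPerVector, MaskSize - Offset));
  return Slices;
}
```

With the values from the description (Mask.size() = 8, EltsPerVector = 64, NumParts = 2), `originalSliceLength(8, 64, 2, 0)` requests 64 elements at offset 0, which `sliceFits` rejects, whereas `clampedSlices(8, 64)` yields the single slice (0, 8).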

https://github.com/llvm/llvm-project/pull/93188