peterbell10 wrote: @Artem-B I don't follow your point. Yes, `ptxas` can optimize the insert-element sequence itself, but that doesn't have any effect on the slp-vectorizer's cost heuristics. https://github.com/llvm/llvm-project/pull/128077