[llvm] [SLPVectorizer][NVPTX] Customize getBuildVectorCost for NVPTX (PR #128077)

Artem Belevich via llvm-commits llvm-commits at lists.llvm.org
Fri Feb 21 10:33:32 PST 2025


Artem-B wrote:

> PTX has a single mov instruction that can build e.g. <2 x half> vectors from scalars, however the SLPVectorizer over-estimates it as the cost of 2 insert elements.

A single instruction in PTX does not mean that it is implemented efficiently in hardware.

In this particular case, `mov.b32 %r1, {%h1, %h2}` lowers to three (different) instructions on the GPU:
https://godbolt.org/z/1jWfsjfrz

I think the current estimate, that constructing a v2f16 vector costs the equivalent of a few logical ops, is quite reasonable.
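
To illustrate the "few logical ops" comparison, here is a minimal sketch of what packing two 16-bit lanes into one 32-bit register means at the bit level. It follows the PTX semantics for `mov.b32 d, {a, b}`, i.e. `d = a | (b << 16)`. The helper names (`pack_b16x2`, `unpack_lo`, `unpack_hi`, `check_pack`) are hypothetical and exist only for this example. This is not the instruction sequence the GPU actually executes (see the godbolt link for that), just the scalar work the pack is equivalent to.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not an LLVM or CUDA API.
// Packs two raw 16-bit lanes into one 32-bit word with the first lane in
// bits 0-15, matching the PTX definition of mov.b32 d, {a, b}.
constexpr std::uint32_t pack_b16x2(std::uint16_t lo, std::uint16_t hi) {
  // One widen, one shift, one OR: "a few logical ops".
  return static_cast<std::uint32_t>(lo) |
         (static_cast<std::uint32_t>(hi) << 16);
}

// Hypothetical inverses, included only to check the lane layout.
constexpr std::uint16_t unpack_lo(std::uint32_t v) {
  return static_cast<std::uint16_t>(v & 0xFFFFu);
}
constexpr std::uint16_t unpack_hi(std::uint32_t v) {
  return static_cast<std::uint16_t>(v >> 16);
}

// 0x3C00 and 0x4000 are the IEEE half-precision encodings of 1.0 and 2.0.
static_assert(pack_b16x2(0x3C00, 0x4000) == 0x40003C00u,
              "first lane must land in the low 16 bits");

// Runtime round-trip check over a couple of arbitrary bit patterns.
inline void check_pack() {
  const std::uint32_t v = pack_b16x2(0x1234, 0xABCD);
  assert(unpack_lo(v) == 0x1234);
  assert(unpack_hi(v) == 0xABCD);
}
```

Seen this way, treating a v2f16 build as roughly two inserts is in the same ballpark as the shift-and-OR above, which is why the existing estimate does not look like a large over-count to me.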

https://github.com/llvm/llvm-project/pull/128077

