[llvm] [SLPVectorizer][NVPTX] Customize getBuildVectorCost for NVPTX (PR #128077)
Artem Belevich via llvm-commits
llvm-commits at lists.llvm.org
Fri Feb 21 10:33:32 PST 2025
Artem-B wrote:
> PTX has a single mov instruction that can build e.g. <2 x half> vectors from scalars, however the SLPVectorizer over-estimates it as the cost of 2 insert elements.
Being a single instruction in PTX does not mean that it's efficiently implemented in hardware.
In this particular case, `mov.b32 %r1, {%h1, %h2}` is lowered to three (different) instructions on the GPU:
https://godbolt.org/z/1jWfsjfrz
I think the current estimate, that constructing a v2f16 vector costs roughly as much as a few logical ops, is quite reasonable.
https://github.com/llvm/llvm-project/pull/128077