[llvm] [AMDGPU] Eliminate unnecessary packing in wider f16 vectors for sdwa/opsel-able instruction (PR #137137)
Vikash Gupta via llvm-commits
llvm-commits at lists.llvm.org
Wed Aug 6 23:54:06 PDT 2025
vg0204 wrote:
So, I finally came to the conclusion that my patch should become a separate new pass, placed immediately after si-peephole-sdwa, for the following reasons:
1. It cannot be treated as a peephole optimization because of how it is implemented: it runs rigorous condition checks across use-def chains to find the scenarios where the transformation would be profitable.
2. The use cases and coverage of the optimization in my patch give a performance improvement that outweighs the added cost of one more pass in the pipeline.
3. It is certainly possible to break this implementation into a series of peephole optimization patterns (as suggested by @frederik-h), but I doubt that approach would handle all, rather than just a few, of the generic scenarios listed in my test case file.
>
>I do think solving the original problem - that is, less register-efficient lowerings of SWDA/OPSEL-able operations that're being run on a vector <4 x [i//f]16> or the like - should be done
This is another possible approach (suggested by @krzysz00): tackle the problem at its source. So, @arsenm, @jayfoad, @Pierre-vh, @frederik-h, and @krzysz00: which approach seems like the better way to go?
https://github.com/llvm/llvm-project/pull/137137