[PATCH] D54042: [AMDGPU] Extend the SI Load/Store optimizer to combine more things.
Marek Olšák via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Nov 2 13:39:51 PDT 2018
mareko added a comment.
I'm concerned that x8 and x16 loads will significantly increase SGPR usage and therefore SGPR spilling. We have a shader database with over 70 games and benchmarks, and I expect the results to get worse once this is committed.
There is another case that can be optimized: loading {f32, f32, skip, f32} and {f32, skip, f32, f32}. Both can be done with a single x4 load, for scalar and vector instructions alike. The cost is one extra VGPR or SGPR. One caveat: the register allocator may immediately reuse the register that receives the skipped element, which forces an otherwise unnecessary s_waitcnt after the load and may hurt performance.
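To make the two patterns concrete, here is a minimal sketch of the check, not code from the patch. The names X4Plan and planX4Load are hypothetical, and the 4-bit mask encoding (bit i set when dword i of the 4-dword window is read) is an assumption made for illustration.

```cpp
#include <cstdint>

// Hypothetical helper for illustration only; not part of LLVM or D54042.
// UsedMask has bit i set when dword i of a 4-dword window is actually read.
struct X4Plan {
  bool UseX4;          // true if a single x4 load is suggested
  unsigned WastedRegs; // registers written by the load but never read
};

X4Plan planX4Load(uint8_t UsedMask) {
  const uint8_t Mask = UsedMask & 0xF; // only the low four dwords matter

  // All four dwords are read: a plain x4 load with nothing wasted.
  if (Mask == 0xF)
    return {true, 0};

  // {f32, f32, skip, f32} -> dwords 0, 1, 3 -> 0b1011 (0xB)
  // {f32, skip, f32, f32} -> dwords 0, 2, 3 -> 0b1101 (0xD)
  // One interior hole: an x4 load works at the cost of one extra register.
  if (Mask == 0xB || Mask == 0xD)
    return {true, 1};

  // Every other mask is left to whatever merging already exists.
  return {false, 0};
}
```

For example, planX4Load(0xB) suggests an x4 load with one wasted register, while planX4Load(0x3) does not, since two adjacent dwords are already covered by an x2 load. The sketch deliberately ignores the s_waitcnt concern; a real implementation would also need to weigh register pressure.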
Repository:
rL LLVM
https://reviews.llvm.org/D54042