[PATCH] D54042: [AMDGPU] Extend the SI Load/Store optimizer to combine more things.

Fri Nov 2 13:39:51 PDT 2018

mareko added a comment.

I'm concerned that x8 and x16 loads will significantly increase SGPR usage and therefore SGPR spilling. We have a shader database with over 70 games and benchmarks, and I expect the results to get worse once this is committed.

There is another case that can be optimized: loading {f32, f32, skip, f32} and {f32, skip, f32, f32}. Both can be done with a single x4 load, for scalar as well as vector instructions. The cost is 1 more used VGPR or SGPR. One risk: register allocation may reuse the unused register immediately, which would force an unnecessary s_waitcnt right after the load and could hurt performance.
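
To make the two gap patterns concrete, here is a rough sketch of the check I have in mind. It is not code from the patch or from SILoadStoreOptimizer: `GapMergeInfo`, `classifyX4WithGap` and the 4-bit mask encoding are names I made up for illustration, and it only models which dwords of a 4-dword window are used, not alignment, address spaces or the s_waitcnt issue.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// Hypothetical helper for illustration only; nothing here exists in LLVM.
// usedMask: bit i set means dword i of a 4-dword window is actually needed.
struct GapMergeInfo {
  bool coveredByX4;    // true when the pattern is one of the two gap cases
  unsigned wastedRegs; // registers written by the x4 load but never read
};

inline GapMergeInfo classifyX4WithGap(std::uint8_t usedMask) {
  assert(usedMask < 16 && "only four dword lanes are modeled");
  const unsigned used = static_cast<unsigned>(std::bitset<4>(usedMask).count());
  // 0b1011 = {f32, f32, skip, f32}   (dwords 0, 1, 3 used)
  // 0b1101 = {f32, skip, f32, f32}   (dwords 0, 2, 3 used)
  const bool hasInteriorGap = usedMask == 0b1011 || usedMask == 0b1101;
  // The extra cost is the one skipped lane: 4 loaded, 3 used.
  return {hasInteriorGap, hasInteriorGap ? 4u - used : 0u};
}
```

For both patterns this reports one wasted register, which matches the "1 more used VGPR or SGPR" cost above. A fully used window (0b1111) or a contiguous x3 (0b0111) is deliberately not classified as a gap case, since those do not need this trick.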

Repository:
  rL LLVM

https://reviews.llvm.org/D54042