[PATCH] D123956: [AMDGPU] Refine 64 bit misaligned LDS ops selection

Mon Apr 18 17:02:53 PDT 2022

rampitec added a comment.

I think this is it for LDS. 32 byte access is already fine: we do not want to split it, because although splitting is faster on Navi, it increases register pressure.

One remaining issue that could be addressed is LoadStoreVectorization, which would need a change to the definition of "fast" itself.

Another one is global isel. It would be better for it to handle "fast" too, which is technically easy, but the only place I can see for it is in the legalization, which is a layering violation. So it needs a handler in the lowering instead.

Then the same handling and experiments are needed for global. Global isel should likely go after that, as I don't think it is generally a good thing to distinguish between address spaces rather than rely on "allowed" and "fast" alone.

Another potential area is ignoring "fast" when optimizing for size.
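
To make the last two points concrete, here is a minimal sketch of a decision that relies only on "allowed" and "fast", with no address-space check, and that ignores "fast" when optimizing for size. It assumes "allowed" and "fast" are the two answers of a misaligned-access query in the style of TargetLowering::allowsMisalignedMemoryAccesses. The struct and function names below are hypothetical and are not part of LLVM or of this patch.

```cpp
// Hypothetical model of the two answers discussed above; not LLVM code.
struct MisalignedAccessInfo {
  bool Allowed; // the misaligned access is legal as a single operation
  bool Fast;    // the single operation is not slower than a split one
};

// Hypothetical helper: decide whether to keep one wide access or split it.
// Nothing here inspects the address space; the decision uses "allowed" and
// "fast" alone, plus the optimize-for-size flag.
bool keepWideAccess(MisalignedAccessInfo Info, bool OptForSize) {
  if (!Info.Allowed)
    return false; // must be split: the wide access is not legal
  if (OptForSize)
    return true;  // ignore "fast": one instruction is smaller than several
  return Info.Fast; // otherwise keep the wide access only if it is fast
}
```

With this shape, an access that is allowed but not fast is split in a normal build and kept whole in a size-optimized build. Whether that trade-off pays off on a given target is exactly the kind of thing the experiments would have to show.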

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123956/new/

https://reviews.llvm.org/D123956