[PATCH] D99352: [AMDGPU] ds_read_*/ds_write_* operations require strict alignment.

Stanislav Mekhanoshin via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Mar 30 09:27:55 PDT 2021


rampitec added inline comments.


================
Comment at: llvm/test/CodeGen/AMDGPU/ds_read2.ll:603
+; GFX9-UNALIGNED-NEXT:  v_add_u32_e32 v1, s4, v0
+; GFX9-UNALIGNED-NEXT:  ds_read_u8 v2, v1
+; GFX9-UNALIGNED-NEXT:  ds_read_u8 v3, v1 offset:1
----------------
foad wrote:
> @rampitec didn't you say that you rely on unaligned dword reads to get good performance? So I guess this change is unacceptable.
I did, but then we got info from @cfang about a case where dword-aligned loads were causing regressions. Although, according to the latest comment, those were still b128 reads.

In essence, the problem is that we do not always know the alignment, and the data may be better aligned than declared. In that case a wider load will work better than a narrower one. That is, when unaligned access is off, the choice of load width becomes a probability question. Note that the chance of hitting an unaligned 128-bit access is higher than the chance of hitting an unaligned 64-bit access.

That is why I have requested measurements. We know for sure that b128 is mostly a bad choice in the underaligned case. I suspect the b8 and b16 splits are overkill, but I am not really sure about splitting 64-bit accesses into 32-bit ones.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99352/new/

https://reviews.llvm.org/D99352


