[PATCH] D99352: [AMDGPU] ds_read_*/ds_write_* operations require strict alignment.

Changpeng Fang via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Mar 29 16:04:06 PDT 2021


cfang added inline comments.


================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:1435
+      // ds_read/write_b128 require 16-byte alignment on gfx8 and older.
+      bool Aligned = Alignment >= Align(16);
       if (IsFast)
----------------
arsenm wrote:
> rampitec wrote:
> > arsenm wrote:
> > > hsmhsm wrote:
> > > > arsenm wrote:
> > > > > I don't like changing this function to lie and say it doesn't work when it does
> > > > My understanding is that the function allowsMisalignedMemoryAccessesImpl() should check for the strict alignment requirment for ds_read/ds_write operations. I am now further updated this function accordingly.
> > > You are making this stricter than the instructions really allow
> > That is more or less the point. I think there is some balance here between the perf drop when a wide access is used for misaligned data and perf loss when a narrow access is used on data for which we cannot prove alignment but it just happens to be aligned.
> > 
> > This seems to warrant a perf testing on this patch. @cfang does this solve perf regression you were working on?
> The function reports if it's legal and works. IsFast is separate and can change, but it should still report this as a legal unaligned access
Verified with SHOC SGEMM, this patch could resolve the regression.
For single precision (align4), ds_read2_b32 are generated;
For double precision (align8), ds_read2_b64 are generated.
Both are much better than unaligned ds_read_b128 (which was the reason of regression).


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99352/new/

https://reviews.llvm.org/D99352



More information about the llvm-commits mailing list