[PATCH] D99352: [AMDGPU] ds_read_*/ds_write_* operations require strict alignment.

Stanislav Mekhanoshin via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Mar 29 16:06:38 PDT 2021


rampitec added inline comments.


================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:1435
+      // ds_read/write_b128 require 16-byte alignment on gfx8 and older.
+      bool Aligned = Alignment >= Align(16);
       if (IsFast)
----------------
cfang wrote:
> arsenm wrote:
> > rampitec wrote:
> > > arsenm wrote:
> > > > hsmhsm wrote:
> > > > > arsenm wrote:
> > > > > > I don't like changing this function to lie and say it doesn't work when it does
> > > > > My understanding is that the function allowsMisalignedMemoryAccessesImpl() should check for the strict alignment requirment for ds_read/ds_write operations. I am now further updated this function accordingly.
> > > > You are making this stricter than the instructions really allow
> > > That is more or less the point. I think there is some balance here between the perf drop when a wide access is used for misaligned data and perf loss when a narrow access is used on data for which we cannot prove alignment but it just happens to be aligned.
> > > 
> > > This seems to warrant a perf testing on this patch. @cfang does this solve perf regression you were working on?
> > The function reports if it's legal and works. IsFast is separate and can change, but it should still report this as a legal unaligned access
> Verified with SHOC SGEMM, this patch could resolve the regression.
> For single precision (align4), ds_read2_b32 are generated;
> For double precision (align8), ds_read2_b64 are generated.
> Both are much better than unaligned ds_read_b128 (which was the reason of regression).
Ugh! Do you know if we can use ds_read2_b64 for align4? The main question here do we need to split b128 only or everything misaligned?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99352/new/

https://reviews.llvm.org/D99352



More information about the llvm-commits mailing list