[PATCH] D80033: [AMDGPU] Fix wait counts in the presence of 16bit subregisters

Tue May 26 09:44:41 PDT 2020

rampitec added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:508
   unsigned Size = TRI->getRegSizeInBits(*RC);
-  Result.second = Result.first + (Size / 32);
+  Result.second = Result.first + ((Size + 16) / 32);

----------------
foad wrote:
> rampitec wrote:
> > foad wrote:
> > > I think there's a DivCeil function in MathExtras that you could use here.
> > Please don't. This function does not work. I had to ditch it the other day specifically because of 16 vs 32 bits.
> > This function does not work.
> 
> ???
Found it: D78772
It was "divideCeil(getSubRegIdxOffset(SubReg), 32)", changed to "(getSubRegIdxOffset(SubReg) + 31) / 32".
The problem was with offset == 16. If we operate on just the size, this is probably OK.
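
As a rough sanity check of the size case only, here is a minimal standalone sketch. It assumes register sizes are always multiples of 16 bits. ceilDiv, slotsFromPatch and checkAgreement are hypothetical names made up for this illustration, and ceilDiv is a local stand-in, not the MathExtras function.

```cpp
#include <cassert>

// Hypothetical local helper: plain ceiling division on unsigned values.
// This is a stand-in for illustration, not LLVM's divideCeil.
inline unsigned ceilDiv(unsigned Numerator, unsigned Denominator) {
  return (Numerator + Denominator - 1) / Denominator;
}

// The expression used in the patch: (Size + 16) / 32.
inline unsigned slotsFromPatch(unsigned Size) { return (Size + 16) / 32; }

// Assumption: Size is a multiple of 16 bits. Under that assumption the
// two forms give the same result for every size checked here.
inline void checkAgreement(unsigned MaxBits) {
  for (unsigned Size = 16; Size <= MaxBits; Size += 16)
    assert(slotsFromPatch(Size) == ceilDiv(Size, 32));
}
```

Under that assumption a 16-bit size yields 1 either way, where the old truncating "Size / 32" yields 0. This says nothing about the offset case from D78772.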

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D80033/new/

https://reviews.llvm.org/D80033