[PATCH] D81172: [AMDGPU] Implement hardware bug workaround for image instructions

Tue Sep 29 19:09:28 PDT 2020

rdomingu added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp:3411-3415
+      auto Unmerge = B.buildUnmerge(S16, Reg);
+      for (int I = 0, E = Unmerge->getNumOperands() - 1; I != E; ++I)
+        PackedRegs.push_back(Unmerge.getReg(I));
+      PackedRegs.resize(8, B.buildUndef(S16).getReg(0));
+      Reg = B.buildBuildVector(LLT::vector(8, S16), PackedRegs).getReg(0);
----------------
arsenm wrote:
> rdomingu wrote:
> > arsenm wrote:
> > > rdomingu wrote:
> > > > arsenm wrote:
> > > > > It would be preferable to emit a concat_vectors of <2 x s16> pieces here
> > > > Sorry, I'm new to this. Why would concat_vectors be preferable to build_vector? Could you please elaborate?
> > > Because a G_BUILD_VECTOR with 16-bit sources isn't naturally legal. This works, but it adds more work for the legalizer, which has to reprocess these. You could produce something that's legal to begin with and save compile time.
> > I see. But how would you go from v3f16 to a concat_vectors of <2 x s16> pieces to v4f32 (which is what we want at the end)?
> I think I'm missing something. Why is this going from <3 x s16> to <4 x s32>? Isn't this the unpacked layout case? Why isn't this just a G_ANYEXT from <3 x s16> to <3 x s32>?
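This is the image workaround for the packed layout case. We don't want to change the data layout, which is why we shouldn't use G_ANYEXT: that would widen each s16 element into its own 32-bit lane. We just want to make the compiler think the data is twice as big, with the original s16 elements staying packed in the low lanes.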
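
To make the distinction concrete, here is a minimal standalone sketch in plain C++ (no LLVM APIs). The names widenPacked, extendPerElement, and kUndefLane are made up for this illustration, zero stands in for the undef padding lanes, and the lane packing assumes element 0 lands in the low half of each 32-bit word. It is only meant to show how the bits end up arranged, not how the patch is implemented.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an undef s16 lane. A real undef has no
// defined value; zero is used here only so the result is observable.
constexpr std::uint16_t kUndefLane = 0;

// Packed-layout widening: every 16-bit element keeps its position, the
// vector is padded to 8 lanes, and pairs of lanes are viewed as one
// 32-bit word (assumption: even lane in the low half of the word).
std::array<std::uint32_t, 4> widenPacked(const std::vector<std::uint16_t> &Elems) {
  assert(Elems.size() <= 8 && "at most 8 s16 lanes fit in 4 x 32 bits");
  std::array<std::uint16_t, 8> Lanes;
  Lanes.fill(kUndefLane);
  for (std::size_t I = 0; I != Elems.size(); ++I)
    Lanes[I] = Elems[I];

  std::array<std::uint32_t, 4> Words{};
  for (std::size_t I = 0; I != Words.size(); ++I)
    Words[I] = static_cast<std::uint32_t>(Lanes[2 * I]) |
               (static_cast<std::uint32_t>(Lanes[2 * I + 1]) << 16);
  return Words;
}

// Per-element extension, the shape an any-extend to 32-bit elements
// produces: each 16-bit element moves into its own 32-bit lane. The
// high bits are zero here; with a real any-extend they are unspecified.
std::vector<std::uint32_t> extendPerElement(const std::vector<std::uint16_t> &Elems) {
  std::vector<std::uint32_t> Out;
  Out.reserve(Elems.size());
  for (std::uint16_t E : Elems)
    Out.push_back(E);
  return Out;
}
```

With the three elements 0x1111, 0x2222, 0x3333, widenPacked keeps the first two elements together in word 0 and the third in the low half of word 1, while extendPerElement spreads them across three separate words. The second arrangement is the layout change we are trying to avoid.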

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D81172/new/

https://reviews.llvm.org/D81172