[PATCH] D136945: [AMDGPU] Enable `permlanex16` selection with `+16-bit-insts,+gfx10-insts`

Jay Foad via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Oct 28 06:31:19 PDT 2022


foad added inline comments.


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp:185
 unsigned GCNSubtarget::getConstantBusLimit(unsigned Opcode) const {
+  if (hasGFX10Insts() && (Opcode == AMDGPU::V_PERMLANE16_B32_e64 ||
+                          Opcode == AMDGPU::V_PERMLANEX16_B32_e64)) {
----------------
foad wrote:
> foad wrote:
> > Pierre-vh wrote:
> > > foad wrote:
> > > > Yuck. Why is this required?
> > > Ah, this is indeed ugly and I forgot to fix it.
> > > Fixed it.
> > > 
> > > The reason why it's needed is because `legalizeOperandsVOP3` sort of assumes (and rightly so) that it'll always return 2 for permlane(x).
> > > If it returns 1, it messes up the legalization because the loop below the permlane-specific logic will undo the legalization.
> > > Also if it returns 1 for PERMLANE, verification will fail complaining about constant bus limit being exceeded, but the instruction needs 2 sgpr operands so it doesn't make sense anyway.
> > What happens if you don't change this at all?
> (Sorry I see you already answered that.)
This still makes me nervous. Are there parts of the compiler that assume this'll return 1 for pre-GFX10 instructions? Have you tried compiling a large body of code with -mcpu=gfx900 -mattr=+gfx10insts to see if the compiler crashes horribly?

Maybe it would be cleaner to special-case v_permlane in legalizeOperandsVOP3 and/or verifyInstruction.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D136945/new/

https://reviews.llvm.org/D136945



More information about the llvm-commits mailing list