[PATCH] D136945: [AMDGPU] Enable `permlanex16` selection with `+16-bit-insts,+gfx10-insts`

Matt Arsenault via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Oct 28 08:57:35 PDT 2022


arsenm added inline comments.


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp:557
       .lower();
-  } else if (ST.has16BitInsts()) {
+  } else if (ST.has16BitInsts() && ST.hasMad64_32()) {
     getActionDefinitionsBuilder({G_ADD, G_SUB})
----------------
Nothing here requires or implies a need for mad64_32?


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp:185
 unsigned GCNSubtarget::getConstantBusLimit(unsigned Opcode) const {
+  if (hasGFX10Insts() && (Opcode == AMDGPU::V_PERMLANE16_B32_e64 ||
+                          Opcode == AMDGPU::V_PERMLANEX16_B32_e64)) {
----------------
foad wrote:
> foad wrote:
> > foad wrote:
> > > Pierre-vh wrote:
> > > > foad wrote:
> > > > > Yuck. Why is this required?
> > > > Ah, this is indeed ugly and I forgot to fix it.
> > > > Fixed it.
> > > > 
> > > > The reason why it's needed is because `legalizeOperandsVOP3` sort of assumes (and rightly so) that it'll always return 2 for permlane(x).
> > > > If it returns 1, it messes up the legalization because the loop below the permlane-specific logic will undo the legalization.
> > > > Also if it returns 1 for PERMLANE, verification will fail complaining about constant bus limit being exceeded, but the instruction needs 2 sgpr operands so it doesn't make sense anyway.
> > > What happens if you don't change this at all?
> > (Sorry I see you already answered that.)
> This still makes me nervous. Are there parts of the compiler that assume this'll return 1 for pre-GFX10 instructions? Have you tried compiling a large body of code with -mcpu=gfx900 -mattr=+gfx10insts to see if the compiler crashes horribly?
> 
> Maybe it would be cleaner to special-case v_permlane in legalizeOperandsVOP3 and/or verifyInstruction.
I'd also rather not touch this.

This is another case where I feel we would be better off directly emitting subtarget specific instructions and reading the property out of the instruction properties 


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D136945/new/

https://reviews.llvm.org/D136945



More information about the llvm-commits mailing list