[PATCH] D136945: [AMDGPU] Enable `permlanex16` selection with `+16-bit-insts,+gfx10-insts`
Pierre van Houtryve via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Oct 28 06:23:22 PDT 2022
Pierre-vh added inline comments.
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp:185
unsigned GCNSubtarget::getConstantBusLimit(unsigned Opcode) const {
+ if (hasGFX10Insts() && (Opcode == AMDGPU::V_PERMLANE16_B32_e64 ||
+ Opcode == AMDGPU::V_PERMLANEX16_B32_e64)) {
----------------
foad wrote:
> Yuck. Why is this required?
Ah, this is indeed ugly, and I had forgotten to clean it up.
Fixed now.
It is needed because `legalizeOperandsVOP3` assumes (rightly) that `getConstantBusLimit` always returns 2 for permlane(x).
If it returns 1, legalization breaks: the loop below the permlane-specific logic undoes what that logic just did.
Also, if it returns 1 for PERMLANE, verification fails with a complaint that the constant bus limit is exceeded. A limit of 1 makes no sense there anyway, since the instruction needs 2 SGPR operands.
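To make the interaction concrete, here is a minimal standalone sketch, not the actual LLVM code. `ToySubtarget`, `IsGFX10Plus`, `isPermlane`, `fitsConstantBus` and `FeatureOnlyST` are hypothetical names invented for illustration, and the "limit is 1 before GFX10, 2 otherwise" fallback is a simplifying assumption. It models a subtarget that has `+gfx10-insts` but is not a GFX10 generation, which is the case the special-casing in the hunk above targets.

```cpp
#include <cassert>

// Toy stand-ins for the real opcodes; only the names mirror the patch.
enum class Opcode { V_ADD_U32_e64, V_PERMLANE16_B32_e64, V_PERMLANEX16_B32_e64 };

// Hypothetical helper: true for the two permlane opcodes from the hunk.
constexpr bool isPermlane(Opcode Opc) {
  return Opc == Opcode::V_PERMLANE16_B32_e64 ||
         Opc == Opcode::V_PERMLANEX16_B32_e64;
}

// Hypothetical, heavily simplified subtarget (two flags, nothing else).
struct ToySubtarget {
  bool HasGFX10Insts; // models the +gfx10-insts feature
  bool IsGFX10Plus;   // assumption: stands in for a generation check

  // Permlane always gets a limit of 2 when the feature is present;
  // everything else falls back to the (assumed) generation-based limit.
  constexpr unsigned getConstantBusLimit(Opcode Opc) const {
    return (HasGFX10Insts && isPermlane(Opc)) ? 2u
           : IsGFX10Plus                      ? 2u
                                              : 1u;
  }
};

// Verifier-style check: does this many SGPR operands fit on the constant bus?
constexpr bool fitsConstantBus(const ToySubtarget &ST, Opcode Opc,
                               unsigned NumSGPRs) {
  return NumSGPRs <= ST.getConstantBusLimit(Opc);
}

// Feature enabled by hand, but not a GFX10 generation.
constexpr ToySubtarget FeatureOnlyST{true, false};

// permlane(x) takes two SGPR operands, so a limit of 1 could never be met.
static_assert(fitsConstantBus(FeatureOnlyST, Opcode::V_PERMLANEX16_B32_e64, 2),
              "permlanex16 with two SGPRs must pass the check");
static_assert(!fitsConstantBus(FeatureOnlyST, Opcode::V_ADD_U32_e64, 2),
              "other VOP3 opcodes keep the generation-based limit");
```

Without the permlane case, the first check would fail in this model, which corresponds to the verifier error described above.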
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D136945/new/
https://reviews.llvm.org/D136945