[PATCH] D72620: AMDGPU/GlobalISel: Add documentation for RegisterBankInfo

Mon Jan 13 08:03:41 PST 2020

arsenm created this revision.
arsenm added reviewers: tstellar, nhaehnle, kerbowa.
Herald added subscribers: Petar.Avramovic, hiraditya, t-tye, tpr, dstuttard, rovka, yaxunl, wdng, jvesely, kzhuravl.
Herald added a project: LLVM.

Document some high level strategies that should be used for register
bank selection. The constant bus restriction section hasn't actually
been implemented yet.


https://reviews.llvm.org/D72620

Files:
  llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp


Index: llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
===================================================================

--- llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
+++ llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
@@ -8,7 +8,64 @@
 /// \file
 /// This file implements the targeting of the RegisterBankInfo class for
 /// AMDGPU.
-/// \todo This should be generated by TableGen.
+///
+/// \par
+///
+/// AMDGPU has unique register bank constraints that require special high level
+/// strategies to deal with. There are two main true physical register banks
+/// VGPR (vector), and SGPR (scalar). Additionally the VCC register bank is a
+/// sort of pseudo-register bank needed to represent SGPRs used in a vector
+/// boolean context. There is also the AGPR bank, which is a special purpose
+/// physical register bank present on some subtargets.
+///
+/// Copying from VGPR to SGPR is generally illegal, unless the value is known to
+/// be uniform. It is generally not valid to legalize operand by inserting
+/// copies as on other targets. Operations which require uniform, SGPR operands
+/// generally require scalarization by repeatedly executing the instruction,
+/// activating each set of lanes using a unique set of input values. This is
+/// referred to as a waterfall loop.
+///
+/// \par Booleans
+///
+/// Booleans (s1 values) requires special consideration. A vector compare result
+/// is naturally a bitmask with one bit per lane, in a 32 or 64-bit
+/// register. These are represented with the VCC bank. During selection, we need
+/// to be able to unambiguously go back from a register class to a register
+/// bank. To disambiguate determine whether an SGPR should use the SGPR or VCC
+/// register bank, we need to know the use context type. An SGPR s1 value always
+/// means a VCC bank value, otherwise it will be the SGPR bank. A scalar compare
+/// sets SCC, which is a 1-bit unaddressable register. This will need to be
+/// copied to a 32-bit virtual register. Taken together, this means we need to
+/// adjust the type of boolean operations to be regbank legal. All SALU booleans
+/// need to be widened to 32-bits, and all VALU booleans need to be s1 values.
+///
+/// A noteworthy exception to the s1-means-vcc rule is for legalization artifact
+/// casts. G_TRUNC s1 results, and G_SEXT/G_ZEXT/G_ANYEXT sources are never vcc
+/// bank. A non-boolean source (such as a truncate from a 1-bit load from
+/// memory) will require a copy to the VCC bank which will require clearing the
+/// high bits and inserting a compare.
+///
+/// \par Constant bus restriction
+///
+/// VALU instructions have a limitation known as the constant bus
+/// restriction. Most VALU instructions can use SGPR operands, but may read at
+/// most 1 SGPR or constant literal value (this to 2 in gfx10 for most
+/// instructions). This is one unique SGPR, so the same SGPR may be used for
+/// multiple operands. From a register bank perspective, any combination of
+/// operands should be legal as an SGPR, but this is contextually dependent on
+/// the SGPR operands all being the same register. There is therefore optimal to
+/// choose the SGPR with the most uses to minimize the number of copies.
+///
+/// We avoid trying to solve this problem in RegBankSelect. Any VALU G_*
+/// operation should have its source operands all mapped to VGPRs (except for
+/// VCC), inserting copies from any SGPR operands. This the most trival legal
+/// mapping. Anything beyond the simplest 1:1 instruction selection would be too
+/// complicated to solve here. Every optimization pattern or instruction
+/// selected to multiple outputs would have to enforce this rule, and there
+/// would be additional complexity in tracking this rule for every G_*
+/// operation. By forcing all inputs to VGPRs, it also simplifies the task of
+/// picking the optimal operand combination from a post-isel optimization pass.
+///
 //===----------------------------------------------------------------------===//
 
 #include "AMDGPURegisterBankInfo.h"


-------------- next part --------------
A non-text attachment was scrubbed...
Name: D72620.237682.patch
Type: text/x-patch
Size: 4053 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20200113/ac6e0e79/attachment.bin>