[PATCH] D47511: [AMDGPU] Construct memory clauses before RA

Stanislav Mekhanoshin via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed May 30 14:27:47 PDT 2018


rampitec added a comment.

In https://reviews.llvm.org/D47511#1116028, @arsenm wrote:

> Needs some tests with frame indexes, and with the d16 lo/hi operations with tied operands


There is a test with frame indexes in mir (function name "stack"). Pass skips all instructions with FI operands, so the only test needed is to check there are no clauses formed.



================
Comment at: lib/Target/AMDGPU/SIFormMemoryClauses.cpp:32-36
+// Clauses longer then 15 instructions would overflow one of the counters
+// and stall. They can stall even earlier if there are outstanding counters.
+static cl::opt<unsigned>
+MaxClause("amdgpu-max-memory-clause", cl::Hidden, cl::init(15),
+          cl::desc("Maximum length of a memory clause, instructions"));
----------------
arsenm wrote:
> I thought this was a subtarget specific limit.
> 
> I apparently never committed D40578 which made the hazard recognizer aware of the subtarget specific sizes
4 bits is a smallest counter for all ASICs (excluding exp, but we are not using exp here). On GFX9 there are two additional bits for vmcnt, but it does not make too much sense to create too big clauses, so to fit into 4 bit seems reasonable at least as a starting point.

Take in mind we cannot be precise here because if there are outstanding operations counter may be higher than zero when we come to a clause. In this case HW will break it anyway when it is saturated. Thus the value is not calculated from available counter bits or hardcoded, but rather a tunable option.


================
Comment at: lib/Target/AMDGPU/SIFormMemoryClauses.cpp:41
+class SIFormMemoryClauses : public MachineFunctionPass {
+  typedef DenseMap<unsigned, std::pair<unsigned, LaneBitmask>> RegUse;
+
----------------
arsenm wrote:
> Should this be using the BitVector of RegUnits instead?
RegUnits are for Phys. Anyway, this carries lane bitmask *and* register state, so I need two values.


================
Comment at: lib/Target/AMDGPU/SIFormMemoryClauses.cpp:312
+  MaxVGPRs = TRI->getAllocatableSet(MF, &AMDGPU::VGPR_32RegClass).count();
+  MaxSGPRs = TRI->getAllocatableSet(MF, &AMDGPU::SGPR_32RegClass).count();
+
----------------
arsenm wrote:
> SReg_32_XM0_XEXEC?
When SIRegisterInfo::getReservedRegs() works it reserves SGPR_32 for amdgpu-num-sgpr attribute. This limit accounts primarily for that attribute use (and debug reserved registers too). I also do not believe I want to use vcc or anything exotic here, I'd rather split the clause if we are running that low of registers.


================
Comment at: lib/Target/AMDGPU/SIFormMemoryClauses.cpp:363
+
+      for (auto &&R : Defs) {
+        forAllLanes(R.first, R.second.second, [&R, &B](unsigned SubReg) {
----------------
arsenm wrote:
> Why && (and for the rest)?
I could use const auto &, but forwarding reference is less verbose (and usually recommended in fact).


================
Comment at: lib/Target/AMDGPU/SIInstructions.td:548-550
+def SI_MEMORY_CLAUSE : SPseudoInstSI <
+  (outs), (ins variable_ops)
+>;
----------------
arsenm wrote:
> Why do you use this instead of using the generic TargetOpcode::BUNDLE?
> I think trying to use a custom bundle instruction is going to cause issues. For example instruction size computation for this won't work properly, and any other code that expects to see BUNDLE
Primarily to know what to remove post-RA. I do not think it creates any problems though, for instance getFunctionCodeSize() is called when memory clauses are already unbundled.


https://reviews.llvm.org/D47511





More information about the llvm-commits mailing list