[PATCH] D50633: [AMDGPU] Add new Mode Register pass

Thu Nov 15 10:54:42 PST 2018

nhaehnle added a comment.

The update seems to have messed up the indentation of comments in a few places.

================
Comment at: lib/Target/AMDGPU/SIModeRegister.cpp:237-257
+      } else if (InsertionPoint) {
+        // We need a setreg at the InsertionPoint, but we defer the first in
+        // each block to Phase 3
+        if (RequirePending) {
+          NewInfo->Require = NewInfo->Change;
+          RequirePending = false;
+        } else {
----------------
timcorringham wrote:
> nhaehnle wrote:
> > I feel like this logic is more convoluted than necessary, and possibly even wrong / overly conservative in some cases because of that.
> > 
> > For example, why are you setting InsertionPoint also in the case where FirstInsertionPoint is set? Why are IPChange and NewInfo->Change never cleared? My intuition is that the logic should just be a delayed-update/insert pattern like this:
> > ```
> > if (!NewInfo->FirstInsertionPoint) {
> >   NewInfo->FirstInsertionPoint = &MI;
> > } else if (!IPChange.isCompatible(InstrMode)) {
> >   if (InsertionPoint)
> >     insertSetReg(IPChange);
> >   else
> >     NewInfo->Require = IPChange;
> >   merge IPChange into NewInfo->Change and reset IPChange
> >   InsertionPoint = &MI;
> > }
> > merge InstrMode into IPChange
> > ```
> > Then at the end of the basic block:
> > ```
> > if (!InsertionPoint)
> >   NewInfo->Require = IPChange;
> > ```
> > The RequirePending flag should simply be unnecessary.
> There are cases where we don't set FirstInsertionPoint, even if there are other InsertionPoints. This can arise where an explicit setreg appears before the first instruction that uses the mode register. We  preserve the setreg (we don't really expect to see any, but if they appear we assume there is a good reason for it). In that case there is no initial mode value requirement, so no FirstInsertionPoint.  That is also the reason for the RequirePending flag - there are more states than can be deduced by just the InsertionPoint and FirstInsertionPoint pointers.
> We don't clear Change as the algorithm assumes it holds the net change to the mode by the block. When we know the predecessor mode(s) in Phase 2 we can then determine the output mode of each block (this can involve revisiting blocks that are successors to any block that changes its output mode).
> Phase 3 then determines whether a setreg is required at the FirstInsertionPoiint.
Okay, I see the point about having a pre-existing setreg before any instruction with mode requirements.

However, the point about clearing Change (or rather IPChange) still stands, because there are many different mode bits that could have requirements separately. For example, you could have:
```
1. Inst that has f32 round & denormal requirements
2. Inst that has f16 round & denormal requirements
3. Inst that has different f32 round & denormal req.s
4. Inst that has different f16 round & denormal req.s
```
You really only need to insert two setregs (before 1 and 3), but the algorithm will insert setregs before 1, 3, and 4.

================
Comment at: lib/Target/AMDGPU/SIModeRegister.cpp:226
+                                        const SIInstrInfo *TII) {
+  std::unique_ptr<BlockData> NewInfo (new BlockData);
+  MachineInstr *InsertionPoint = nullptr;
----------------
auto NewInfo = llvm::make_unique<BlockData>();

Repository:
  rL LLVM

https://reviews.llvm.org/D50633