[PATCH] D70210: [MirNamer][Canonicalizer]: Perform instruction semantic based renaming

Thu Nov 14 18:09:22 PST 2019

bogner accepted this revision.
bogner added a comment.

LGTM once Puyan's feedback is addressed

================
Comment at: llvm/lib/CodeGen/MIRVRegNamerUtils.cpp:84-88
+    // Use this as catch all for every other case.
+    // In reality, we could replace this with an assert,
+    // but since this only affects the hash, it should be
+    // okay.
+    return 0; // TODO.
----------------
As far as I understand this can't really be replaced with an assert as is, since things like basic block ids are being deliberately dropped here. Can you explicitly handle the cases that we really want to drop from the hash or at least update this comment to explain? I realize it might make sense to always return something, since the only effect is more collisions, but the comment reads as if you think it's an error to get here.

================
Comment at: llvm/lib/CodeGen/MIRVRegNamerUtils.cpp:105
+bool NamedVRegCursor::renameInstsInMBB(MachineBasicBlock *MBB) {
+  std::vector<TypedVReg> VRegs;
+  std::string Prefix = "bb" + std::to_string(getCurrentBBNumber()) + "_";
----------------
aditya_nandakumar wrote:
> plotfi wrote:
> > I like this linear approach but I might like to keep the tree based approach as well as a toggle until we can add more tests. In the tree based approach I was trying to do the canonicalization based on the chain of operations that flow into a side effect, where here the side effects are renaming barriers? On second thought, I really like the hashing approach on the VReg-Def opcode and if you are confident it wont result in too may cases where a difference that should have remained is lost, I'd be fine with replacing all of this walking business. 
> > 
> > Comments (and perhaps some brief MIR snippets) on how this renaming mechanism works would be really nice to have as well.
> > 
> > From what I understand, you have many test cases downstream. Can these be ported to aarch64 to bolster the testing upstream? Even the tests with 7 and 8 instructions can be useful, and I'd assume shouldn't be too difficult to port to a supported downstream target? Does this sound reasonable to you @aditya_nandakumar @bogner ??
> In general hash collisions are resolved similarly on both sides of a diff and the hash collision renaming will also happen similarly on both sides of the diff (just incrementing numbers). This will line up really well for diffs. Additionally with the hashing method, just when you're staring at two equivalent pieces of code, just by looking at the reg name that is a hash, you can just assume that they're likely equivalent instructions and move your focus elsewhere.
> In general, due to the disadvantages of the previous algorithm and the advantages of this approach, there should be no need to keep both approaches.
> Regarding tests, there's sufficient coverage. The core algorithm is simple - hash instruction oeprands and rename based on the hash(which capture the semantics of the instruction). It's evident that this works correctly (look at MIR/AArch64/mirCanonIdempotent.mir included in this patch where two of the same MOV instructions are renamed correctly)
> ```
> # CHECK-NEXT: %bb0_42274__1:gpr32 = MOVi32imm 408
> # CHECK-NEXT: %bb0_42274__2:gpr32 = MOVi32imm 408
> ```
> The only thing adding more tests would help with is we tie that up some diffing tool and make sure that the core strategy still works. Otherwise, it will be quite identical to the ones we already have here. I'll still try to come up with some screenshots of the two approaches and attach it to the review.
This is still doing the traversal in a sense, just a bit more implicitly while we calculate the instruction hash. The nice thing here is that the hash is stable enough that it doesn't matter where we start from, so we can just walk the instructions linearly and it ends up looking a bit simpler.

We don't really have many interestingly distinct test cases downstream - the test cases below cover the functional testing pretty well. What we do have is some experience using this on large functions that were compiled with and without GlobalISel, which have shown that this hashing approach helps quite a bit.

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70210/new/

https://reviews.llvm.org/D70210