[PATCH] D69136: Add an instruction marker field to the ExtraInfo in MachineInstrs.

Reid Kleckner via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Oct 17 15:34:26 PDT 2019


rnk added inline comments.


================
Comment at: llvm/include/llvm/CodeGen/MachineFunction.h:326-327
   /// CodeView heapallocsites.
   std::vector<std::tuple<MCSymbol *, MCSymbol *, const DIType *>>
       CodeViewHeapAllocSites;
+  DenseMap<MachineInstr *, const DIType *> HeapAllocSiteInstrMap;
----------------
Now that CodeViewDebug is going to be in charge of filling in this data structure, I think it should be moved to CodeViewDebug::FunctionInfo. Actually, I see we already have them there, so we can probably remove this vector completely.


================
Comment at: llvm/include/llvm/CodeGen/MachineFunction.h:328
       CodeViewHeapAllocSites;
+  DenseMap<MachineInstr *, const DIType *> HeapAllocSiteInstrMap;
 
----------------
I don't think this map keyed on MachineInstr pointers will do the right thing in the presence of dead code elimination and code duplication.

For dead code elimination, it could have an ABA problem like the SelectionDAG nodes had before, where our solution was to clear the map after codegen. I think MachineInstr addresses can be recycled if a call is deleted and another instruction allocated at the same address.

For code duplication, are you sure this looks up the correct type? I would expect one of the instructions to not be present in here, and return a null DIType. For the `taildup-heapallocsite.ll` test case that I added, can you make sure that both S_HEAPALLOCSITE records use the same type indices?

I think you can store the DIType directly in the ExtraInfo struct and that will mean we don't need this anymore.


================
Comment at: llvm/include/llvm/CodeGen/MachineInstr.h:245
                  PointerSumTypeMember<EIIK_PostInstrSymbol, MCSymbol *>,
+                 PointerSumTypeMember<EIIK_InstrMarker, MCSymbol *>,
                  PointerSumTypeMember<EIIK_OutOfLine, ExtraInfo *>>
----------------
This label doesn't seem like it's ever emitted. It's really just a special label with the name "heapallocsite". I think you can store the DIType* here, and use that as a heap allocation call site indicator. Another option would be to store metadata references here, which would make it extensible. We could, for example, have this point to a vector of pairs of fixed metadata kinds (MD_dbg, MD_tbaa, etc) and Metadata*. It would provide a natural pathway from moving any kind of metadata annotation from an LLVM IR instruction to a MachineInstr.

Whichever attachment you choose, it should avoid the need for the extra DenseMap, and it should get copied when tail duplication clones the instruction.


================
Comment at: llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp:1072-1078
+      // If there is a marker, create and emit a label before the instruction.
+      MCSymbol *PreInstrLabel = nullptr;
+      if (MCSymbol *S = MI.getInstrMarker()) {
+        PreInstrLabel = createTempSymbol(S->getName());
+        OutStreamer->EmitLabel(PreInstrLabel);
+      }
 
----------------
I think it would be best to, as much as possible, avoid having codeview specific codepaths here.


================
Comment at: llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.cpp:1106-1109
-      // The labels might not be defined if the instruction was replaced
-      // somewhere in the codegen pipeline.
-      if (!BeginLabel->isDefined() || !EndLabel->isDefined())
-        continue;
----------------
It'll be nice to avoid this. :)


================
Comment at: llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.cpp:1413-1414
   bool EmptyPrologue = true;
   for (const auto &MBB : *MF) {
     for (const auto &MI : MBB) {
       if (!MI.isMetaInstruction() && !MI.getFlag(MachineInstr::FrameSetup) &&
----------------
I think you can use this code pattern here to find all the heap allocation sites, and save them on FunctionInfo for later. This beginFunctionImpl method is sort of the place where the debug info handlers get a chance to look at the entire function before each instruction is emitted. We already do big up front passes over the whole function to create the lexical scope tree, which is what places labels around block scopes.

You should be able to reuse the requestLabelBeforeInsn / requestLabelAfterInsn APIs. The label names won't be as easy to FileCheck, but there will be fewer labels in the output, which is cleaner in some ways.


================
Comment at: llvm/lib/CodeGen/MachineInstr.cpp:519
   Info.set<EIIK_OutOfLine>(
       MF.createMIExtraInfo(memoperands(), getPreInstrSymbol(), Symbol));
 }
----------------
This doesn't track the new instrmarker, so now postinstr symbols and instrmarkers can't coexist on the same instruction.


================
Comment at: llvm/lib/CodeGen/MachineInstr.cpp:549-550
+    if (memoperands_empty()) {
+      Info.set<EIIK_PreInstrSymbol>(getPreInstrSymbol());
+      Info.set<EIIK_PostInstrSymbol>(getPostInstrSymbol());
+      return;
----------------
I don't think this does the right thing. If I understand correctly, `Info` is a single pointer. The .set call overwrites the pointer, and then the second call to getPostInstrSymbol will always return null, and then the .set call will reset the pointer to null again.

This extra info code is subtle. Now that there are five possibilities, we might want to come up with some shared logic for this.

Actually, I'm surprised we didn't write unit tests for this in the first place. Oops. It's highly testable data manipulation. You can see examples in llvm/unittests/CodeGen/MachineInstrTest.cpp.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69136/new/

https://reviews.llvm.org/D69136





More information about the llvm-commits mailing list