[PATCH] [MC] Disassembled CFG reconstruction

Mon May 20 16:49:12 PDT 2013

  Hi Ahmed,

  This is looking promising!
  Thanks for working on this.

  I have a couple of concerns that need to be addressed, though.

  Minor Concerns:
  -------------------
  1. For newly added doxygen, please stick to the CodingStandards:
  http://llvm.org/docs/CodingStandards.html#doxygen-use-in-documentation-comments
  2. Few typos and stylistic thing. See inlined comments.

  Major Concerns:
  -------------------
  1. Verbose information should be printed in dbgs() and in DEBUG() macro.
  2. In MCModule::buildCFG, the dynamically allocated memory has to be released by the client whereas it is not documented. Actually, I would prefer that MCModule holds the MCFunctions (see inlined comment for line 335 of llvm-objdump.cpp).
  3. Textual output loses the printing of relocation (DumpAddress).

  Thanks,

  -Quentin


================
Comment at: include/llvm/MC/MCAtom.h:41
@@ +40,3 @@
+  /// split - Splits the atom in two at a given address, which must align with
+  /// and instruction boundary if this is a TextAtom.  Returns the newly created
+  /// atom representing the high part of the split.
----------------
Typo: and -> an

================
Comment at: include/llvm/MC/MCAtom.h:69
@@ -43,4 +68,3 @@
 
-  // Private constructor - only callable by MCModule
-  MCAtom(AtomType T, MCModule *P, uint64_t B, uint64_t E)
-    : Type(T), Parent(P), Begin(B), End(E) { }
+  /// remap - Remap the atom, using the new range, updating Begin/End.
+  void remap(uint64_t NewBegin, uint64_t NewEnd);
----------------
Would it make sense to be able to update just one of the bounds?

================
Comment at: include/llvm/MC/MCAtom.h:77
@@ +76,3 @@
+  /// The bounds for the resulting atoms are returned in {L,R}{Begin,End}.
+  void remapForSplit(uint64_t SplitPt,
+                     uint64_t &LBegin, uint64_t &LEnd,
----------------
What happens to the current Atom in that case?
Is it modified?
If yes, how?
If not, should we set a const keyword here?

================
Comment at: include/llvm/MC/MCAtom.h:101
@@ -47,1 +100,3 @@
 
+  /// NextInstAddress - The address of the next appended instruction, i.e. the
+  /// address immediately after the last instruction in the atom.
----------------
Typo: i.e. -> i.e.,

================
Comment at: include/llvm/MC/MCAtom.h:108
@@ +107,3 @@
+
+  typedef InstListTy::const_iterator iterator;
+  iterator begin() const { return Insts.begin(); }
----------------
Please explicit the “constness” in the name of the type.
In other words: iterator -> const_iterator

Otherwise, it is confusing to see this kind of API:
iterator begin() const;

It seems the method returns a not const iterator on a const object!

================
Comment at: include/llvm/MC/MCModule.h:32
@@ +31,3 @@
+  /// Atoms - Atoms in this module, sorted by begin address.
+  /// XXX: For now, this doesn't handle overlapping atoms (which can for
+  /// instance happen when a basic block starts in the middle of an instruction
----------------
I guess the standard way to annotate that is to use a FIXME.

================
Comment at: include/llvm/MC/MCModule.h:40
@@ -45,3 +39,3 @@
   /// remap - Update the interval mapping for an MCAtom.
   void remap(MCAtom *Atom, uint64_t NewBegin, uint64_t NewEnd);
 
----------------
What are the constraints on Atom argument?
Should it be part of Atoms?
If so, explicit that in the comment.

================
Comment at: include/llvm/MC/MCModule.h:42
@@ -47,1 +41,3 @@
 
+  // map - Insert a new MCAtom in the interval map/allocation tracker.
+  void map(MCAtom *NewAtom);
----------------
Based on the comment, it is not clear to me what this method is doing.
Indeed, both OffsetMap and AtomAllocationTracker are gone.

================
Comment at: include/llvm/MC/MCModule.h:50
@@ -51,2 +49,3 @@
   /// createAtom - Creates a new MCAtom covering the specified offset range.
-  MCAtom *createAtom(MCAtom::AtomType Type, uint64_t Begin, uint64_t End);
+  MCTextAtom *createTextAtom(uint64_t Begin, uint64_t End);
+  MCDataAtom *createDataAtom(uint64_t Begin, uint64_t End);
----------------
Do you think you can stick to the old API here.
Hence, could you avoid having two different methods?
This will help to have some stability in the API if we add new types.

================
Comment at: lib/MC/MCFunction.cpp:52
@@ +51,3 @@
+bool MCBasicBlock::isPredecessor(const MCBasicBlock *MCBB) const {
+  return std::count(Predecessors.begin(), Predecessors.end(), MCBB);
+}
----------------
Predecessors.end() != std::find should be more efficient here.

================
Comment at: lib/MC/MCFunction.cpp:44
@@ +43,3 @@
+bool MCBasicBlock::isSuccessor(const MCBasicBlock *MCBB) const {
+  return std::count(Successors.begin(), Successors.end(), MCBB);
+}
----------------
Successors.end() != std::find should be more efficient here.

================
Comment at: lib/MC/MCObjectDisassembler.cpp:79
@@ +78,3 @@
+      Data->setName(SecName);
+      for (uint64_t Index = 0; Index < SecSize; Index += 1)
+        Data->addData(Contents[Index]);
----------------
Typo: Index += 1 -> ++Index.
http://llvm.org/docs/CodingStandards.html#prefer-preincrement

================
Comment at: lib/MC/MCObjectDisassembler.cpp:116
@@ +115,3 @@
+    Calls.insert(TA->getBeginAddr());
+    errs() << "Building CFG for atom " << TA->getName() << "\n";
+    for (MCTextAtom::iterator II = TA->begin(), IE = TA->end();
----------------
Verbose information should be printed in dbgs() and in DEBUG() macro.

================
Comment at: lib/MC/MCObjectDisassembler.cpp:114
@@ +113,3 @@
+    MCTextAtom *TA = dyn_cast<MCTextAtom>(*AI);
+    if (TA == 0) continue;
+    Calls.insert(TA->getBeginAddr());
----------------
TA == 0 -> !TA

I am not sure it is in the coding standard, but it looks odd to use == 0 to me.

================
Comment at: lib/MC/MCObjectDisassembler.cpp:134
@@ +133,3 @@
+    MCAtom *A = Module->find(*SI);
+    if (A == 0) continue;
+    MCTextAtom *TA = cast<MCTextAtom>(A);
----------------
!A

================
Comment at: lib/MC/MCObjectDisassembler.cpp:150
@@ +149,3 @@
+    MCTextAtom *TA = dyn_cast<MCTextAtom>(*AI);
+    if (TA == 0) continue;
+    BBInfo &CurBB = BBInfos[TA->getBeginAddr()];
----------------
!TA

================
Comment at: lib/MC/MCObjectDisassembler.cpp:168
@@ +167,3 @@
+    BBInfo &BBI = BBInfos[*CI];
+    if (BBI.Atom == 0) continue;
+
----------------
!BBI.Atom

================
Comment at: tools/llvm-objdump/llvm-objdump.cpp:335
@@ +334,3 @@
+                    **FI, IP.get());
+      delete *FI;
+    }
----------------
Accordingly to the implementation, this delete is required to free the memory properly.
However, the code looks odd to me, because the call to the new counter part is hidden and more importantly *not documented*.

You need to find a better way to handle that otherwise, it is certain that the client of this API will forgot to free the memory.

One solution could be to store the MCFunctions into the MCModule and make the MCModule dispose them. Then the MCModule would have an iterator over the functions instead of filling a structure passed to buildCFG.
If we want, we can privatize the buildCFG method and call it lazily when the iterator is accessed.

Anyway, having the MCFunctions being a part of a MCModule follow the current IR design and is, IMO, better.

================
Comment at: tools/llvm-objdump/llvm-objdump.cpp:318
@@ +317,3 @@
+         AI != AE; ++AI) {
+      errs() << "Atom: " << (*AI)->getName() << "\n";
+      if (MCTextAtom *TA = dyn_cast<MCTextAtom>(*AI)) {
----------------
Verbose information in dbgs() and in macro DEBUG().

================
Comment at: tools/llvm-objdump/llvm-objdump.cpp:323
@@ +322,3 @@
+             ++II) {
+          IP->printInst(&II->Inst, errs(), "");
+          errs() << "\n";
----------------
It seems to me that we lose the ability to print relocation (DumpAddress).
You must keep this ability.


http://llvm-reviews.chandlerc.com/D811

BRANCH
  mcfunc

ARCANIST PROJECT
  llvm