[PATCH] D41102: Setup clang-doc frontend framework

Roman Lebedev via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Fri Feb 23 09:33:57 PST 2018


lebedev.ri added a comment.

Next, i suggest to look into code self-debugging, see comments.
Also, i have added a few questions, it would be great to know that my understanding is correct?

I'm sorry that it seems like we are going over and over and over over the same code again, 
this is the very base of the tool, i think it is important to get it as close to great as possible.
I *think* these review comments move it in that direction, not in the opposite direction?



================
Comment at: clang-doc/BitcodeWriter.cpp:47
+      BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed,
+                      BitCodeConstants::LineNumFixedSize));  // Line number
+  Abbrev->Add(
----------------
So in other words this is making an assumption that no file with more than 65535 lines will be analyzed, correct?
Can you add that as comment please?


================
Comment at: clang-doc/BitcodeWriter.cpp:56
+  StringRef Name;
+  AbbrevDsc Abbrev;
+
----------------
```
  AbbrevDsc Abbrev = nullptr;
```


================
Comment at: clang-doc/BitcodeWriter.cpp:57
+  AbbrevDsc Abbrev;
+
+  RecordIdDsc() = default;
----------------
```
// Is this 'description' valid?
operator bool() const {
  return Abbrev != nullptr && Name.data() != nullptr && !Name.empty();
}
```


================
Comment at: clang-doc/BitcodeWriter.cpp:137
+          {FUNCTION_LOCATION, {"Location", &LocationAbbrev}},
+          {FUNCTION_PARENT, {"Parent", &StringAbbrev}},
+          {FUNCTION_ACCESS, {"Access", &IntAbbrev}}};
----------------
So `FUNCTION_MANGLED_NAME` is phased out, and is thus missing, as far as i understand?


================
Comment at: clang-doc/BitcodeWriter.cpp:148
+void ClangDocBitcodeWriter::AbbreviationMap::add(RecordId RID,
+                                                 unsigned AbbrevID) {
+  assert(Abbrevs.find(RID) == Abbrevs.end() && "Abbreviation already added.");
----------------
+`assert(RecordIdNameMap[ID] && "Unknown Abbreviation");`


================
Comment at: clang-doc/BitcodeWriter.cpp:153
+
+unsigned ClangDocBitcodeWriter::AbbreviationMap::get(RecordId RID) const {
+  assert(Abbrevs.find(RID) != Abbrevs.end() && "Unknown abbreviation.");
----------------
+`assert(RecordIdNameMap[ID] && "Unknown Abbreviation");`


================
Comment at: clang-doc/BitcodeWriter.cpp:158
+
+void ClangDocBitcodeWriter::AbbreviationMap::clear() { Abbrevs.clear(); }
+
----------------
Called only once, and that call does nothing.
I'd drop it.


================
Comment at: clang-doc/BitcodeWriter.cpp:175
+/// \brief Emits a block ID and the block name to the BLOCKINFO block.
+void ClangDocBitcodeWriter::emitBlockID(BlockId ID) {
+  Record.clear();
----------------
```
/// \brief Emits a block ID and the block name to the BLOCKINFO block.
void ClangDocBitcodeWriter::emitBlockID(BlockId ID) {
  const auto& BlockIdName = BlockIdNameMap[ID];
  assert(BlockIdName.data() && BlockIdName.size() && "Unknown BlockId!");

  Record.clear();
  Record.push_back(ID);
  Stream.EmitRecord(llvm::bitc::BLOCKINFO_CODE_SETBID, Record);

  Record.clear();
  for (const char C : BlockIdName) Record.push_back(C);
  Stream.EmitRecord(llvm::bitc::BLOCKINFO_CODE_BLOCKNAME, Record);
}
```


================
Comment at: clang-doc/BitcodeWriter.cpp:187
+void ClangDocBitcodeWriter::emitRecordID(RecordId ID) {
+  prepRecordData(ID);
+  for (const char C : RecordIdNameMap[ID].Name) Record.push_back(C);
----------------
```
/// \brief Emits a record name to the BLOCKINFO block.
void ClangDocBitcodeWriter::emitRecordID(RecordId ID) {
  assert(RecordIdNameMap[ID] && "Unknown Abbreviation");
  prepRecordData(ID);
```
(Yes, `prepRecordData()` will have the same code. It should get optimized away.)


================
Comment at: clang-doc/BitcodeWriter.cpp:194
+
+void ClangDocBitcodeWriter::emitAbbrev(RecordId ID, BlockId Block) {
+  auto Abbrev = std::make_shared<BitCodeAbbrev>();
----------------
```
void ClangDocBitcodeWriter::emitAbbrev(RecordId ID, BlockId Block) {
  assert(RecordIdNameMap[ID] && "Unknown Abbreviation");
  auto Abbrev = std::make_shared<BitCodeAbbrev>();
```


================
Comment at: clang-doc/BitcodeWriter.cpp:204
+void ClangDocBitcodeWriter::emitRecord(StringRef Str, RecordId ID) {
+  if (!prepRecordData(ID, !Str.empty())) return;
+  Record.push_back(Str.size());
----------------
So remember that in a previous iteration, seemingly useless `AbbrevDsc` stuff was added to the `RecordIdNameMap`?
It is going to pay-off now:
```
void ClangDocBitcodeWriter::emitRecord(StringRef Str, RecordId ID) {
  assert(RecordIdNameMap[ID] && "Unknown Abbreviation");
  assert(RecordIdNameMap[ID].Abbrev == &StringAbbrev && "Abbrev type mismatch");
  if (!prepRecordData(ID, !Str.empty())) return;
...
```
And if we did not add an `RecordIdNameMap` entry for this `RecordId`, then i believe that will also be detected because `Abbrev` will be a `nullptr`.


================
Comment at: clang-doc/BitcodeWriter.cpp:205
+  if (!prepRecordData(ID, !Str.empty())) return;
+  Record.push_back(Str.size());
+  Stream.EmitRecordWithBlob(Abbrevs.get(ID), Record, Str);
----------------
```
assert(Str.size() < (1U << BitCodeConstants::StringLengthSize));
Record.push_back(Str.size());
```


================
Comment at: clang-doc/BitcodeWriter.cpp:210
+void ClangDocBitcodeWriter::emitRecord(const Location &Loc, RecordId ID) {
+  if (!prepRecordData(ID, !OmitFilenames)) return;
+  Record.push_back(Loc.LineNumber);
----------------
```
void ClangDocBitcodeWriter::emitRecord(const Location &Loc, RecordId ID) {
  assert(RecordIdNameMap[ID] && "Unknown Abbreviation");
  assert(RecordIdNameMap[ID].Abbrev == &LocationAbbrev && "Abbrev type mismatch");
  if (!prepRecordData(ID, !OmitFilenames)) return;
...
```


================
Comment at: clang-doc/BitcodeWriter.cpp:211
+  if (!prepRecordData(ID, !OmitFilenames)) return;
+  Record.push_back(Loc.LineNumber);
+  Record.push_back(Loc.Filename.size());
----------------
Call me paranoid, but:
```
assert(Loc.LineNumber < (1U << BitCodeConstants::LineNumberSize));
Record.push_back(Loc.LineNumber);
assert(Loc.Filename.size()) < (1U << BitCodeConstants::StringLengthSize));
Record.push_back(Loc.Filename.size());
```


================
Comment at: clang-doc/BitcodeWriter.cpp:217
+void ClangDocBitcodeWriter::emitRecord(int Val, RecordId ID) {
+  if (!prepRecordData(ID, Val)) return;
+  Record.push_back(Val);
----------------
```
void ClangDocBitcodeWriter::emitRecord(int Val, RecordId ID) {
  assert(RecordIdNameMap[ID] && "Unknown Abbreviation");
  assert(RecordIdNameMap[ID].Abbrev == &IntAbbrev && "Abbrev type mismatch");
  if (!prepRecordData(ID, Val)) return;
```


================
Comment at: clang-doc/BitcodeWriter.cpp:218
+  if (!prepRecordData(ID, Val)) return;
+  Record.push_back(Val);
+  Stream.EmitRecordWithAbbrev(Abbrevs.get(ID), Record);
----------------
```
assert(Val < (1U << BitCodeConstants::IntSize));
Record.push_back(Val);
```


================
Comment at: clang-doc/BitcodeWriter.cpp:222
+
+bool ClangDocBitcodeWriter::prepRecordData(RecordId ID, bool ShouldEmit) {
+  if (!ShouldEmit) return false;
----------------
```
bool ClangDocBitcodeWriter::prepRecordData(RecordId ID, bool ShouldEmit) {
  assert(RecordIdNameMap[ID] && "Unknown Abbreviation");
  if (!ShouldEmit) return false;
```


================
Comment at: clang-doc/BitcodeWriter.cpp:232
+void ClangDocBitcodeWriter::emitBlockInfoBlock() {
+  Abbrevs.clear();
+  emitHeader();
----------------
Since `ClangDocBitcodeWriter` is not re-used, but re-constructed* each time, `Abbrevs.clear();` does nothing.

* Hmm, i wonder if that will be a bad thing. Benchmarking will tell i guess :/


================
Comment at: clang-doc/BitcodeWriter.cpp:236
+
+  std::initializer_list<std::pair<BlockId, std::initializer_list<RecordId>>>
+      TheBlocks{// Version Block
----------------
https://godbolt.org/g/rD6BWK also suggests it should be `static const`


================
Comment at: clang-doc/BitcodeWriter.cpp:276
+void ClangDocBitcodeWriter::emitBlockInfo(BlockId BID,
+                                          const std::vector<RecordId> RIDs) {
+  emitBlockID(BID);
----------------
Uhm, do you plan on calling `emitBlockInfo()` from anywhere else other than `emitBlockInfoBlock()`?
Since it takes `const std::vector<RecordId>` instead of a `const std::initializer_list<RecordId>&`, a memory copy will happen...
https://godbolt.org/g/rD6BWK


================
Comment at: clang-doc/BitcodeWriter.h:35
+
+struct BitCodeConstants {
+  static constexpr int SubblockIDSize = 5;
----------------
`LineNumFixedSize` is used for a different things. Given such a specific name, i think it may be confusing?
Also, looking at http://llvm.org/doxygen/classllvm_1_1BitstreamWriter.html#ae6a40b4a5ea89bb8b5076c26e0d0b638 i guess these all should be `unsigned`.

I think this would be better, albeit more verbose:
```
struct BitCodeConstants {
  static constexpr unsigned SignatureBitSize = 8U;
  static constexpr unsigned SubblockIDSize = 5U;
  static constexpr unsigned IntSize = 16U;
  static constexpr unsigned StringLengthSize = 16U;
  static constexpr unsigned LineNumberSize = 16U;
};
```


================
Comment at: clang-doc/BitcodeWriter.h:53
+  BI_LAST = BI_COMMENT_BLOCK_ID
+};
+
----------------
So what *exactly* does `BitCodeConstants::SubblockIDSize` mean?
```
static_assert(BI_LAST < (1U << BitCodeConstants::SubblockIDSize), "Too many block id's!");
```
?


================
Comment at: clang-doc/BitcodeWriter.h:94
+  FUNCTION_LOCATION,
+  FUNCTION_MANGLED_NAME,
+  FUNCTION_PARENT,
----------------
So i have a question: if something (`FUNCTION_MANGLED_NAME` in this case) is phased out, does it have to stay in this enum?
That will introduce holes in `RecordIdNameMap`.
Are the actual numerical id's of enumerators stored in the bitcode, or the string (abbrev, `RecordIdNameMap[].Name`)?

Looking at tests, i guess these enums are internal detail, and they can be changed freely, including removing enumerators.
Am i wrong?

I think that should be explained in a comment before this `enum`.


================
Comment at: clang-doc/BitcodeWriter.h:100
+};
+
+#undef INFORECORDS
----------------
**If** `AbbreviationMap` comment makes sense, i guess that common code should be moved here, i.e.
```
static constexpr unsigned RecordIdCount = RI_LAST - RI_FIRST + 1;
```
and use this new variable in those two places.


================
Comment at: clang-doc/BitcodeWriter.h:163
+   public:
+    AbbreviationMap() {}
+    void add(RecordId RID, unsigned AbbrevID);
----------------
We know we will have at most `RI_LAST - RI_FIRST + 1` abbreviations.
Right now that results in just ~40 abbreviations.
Would it make sense to
```
AbbreviationMap() : Abbrevs(RI_LAST - RI_FIRST + 1) {}
```
?
(or `llvm::DenseMap<unsigned, unsigned> Abbrevs = llvm::DenseMap<unsigned, unsigned>(RI_LAST - RI_FIRST + 1);` but that looks uglier to me..)



https://reviews.llvm.org/D41102





More information about the cfe-commits mailing list