[PATCH] D41102: Setup clang-doc frontend framework

Roman Lebedev via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed Feb 28 11:31:57 PST 2018


lebedev.ri added a comment.

Thank you for working on this!
Some more review notes.
Please look into adding a bit more tests.



================
Comment at: clang-doc/BitcodeWriter.cpp:179
+      assert(Inits.size() == RecordIdCount);
+      for (const auto &Init : Inits) RecordIdNameMap[Init.first] = Init.second;
+      assert(RecordIdNameMap.size() == RecordIdCount);
----------------
Since this is the only string we ever push to `Record`, can we add an assertion to make sure we always have enough room for it?
E.g.
```
for (const auto &Init : Inits) {
  RecordId RID = Init.first;
  RecordIdNameMap[RID] = Init.second;
  assert((1 + RecordIdNameMap[RID].size()) <= Record.size());
  // Since record was just created, it should not have any dynamic size.
  // Or move the small size into a variable and use it when declaring the Record and here.
}
```


================
Comment at: clang-doc/BitcodeWriter.cpp:230
+  prepRecordData(ID);
+  for (const char C : RecordIdNameMap[ID].Name) Record.push_back(C);
+  Stream.EmitRecord(llvm::bitc::BLOCKINFO_CODE_SETRECORDNAME, Record);
----------------
Sadly, i can **not** prove it via godbolt (can't add LLVM as library), but i'd //expect// streamlining this should at least not hurt, i.e. something like
```
Record.append(RecordIdNameMap[ID].Name.begin(), RecordIdNameMap[ID].Name.end());
```
?


================
Comment at: clang-doc/BitcodeWriter.cpp:196
+/// \brief Emits a record name to the BLOCKINFO block.
+void ClangDocBitcodeWriter::emitRecordID(RecordId ID) {
+  assert(RecordIdNameMap[ID] && "Unknown Abbreviation");
----------------
juliehockett wrote:
> lebedev.ri wrote:
> > Hmm, so i've been staring at this and http://llvm.org/doxygen/classllvm_1_1BitstreamWriter.html and i must say i'm not fond of this indirection.
> > 
> > What i don't understand is, in previous function, we don't store `BlockId`, why do we want to store `RecordId`?
> > Aren't they both unstable, and are implementation detail?
> > Do we want to store it (`RecordId`)? If yes, please explain it as a new comment in code.
> > 
> > If no, i guess this would work too?
> > ```
> > assert(RecordIdNameMap[ID] && "Unknown Abbreviation");
> > Record.clear();
> > Stream.EmitRecord(llvm::bitc::BLOCKINFO_CODE_SETRECORDNAME, RecordIdNameMap[ID].Name);
> > ```
> > And after that you can lower the default size of `SmallVector<> Record` down to, hm, `4`?
> I'm not entirely certain what you mean -- in `emitBlockId()`, we are storing both the block id and the block name in separate records (`BLOCKINFO_CODE_SETBID`, `BLOCKINFO_CODE_BLOCKNAME`, respectively). In `emitRecordId()`, we're doing something slightly different, in that we emit one record with both the record id and the record name (in record `BLOCKINFO_CODE_SETRECORDNAME`). 
> 
> Replacing the copy loop here has the same issue as above, namely that there isn't an easy way to convert between a `StringRef` and an array of `unsigned char`.
Tried locally, and yes, we do need to output record id.

What we could **actually** do, is simply inline that `EmitRecord()`, first emitting the RID, and then the name.
```
template <typename Container>
void EmitRecord(unsigned Code, int ID, const Container &Vals) {
  // If we don't have an abbrev to use, emit this in its fully unabbreviated
  // form.
  auto Count = static_cast<uint32_t>(makeArrayRef(Vals).size());
  EmitCode(bitc::UNABBREV_RECORD);
  EmitVBR(Code, 6);
  EmitVBR(Count + 1, 6); // Including ID
  EmitVBR64(ID, 6); // 'Prefix' with ID
  for (unsigned i = 0, e = Count; i != e; ++i)
    EmitVBR64(Vals[i], 6);
}
```

But that will result in rather ugly code.
So given that the record names are quite short, and all the other strings we output directly, maybe leave it as it is for now, until it shows in profiles?



================
Comment at: clang-doc/BitcodeWriter.h:37
+  static constexpr unsigned SubblockIDSize = 4U;
+  static constexpr unsigned BoolSize = 1U;
+  static constexpr unsigned IntSize = 16U;
----------------
juliehockett wrote:
> lebedev.ri wrote:
> > Hmm, you build with asserts enabled, right?
> > I tried testing this, and three tests fail with
> > ```
> > clang-doc: /build/llvm/include/llvm/Bitcode/BitstreamWriter.h:122: void llvm::BitstreamWriter::Emit(uint32_t, unsigned int): Assertion `(Val & ~(~0U >> (32-NumBits))) == 0 && "High bits set!"' failed.
> > ```
> > ```
> > Failing Tests (3):
> >     Clang Tools :: clang-doc/mapper-class-in-function.cpp
> >     Clang Tools :: clang-doc/mapper-function.cpp
> >     Clang Tools :: clang-doc/mapper-method.cpp
> > 
> >   Expected Passes    : 6
> >   Unexpected Failures: 3
> > ```
> > At least one failure is because of `BoolSize`, so i'd suspect the assertion itself is wrong...
> I do, and I've definitely seen that one triggered before but it's been because something was off in how the data was being outputted as I was shifting things around. That said, I'm not seeing it in my local build with this diff though -- I'll update it again just to make sure they're in sync.
I did not retry with updated tree/patch, but i'm quite sure i did hit those asserts.
My current build line:
```
-DCMAKE_BUILD_TYPE:STRING=RelWithDebInfo
-DLLVM_BINUTILS_INCDIR:PATH=/usr/include
-DLLVM_BUILD_TESTS:BOOL=ON
-DLLVM_ENABLE_ASSERTIONS:BOOL=ON
-DLLVM_ENABLE_LLD:BOOL=ON
-DLLVM_ENABLE_PROJECTS:STRING=clang;libcxx;libcxxabi;compiler-rt;lld
-DLLVM_ENABLE_SPHINX:BOOL=ON
-DLLVM_ENABLE_WERROR:BOOL=ON
-DLLVM_PARALLEL_LINK_JOBS:STRING=1
-DLLVM_TARGETS_TO_BUILD:STRING=X86
-DLLVM_USE_SANITIZER:STRING=Address 
```
Additional env variables:
```
export MALLOC_CHECK_=3
export MALLOC_PERTURB_=$(($RANDOM % 255 + 1))
export ASAN_OPTIONS=abort_on_error=1
export UBSAN_OPTIONS=print_stacktrace=1
```


================
Comment at: clang-doc/BitcodeWriter.h:226
+
+  SmallVector<uint32_t, 16> Record;
+  llvm::BitstreamWriter &Stream;
----------------
Needs a comment about the choice of static size of Record.
I.e. the maximal amount of stuff we expect to push there is recordname string (right now `IsDefinition` is the longest at `13` chars) + 1 integer.

And add a newline
```

// Notes
SmallVector<uint32_t, 16> Record;

llvm::BitstreamWriter &Stream;
...
```


================
Comment at: clang-doc/Mapper.cpp:28
+  llvm::SmallString<128> USR;
+  if (index::generateUSRForDecl(D, USR)) return false;
+
----------------
+// If we should ignore this declaration, exit this decl
?


================
Comment at: clang-doc/Mapper.h:30
+
+class ClangDocMapper : public clang::RecursiveASTVisitor<ClangDocMapper> {
+ public:
----------------
I wonder if we could reflect the usage of `RecursiveASTVisitor` in the class name.
Though `ClangDocMapperASTVisitor` sounds too long?


================
Comment at: clang-doc/Representation.h:27
+
+struct Info;
+enum class InfoType {
----------------
Is there an intentional decision to minimize `sizeof()` of these structs?
Many(?) of those could be `SmallString`'s


================
Comment at: test/CMakeLists.txt:44
   clangd
+  clang-doc
   clang-include-fixer
----------------
There is are no tests with `CommentBlock` blocks.


================
Comment at: test/clang-doc/mapper-class-in-class.cpp:6
+// RUN: clang-doc --dump --omit-filenames -doxygen -p %t %t/test.cpp -output=%t/docs
+// RUN: llvm-bcanalyzer %t/docs/c:@S at X@S at Y.bc --dump | FileCheck %s
+
----------------
Ok, so this actually produced `c:@S at X.bc` and `c:@S at X@S at Y.bc`.
Please do something like:
```
// RUN: llvm-bcanalyzer %t/docs/c:@S at X.bc --dump | FileCheck %s --check-prefix CHECK-X
// RUN: llvm-bcanalyzer %t/docs/c:@S at X@S at Y.bc --dump | FileCheck %s --check-prefix CHECK-X-Y

// CHECK-X: <BLOCKINFO_BLOCK/>
// CHECK-X: <VersionBlock NumWords=1 BlockCodeSize=4>
  // CHECK-X: <Version abbrevid=4 op0=1/>
// CHECK-X: </VersionBlock>
// CHECK-X: <RecordBlock NumWords=6 BlockCodeSize=4>
  // CHECK-X: <USR abbrevid=4 op0=6/> blob data = 'c:@S at X'
  // CHECK-X: <Name abbrevid=5 op0=1/> blob data = 'X'
  // CHECK-X: <IsDefinition abbrevid=7 op0=1/>
  // CHECK-X: <TagType abbrevid=10 op0=3/>
// CHECK-X: </RecordBlock>

// CHECK-X-Y: <BLOCKINFO_BLOCK/>
// CHECK-X-Y: <VersionBlock NumWords=1 BlockCodeSize=4>
  // CHECK-X-Y: <Version abbrevid=4 op0=1/>
// CHECK-X-Y: </VersionBlock>
// CHECK-X-Y: <RecordBlock NumWords=11 BlockCodeSize=4>
  // CHECK-X-Y: <USR abbrevid=4 op0=10/> blob data = 'c:@S at X@S at Y'
  // CHECK-X-Y: <Name abbrevid=5 op0=1/> blob data = 'Y'
  // CHECK-X-Y: <Namespace abbrevid=6 op0=1 op1=6/> blob data = 'c:@S at X'
  // CHECK-X-Y: <IsDefinition abbrevid=7 op0=1/>
  // CHECK-X-Y: <TagType abbrevid=10 op0=3/>
// CHECK-X-Y: </RecordBlock>
```

On a related note, is there any way to auto-generate these `CHECK` lines?
There is this `llvm/utils/update_test_checks.py`, but i doubt it will work here.


================
Comment at: test/clang-doc/mapper-class-in-function.cpp:8
+
+void H() {
+	class I {};
----------------
Here too, i suppose


================
Comment at: test/clang-doc/mapper-enum.cpp:7-8
+// RUN: llvm-bcanalyzer %t/docs/c:@E at B.bc --dump | FileCheck %s
+
+enum B { X, Y };
+// CHECK: <BLOCKINFO_BLOCK/>
----------------
Could you please also add a similar `enum class` test?


================
Comment at: test/clang-doc/mapper-enum.cpp:17
+  // CHECK: <IsDefinition abbrevid=7 op0=1/>
+  // CHECK: <TypeBlock NumWords=4 BlockCodeSize=4>
+    // CHECK: <Type abbrevid=4 op0=3 op1=1/> blob data = 'X'
----------------
Can `TypeBlock` be on the same depth as `VersionBlock`?
Via `using`/`typename`? If yes, please add such a test.


================
Comment at: test/clang-doc/mapper-method.cpp:8
+
+class G {
+public: 
----------------
And here


https://reviews.llvm.org/D41102





More information about the cfe-commits mailing list