[PATCH] D41102: Setup clang-doc frontend framework

Roman Lebedev via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed Mar 7 13:39:05 PST 2018


lebedev.ri added a comment.

Hmm, I'm missing something about the way we store the SHA1...



================
Comment at: clang-doc/BitcodeWriter.cpp:53
+            {// 0. Fixed-size integer (length of the sha1'd USR)
+             llvm::BitCodeAbbrevOp(llvm::BitCodeAbbrevOp::VBR,
+                                   BitCodeConstants::USRLengthSize),
----------------
Is this VBR because USRLengthSize is such an odd size, i.e. to conserve bits?


================
Comment at: clang-doc/BitcodeWriter.cpp:57
+             llvm::BitCodeAbbrevOp(llvm::BitCodeAbbrevOp::Array),
+             llvm::BitCodeAbbrevOp(llvm::BitCodeAbbrevOp::Char6)});
+}
----------------
Looking at the `NumWords` changes (a decrease!) in the tests, this is bugging me.
Now that I've realized what we do with the USR:
* we first compute the SHA1, getting 20x uint8_t
* store/use it internally
* then hexify it, getting 40x char (assuming 8-bit char)
* then convert to Char6, winning back two bits per char. But each hex char only carries 4 bits of information, so we still lose 2 bits per char.

Question: *why* do we store the SHA1 of the USR as a string?
Why can't we just store that USRString (aka the USRSha1 binary) directly?
That would be just 20 bytes; you couldn't go any lower than that.


================
Comment at: clang-doc/Representation.h:29
+
+using USRString = std::array<uint8_t, 20>;
+
----------------
Right, of course, internally this is kept in the binary format, which is just 20 bytes.
This is not the string (the hexified version of the SHA1), but the raw SHA1, the binary.
The name should somehow convey that; it should be something closer to `USRSha1`.


https://reviews.llvm.org/D41102




