[PATCH] D136894: Add clang-doc readme

Mon Oct 31 16:23:02 PDT 2022

paulkirth requested changes to this revision.
paulkirth added a comment.
This revision now requires changes to proceed.

First, I don't think this is a good fit for a `README.txt` based on the content of other clang-tool `README.txt` files. Most of them only outline project scope or point to formal documentation on llvm.org. Other README files also tend to only provide build or dependency information. The one example I can find that does more is `clangd/quality/README.md`, which provides details on a number of internal data structures. Its preferable to document project internals in the `.rst` files as part of the developer facing documentation on llvm.org, so I think that's where something like this would ultimately belong.

I think there are some other issues with the content of this patch.

- I think there is some miscommunication about clang-doc's design. Clang-doc follows LLVM's architecture to a large extent by translating the AST into its own intermediate representation, and then letting custom backends generate the appropriate output. Further, this design allows the clang-doc IR to be serialized(though I understand this is not yet implemented), which is important for testing and scalability. The initial clang-doc RFC and discussion with other clang-tool developers pointed to this design as the best practice for scaling up to large organizations. I understand the tool is not meeting expectations in that regard, but it doesn't mean the design is the incorrect choice. While you're certainly free to disagree with that approach, this document isn't the place to codify that opinion. You're also free to submit patches to refactor clang-doc in the way you think it should be architected. Contributions, as always are welcome. 😄
- Call outs to "watch out" don't seem appropriate for formal documentation. A "note" or some other descriptor will work just as well without using language that makes the tool sound dangerous.
- The introduction is a coarse approximation of how libTooling operates, you can just say that it's build on libTooling and point to its documentation. I think that will 1) be more concise, and 2) allow you to focus on the details of clang-doc without worrying about describing the infrastructure its built upon. If you still feel that its central to this document, you may wish to quote it directly.
- Describing how to add a new field is mostly fine. The code should probably cary a reference to this documentation (or should generate the documentation).

My recommendation here is that you should move this into an `rst` file under `doc` and revise this document to more closely model the existing documentation practices in LLVM. I've left some suggestions inline that you may find useful. How you revise the document is up to you, and I won't' be a stickler if you would like to word something differently than I've suggested.

================
Comment at: clang-tools-extra/clang-doc/README.txt:5-10
+Clang-doc uses the "tooling" library which can run the compiler. It can take the files directly or
+it can extract the file list and the commands corresponding to each from a compile_commands.json
+file.
+
+The tooling library spins up threads and parses the compilation units in parallel. Clang-doc
+registers a callback to run on the AST of each unit.
----------------
the libtooling documentation covers this, can we just link to that?

================
Comment at: clang-tools-extra/clang-doc/README.txt:12-16
+When the AST is known, the MapASTVisitor in Mapper.cpp is run on the AST. It has callbacks for the
+main AST nodes that clang-doc cares about. This is a very simple object that mostly calls into
+Serialize.cpp to generate the "representation" of the code. These are the various *Info structures
+like FunctionInfo and NamespaceInfo defined in Representation.h that correspond to each element of
+the code that might be documented.
----------------

================
Comment at: clang-tools-extra/clang-doc/README.txt:18-20
+The representation from each execution thread is serialized to bitcode using BitcodeWriter.cpp. This
+is a custom bitcode defined in BitcodeWriter.h and is NOT regular LLVM bitcode IR. Watch out: the
+"Serialize.cpp" file is not related to this step though the name may imply it.
----------------

================
Comment at: clang-tools-extra/clang-doc/README.txt:22-27
+These bitcode representations are then passed back to the main thread and deserialized back to a
+new copy of the representation in BitcodeReader.cpp. This round-trip through bitcode is used only to
+get the data out of the AST visitor (which is expected to return a byte stream) and is never saved
+or used for any other purpose. This round-trip adds significant complexity and we should consider
+passing the Representation object hierarchy back to the main thread out-of-band without
+serialization and deleting the bitcode representation.
----------------
I'd drop this paragraph altogether.

================
Comment at: clang-tools-extra/clang-doc/README.txt:30-31
+After deserialization, the various representation objects from each thread are merged/reduced into a
+single structure. This is necessary both to collect everything and to merge the declarations of
+things (of which there may be many) with the actual implementation. Many fields on a record can't be
+merged in the abstract (for example, a boolean field on two structures with two different values has
----------------
Maybe something along these lines?

================
Comment at: clang-tools-extra/clang-doc/README.txt:37-38
+
+After merging the final representation is passed to the output generator which writes the final
+files.
+
----------------
I'd add something about the various backends, or at least mention that the output is determined by a backend implementation.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D136894/new/

https://reviews.llvm.org/D136894