[PATCH] D41102: Setup clang-doc frontend framework

Fri Feb 23 09:48:07 PST 2018

Athosvk added a comment.

The change to USR seems like quite an improvement already! That being said, I do think it might be preferable to avoid using strings to link things together at all. In our clang-doc we used pointers directly to refer to other types. So for example, our class for storing Record/CXX-related information has something like:

  std::vector<Function*>	mMethods;
  std::vector<Variable*>	mVariables;
  std::vector<Enum*>	mEnums;
  std::vector<Typedef*>	mTypedefs;

Only upon serialization do we fetch some kind of USR that uniquely identifies the type. This is especially useful to us for the conversion to HTML, and I think the same would go for this backend: as it stands, it seems you'll have to do string lookups to get to the actual types, which would be inefficient in multiple respects. Using pointers can make the backend more of a one-to-one conversion, e.g. with one of our HTML template definitions (note: this is a Jinja2 template in Python):

  {%- for enum in inEntry.GetMemberEnums() -%}
  	<tr class="separator">
  		<td class="memSeparator" colspan="3"></td>
  	</tr>
  	<tr class="memitem:EAllocatorStrategy">
  		<td class="memItemLeft" align="right">{{- Modifiers.RenderAccessModifier(enum.GetAccessModifier()) -}}</td>
  		<td class="memItemMiddle" align="left">enum <a href="{{ enum.GetID() }}.html">{{- enum.GetName().GetName()|e -}}</a></td>
  		<td class="memItemRight" valign="bottom">{{- Descriptions.RenderDescription(enum.GetBriefDescription()) -}}</td>
  	</tr>
  {%- endfor -%}
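
To make the pointer-based linking a bit more concrete, here is a minimal sketch in C++. It is not our actual code: the Enum and Record structs and the serializeEnums function are hypothetical, simplified stand-ins. The point is only that entries hold non-owning pointers to each other and the USR is read just once, at the moment output is written:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified entries; all names here are illustrative only.
struct Enum {
  std::string USR;  // unique ID, only needed when writing output
  std::string Name;
};

struct Record {
  std::string USR;
  std::string Name;
  std::vector<const Enum *> Enums;  // non-owning links, no string lookups
};

// Emits one line per member enum; the USR is fetched only at this point.
std::string serializeEnums(const Record &R) {
  std::string Out;
  for (const Enum *E : R.Enums) {
    assert(E && "links must be resolved before serialization");
    Out += R.USR + " -> " + E->USR + " (" + E->Name + ")\n";
  }
  return Out;
}
```

A backend can then walk R.Enums directly, much like the template above walks GetMemberEnums(), without going through a USR-to-entry map first.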

The disadvantage is of course that you add complexity to certain parts of the (de)serialization for nested types and inheritance, since you either have to process entries in the correct order or defer initializing these pointers. But see this as just some thought sharing. I do think this would improve the interaction in the backend (assuming you use the same representation as currently in the frontend). Also, we didn't apply this to our Type representation (which we use to store the type of a member, parameter etc.): it stores the name of the type rather than a pointer to it (since it can also be a built-in), though it embeds pretty much every possible modifier on said type, like this:

  EntryName			mName;									
  bool				mIsConst = false;						
  EReferenceType			mReferenceType = EReferenceType::None;	
  std::vector<bool>		mPointerConstnessMask;					
  std::vector<std::string>	mArraySizes;							
  bool				mIsAtomic = false;						
  std::vector<Attribute>		mAttributes;							
  bool				mIsExpansion = false;					
  std::vector<TemplateArgument>	mTemplateArguments;						
  std::unique_ptr<FunctionTypeProperties>     mFunctionTypeProperties = nullptr;		
  EntryName			mParentCXXEntry;

The last member refers to the case where a pointer is a pointer to member, though some other fields may require some explaining too. Anyway, this is just to give some insight into how we structured our representation, where we largely omitted string representations where possible.
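
As for deferring the initialization of pointers during deserialization, mentioned above: one possible way is a two-pass load, sketched below. This is an assumption about how it could be done, not a description of our implementation; SerializedRecord, Record and deserialize are hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical on-disk form: links are stored as USR strings.
struct SerializedRecord {
  std::string USR;
  std::vector<std::string> ParentUSRs;
};

// Hypothetical in-memory form: links are plain non-owning pointers.
struct Record {
  std::string USR;
  std::vector<const Record *> Parents;
};

std::vector<Record> deserialize(const std::vector<SerializedRecord> &In) {
  std::vector<Record> Out;
  Out.reserve(In.size());  // no reallocation below, so addresses stay stable
  std::unordered_map<std::string, const Record *> ByUSR;

  // Pass 1: create every record, leaving its links empty.
  for (const SerializedRecord &S : In) {
    Out.push_back(Record{S.USR, {}});
    ByUSR[S.USR] = &Out.back();
  }

  // Pass 2: every target exists now, so the input order no longer matters.
  for (std::size_t I = 0; I < In.size(); ++I) {
    for (const std::string &USR : In[I].ParentUSRs) {
      auto It = ByUSR.find(USR);
      if (It != ByUSR.end())  // unknown USRs (e.g. external types) are skipped
        Out[I].Parents.push_back(It->second);
    }
  }
  return Out;
}
```

The string lookups still happen, but only once at load time; after that the backend follows pointers. The sketch relies on the vector never reallocating after the first pass, so a real implementation would need to keep that invariant or own the entries some other way.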

Have you already started work on a backend? Developing backend and frontend in tandem can provide some additional insights as to how things should be structured, especially representation-wise!

================
Comment at: clang-doc/Representation.h:113
+  TagTypeKind TagType;
+  llvm::SmallVector<std::unique_ptr<MemberTypeInfo>, 4> Members;
+  llvm::SmallVector<std::string, 4> ParentUSRs;
----------------
How come these are actually unique_ptrs? They can be stored directly in the vector, right? (same for CommentInfo children, FunctionInfo params etc.)
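
Roughly what I have in mind, as a sketch only: std::vector stands in for llvm::SmallVector to keep the example self-contained, and the MemberTypeInfo fields and the addMember helper are made up for illustration rather than taken from the patch.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the patch's MemberTypeInfo.
struct MemberTypeInfo {
  std::string TypeUSR;
  std::string Name;
};

struct RecordInfo {
  // Elements live directly in the container: no per-element heap allocation.
  std::vector<MemberTypeInfo> Members;
};

// Hypothetical helper that appends a member by value.
void addMember(RecordInfo &R, std::string TypeUSR, std::string Name) {
  R.Members.push_back(MemberTypeInfo{std::move(TypeUSR), std::move(Name)});
}
```

Storing by value does mean that pointers to elements are invalidated when the vector grows, and it rules out polymorphic element types, so unique_ptr would still make sense if either of those is needed.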

https://reviews.llvm.org/D41102