[cfe-dev] Debugging Information; exploiting comments

Fri Nov 9 17:10:28 PST 2007

On Nov 9, 2007, at 4:57 PM, Ted Kremenek wrote:

>> I am investigating adding debugging information to clang. Do you
>> think it is too soon? (I would like to add a -g flag to the drivers
>> and add conditional llvm debug intrinsic emission in CodeGen.)
>> Do you already have some ideas on the question? (I studied llvm-gcc
>> debug generation and would add something similar.)

I'm not currently involved with the efforts on the CodeGen module.   
Chris, Devang?

>> Another idea I had is implementing a documentation tool (like
>> doxygen) using clang. The problem is that the existing framework
>> doesn't permit analyzing the comments and parsing in the same pass
>> (at least I don't see how).

You are indeed correct; currently the parser discards comments when the  
ASTs are built.  We have discussed the technical hurdles of doing  
"appropriate" handling of comments, as they could be used for doxygen- 
like tools, as well as for annotations that can be used by other  
analysis tools.

The main challenge is that comments can appear literally anywhere, and  
how they conceptually bind to entities in the program (be they  
declarations or actual statements and expressions) is really specific  
to the application that uses the comments (e.g. doxygen).

>> I would need to add some sort of callback in the lexer or
>> preprocessor for processing the comment (we can't parse the comment
>> token; it would be an impossible task). The callback would store the
>> comment, and when the next declaration is parsed, the stored comment
>> would be used to decorate the declaration. Do you think this is a
>> good way?

I'm not entirely certain how comments are processed by the lexer and  
parser, and how easy it would be to add a callback.  I believe that it  
is doable, but I haven't really looked at that code.  Steve, Chris?

Conceptually, if a callback mechanism for parsing comments is in place  
you could then do whatever you wanted with the comments, although it  
wouldn't necessarily be easy (it would depend on your application).   
The ideal solution would be to separate the policy of how comments are  
used (e.g. how you bind them to expressions, statements, declarations,  
and so forth) from how they are parsed (or rather, how the ASTs are  
built in Sema).  That way a bunch of tools that process comments could  
be built instead of a single ad hoc solution.  We also don't want to  
get into the business of people unnecessarily hacking on the Sema  
module where the ASTs are built and semantically analyzed.  Such hacks  
would inevitably cause the tools built on them to diverge from the  
functionality available in "mainline" clang.
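
To make the separation concrete, here is a minimal sketch in self-contained C++. It does not use any clang API; every name in it (Lexer, CommentCallback, parseDecls, DocBinder, documentDecls) is hypothetical and invented for illustration. The toy lexer only reports comments through a callback, the toy parser only reports declarations of the form `type name ;`, and the binding policy ("attach the most recent comment to the next declaration") lives entirely in the client.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical callback type: the toy lexer calls it once per comment.
using CommentCallback = std::function<void(const std::string &)>;

// Toy lexer: hands out identifier/punctuation tokens on demand and reports
// "//" and "/* */" comments through the callback instead of as tokens.
class Lexer {
public:
  Lexer(std::string src, CommentCallback cb)
      : src_(std::move(src)), onComment_(std::move(cb)) {}

  // Fills `tok` and returns true, or returns false at end of input.
  bool next(std::string &tok) {
    while (pos_ < src_.size()) {
      unsigned char c = static_cast<unsigned char>(src_[pos_]);
      if (std::isspace(c)) { ++pos_; continue; }
      if (src_.compare(pos_, 2, "//") == 0) {
        std::size_t end = src_.find('\n', pos_);
        if (end == std::string::npos) end = src_.size();
        if (onComment_) onComment_(src_.substr(pos_ + 2, end - pos_ - 2));
        pos_ = end;
        continue;
      }
      if (src_.compare(pos_, 2, "/*") == 0) {
        std::size_t end = src_.find("*/", pos_ + 2);
        std::size_t stop = (end == std::string::npos) ? src_.size() : end;
        if (onComment_) onComment_(src_.substr(pos_ + 2, stop - pos_ - 2));
        pos_ = (end == std::string::npos) ? src_.size() : end + 2;
        continue;
      }
      std::size_t start = pos_;
      if (std::isalnum(c) || c == '_') {
        while (pos_ < src_.size() &&
               (std::isalnum(static_cast<unsigned char>(src_[pos_])) ||
                src_[pos_] == '_'))
          ++pos_;
      } else {
        ++pos_;  // single-character punctuation
      }
      tok = src_.substr(start, pos_ - start);
      return true;
    }
    return false;
  }

private:
  std::string src_;
  CommentCallback onComment_;
  std::size_t pos_ = 0;
};

// Toy parser: recognizes only `type name ;` and reports the declared name.
void parseDecls(Lexer &lex,
                const std::function<void(const std::string &)> &onDecl) {
  std::vector<std::string> window;
  std::string tok;
  while (lex.next(tok)) {
    if (tok == ";") {
      if (window.size() == 2) onDecl(window[1]);
      window.clear();
    } else {
      window.push_back(tok);
    }
  }
}

// One possible policy, kept outside lexer and parser: remember the latest
// comment and bind it to the next declaration that is reported.
struct DocBinder {
  std::string pending;
  std::vector<std::pair<std::string, std::string>> docs;  // (name, comment)
  void onComment(const std::string &text) { pending = text; }
  void onDecl(const std::string &name) {
    if (!pending.empty()) {
      docs.emplace_back(name, pending);
      pending.clear();
    }
  }
};

// Wires the pieces together for one source string.
std::vector<std::pair<std::string, std::string>>
documentDecls(const std::string &src) {
  DocBinder binder;
  Lexer lex(src, [&](const std::string &t) { binder.onComment(t); });
  parseDecls(lex, [&](const std::string &n) { binder.onDecl(n); });
  return binder.docs;
}
```

Because the parser pulls tokens from the lexer on demand, comments and declarations arrive interleaved in source order, which is what lets the binder work in a single pass. A different tool (say, one that binds trailing comments to the preceding statement) would replace DocBinder only; the lexer and parser would stay untouched. A real parser with token lookahead would complicate the ordering, which this sketch ignores.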

>> Another way would be to add some sort of filter between the lexer
>> and the parser which would process and delete the comment tokens as
>> they come, but it would probably be slower and on the critical path
>> (not sure the lexing/parsing part is time critical, since the
>> semantic analysis will probably end up being a lot slower).

I'm not certain if I completely understand this solution.  At the end  
of the day you still need to bind comments (or whatever data you  
extract from them) to ASTs (decls, etc.).  Since the parser/lexer has  
no notion of ASTs, you almost necessarily have to put some of the key  
logic at a higher level (e.g., the Sema module).  IIRC, essentially  
the parser and lexer just build tokens and process the C grammar; Sema  
actually builds the ASTs based on an interface between it and the parser.
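
As a rough illustration of that layering, here is a small self-contained C++ sketch. It is not clang code: ParserActions, AstBuilder, SyntaxOnly, VarDecl and parse are hypothetical names made up for this example. The toy parser knows only the grammar `type name ;` and calls into an abstract interface; whether anything resembling an AST gets built is decided by the object on the other side of that interface.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical interface between the parser and whoever consumes its events.
struct ParserActions {
  virtual ~ParserActions() = default;
  virtual void onVarDecl(const std::string &type, const std::string &name) = 0;
};

// Toy AST node for a variable declaration.
struct VarDecl {
  std::string type;
  std::string name;
};

// Plays the role of the semantic layer: builds AST nodes from parser events.
struct AstBuilder : ParserActions {
  std::vector<VarDecl> decls;
  void onVarDecl(const std::string &type, const std::string &name) override {
    decls.push_back(VarDecl{type, name});
  }
};

// An alternative client that builds no AST and only counts declarations.
struct SyntaxOnly : ParserActions {
  int count = 0;
  void onVarDecl(const std::string &, const std::string &) override {
    ++count;
  }
};

// Toy parser for whitespace-separated input such as "int x ; float y ;".
// It has no notion of AST nodes; it only reports what it recognized.
// Returns false on a syntax error.
bool parse(const std::string &src, ParserActions &actions) {
  std::istringstream in(src);
  std::string type, name, semi;
  while (in >> type) {
    if (!(in >> name >> semi) || semi != ";") return false;
    actions.onVarDecl(type, name);
  }
  return true;
}
```

Under this arrangement, any logic that attaches comment data to declarations would have to sit in (or next to) the AstBuilder-like layer, since that is the only place where declaration objects exist.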