[cfe-commits] r59169 - in /cfe/trunk: include/clang/Lex/PTHLexer.h lib/Lex/PTHLexer.cpp

Chris Lattner clattner at apple.com
Sun Nov 16 12:31:53 PST 2008


On Nov 12, 2008, at 1:37 PM, Ted Kremenek wrote:
> URL: http://llvm.org/viewvc/llvm-project?rev=59169&view=rev
> Log:
> Add skeleton for PTH lexer.

Ok.

> +#ifndef LLVM_CLANG_PTHLexer_H
> +#define LLVM_CLANG_PTHLexer_H

Generally file guards are all caps.
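
i.e. something like (just uppercasing what's already there):

  #ifndef LLVM_CLANG_PTHLEXER_H
  #define LLVM_CLANG_PTHLEXER_H
  // ... declarations ...
  #endif  // LLVM_CLANG_PTHLEXER_H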

>
> +
> +#include "clang/Lex/PreprocessorLexer.h"
> +
> +namespace clang {
> +
> +class PTHLexer : public PreprocessorLexer {

This should have a doxygen comment.
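
Something along these lines would do (wording is only a suggestion):

  /// PTHLexer - A preprocessor lexer that returns tokens from a
  /// pretokenized (PTH) token stream instead of lexing raw source text.
  class PTHLexer : public PreprocessorLexer {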

> +
> +  /// Tokens - This is the pointer to an array of tokens that the macro is
> +  /// defined to, with arguments expanded for function-like macros.  If this is
> +  /// a token stream, these are the tokens we are returning.
> +  const Token *Tokens;

This comment was copied from somewhere else, please update it.

> +  /// NumTokens - This is the length of the Tokens array.
> +  ///
> +  unsigned NumTokens;

This also needs to be corrected.  The array is NumTokens+1 long.

> +  /// CurToken - This is the next token that Lex will return.

This is the *index* of the ...?
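
For these three members, comments along these lines would be more accurate
(wording is only a suggestion, and it assumes the trailing eof entry stays
for now):

  /// Tokens - The array of tokens that this lexer returns.
  const Token *Tokens;

  /// NumTokens - The number of usable tokens; the underlying array is
  /// NumTokens+1 entries long, the last entry being the eof token.
  unsigned NumTokens;

  /// CurToken - The index of the next token that Lex will return.
  unsigned CurToken;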

> +public:
> +
> +  /// Create a PTHLexer for the specified token stream.
> +  PTHLexer(Preprocessor& pp, SourceLocation fileloc,
> +           const Token *TokArray, unsigned NumToks);
> +  ~PTHLexer() {}
> +
> +  /// Lex - Return the next token.
> +  void Lex(Token &Tok);
> +
> +  void setEOF(Token &Tok);

Needs a doxygen comment: what is this used for?  What does it do?
Looking at the implementation, should it be const?  Why not make it
inline?
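
If all it really does is copy the trailing eof entry into Tok, a
header-only version along these lines would do (a sketch; I'm assuming
that's all the implementation does):

  /// setEOF - Set Tok to the eof token for this token stream.
  void setEOF(Token &Tok) const {
    Tok = Tokens[NumTokens];  // the entry just past the NumTokens usable tokens
  }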

> ==============================================================================
> --- cfe/trunk/lib/Lex/PTHLexer.cpp (added)
> +++ cfe/trunk/lib/Lex/PTHLexer.cpp Wed Nov 12 15:37:15 2008
>
> +PTHLexer::PTHLexer(Preprocessor& pp, SourceLocation fileloc,
> +                   const Token *TokArray, unsigned NumToks)
> +  : PP(pp), FileLoc(fileloc), Tokens(TokArray), NumTokens(NumToks), CurToken(0){
> +
> +  assert (Tokens[NumTokens-1].is(tok::eof));

No space between the function/macro name and the open paren, and please
add an assert message: && "foo is wrong".  You should also mention this
required invariant in the doxygen comment for the ctor in the header.

>
> +  --NumTokens;

Why not just initialize NumTokens to (NumToks-1) to simplify this and  
the assert?
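
i.e. roughly (the NumToks != 0 check is just a suggestion):

  PTHLexer::PTHLexer(Preprocessor& pp, SourceLocation fileloc,
                     const Token *TokArray, unsigned NumToks)
    : PP(pp), FileLoc(fileloc), Tokens(TokArray), NumTokens(NumToks - 1),
      CurToken(0) {
    assert(NumToks != 0 && "PTHLexer given an empty token array");
    assert(Tokens[NumTokens].is(tok::eof) &&
           "Token stream does not end in an eof token!");

    LexingRawMode = false;
    ParsingPreprocessorDirective = false;
    ParsingFilename = false;
  }
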
>
> +
> +  LexingRawMode = false;
> +  ParsingPreprocessorDirective = false;
> +  ParsingFilename = false;

Should this go in the parent class ctor?

> +void PTHLexer::Lex(Token& Tok) {
> +
> +  if (CurToken == NumTokens) {

Ok

>
> +    // If we hit the end of the file while parsing a preprocessor directive,
> +    // end the preprocessor directive first.  The next token returned will
> +    // then be the end of file.
> +    //   OR
> +    // If we are in raw mode, return this event as an EOF token.  Let the caller
> +    // that put us in raw mode handle the event.
> +    if (ParsingPreprocessorDirective || LexingRawMode) {
> +      // Done parsing the "line".
> +      ParsingPreprocessorDirective = false;
> +      Tok = Tokens[CurToken]; // not an out-of-bound access
> +      // FIXME: eom handling?

I don't think these cases should be conflated like this.  If you're in  
raw mode, the EOF token should be returned.  If you're in  
ParsingPreprocessorDirective mode (which is exclusive of raw mode),  
you should return EOM.

This brings up a meta question: why does the token array have an EOF
token at the end?  This decision seems to cause a lot of weirdness in
this code (including the confusion about the length of the array) and
doesn't seem to add value.  Having to say "not an out-of-bound access"
is an indicator that there is something wrong.
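
Concretely, the end-of-stream case could look roughly like this (just a
sketch, assuming the eom/eof tokens get synthesized here instead of being
pulled out of the array):

  if (CurToken == NumTokens) {
    if (ParsingPreprocessorDirective) {
      // Terminate the directive with an eom; nothing is consumed
      // from the token array.
      ParsingPreprocessorDirective = false;
      Tok.startToken();
      Tok.setKind(tok::eom);
      return;
    }
    if (LexingRawMode) {
      // Hand an eof back to whoever put us into raw mode.
      Tok.startToken();
      Tok.setKind(tok::eof);
      return;
    }
    // Otherwise, normal end-of-file processing goes here.
  }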

> +  Tok = Tokens[CurToken];
> +
> +  if (ParsingPreprocessorDirective && Tok.isAtStartOfLine()) {
> +    ParsingPreprocessorDirective = false; // Done parsing the "line".
> +    MIOpt.ReadToken();
> +    // FIXME:  Need to replicate:
> +    // FormTokenWithChars(Tok, CurPtr, tok::eom);

I don't think there is really anything interesting to replicate.  Just  
set the kind to tok::eom, give it length 0, and a 0 offset.  You  
should do the same thing to form an EOF token instead of putting it  
into the array.
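
i.e. at that point something like:

  ParsingPreprocessorDirective = false;  // Done parsing the "line".
  MIOpt.ReadToken();
  Tok.setKind(tok::eom);
  Tok.setLength(0);
  // (zero the location/offset too if that ends up mattering)
  return;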

>
> +    Tok.setKind(tok::eom);
> +    return;
> +  }
> +  else // Otherwise, advance to the next token.
> +    ++CurToken;

No need for the else.

>
> +
> +  if (Tok.isAtStartOfLine() && Tok.is(tok::hash) && !LexingRawMode) {
> +    PP.HandleDirective(Tok);
> +    PP.Lex(Tok);
> +    return;
> +  }

I think you also need to null out the identifier info pointer in the  
token if in raw mode, in order to disable macro expansion in some cases.
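
i.e. somewhere on the return path, something along the lines of:

  // In raw mode the token should not be identifier-resolved or
  // macro expanded by the caller.
  if (LexingRawMode)
    Tok.setIdentifierInfo(0);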

-Chris



