[cfe-dev] FW: get the text in a C file between 2 SourceLocations

steve naroff snaroff at apple.com
Wed Sep 10 08:05:51 PDT 2008

On Sep 9, 2008, at 9:48 PM, b.j.burgers at student.utwente.nl wrote:

> Thanks for your quick response, Snaroff.
> I chose to change the function Lexer::LexNumericConstant and  
> generate my own kind of token if I find an s at the end of a  
> numerical_constant.
> Now I want to separate the number from the s.
> Is there an easy way of getting the text of a token in the C file?

Not directly. You need to go through the SourceManager as follows...

const char *sourceText = SM->getCharacterData(Tok.getLocation());
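Note that the pointer you get back runs to the end of the buffer, so you need the token's length to know where its text stops (Tok.getLength() should give you that). Here is a minimal sketch of separating the digits from the trailing suffix once you have the pointer and the length. The helper name splitNumericSuffix is hypothetical, not part of clang, and it only handles plain decimal digits (no hex, no floating point):

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper (not a clang API): splits the spelling of a numeric
// token into its leading decimal digits and whatever suffix follows.
// `text` stands in for the pointer returned by getCharacterData();
// `length` stands in for the token length.
std::pair<std::string, std::string>
splitNumericSuffix(const char *text, std::size_t length) {
  std::size_t i = 0;
  // Walk over the leading decimal digits only.
  while (i < length && std::isdigit(static_cast<unsigned char>(text[i])))
    ++i;
  // First part: the digits. Second part: the rest of the token.
  return {std::string(text, i), std::string(text + i, length - i)};
}
```

For a token spelled "1000s" with length 5, this would yield "1000" and "s", even though the underlying buffer continues past the token.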

If you want the token type name, this will suffice...

const char *tokenName = Tok.getName();

> I searched for this a while ago too but couldn’t find it. I found  
> the SourceManager::getCharacterData(SourceLocation) function, but  
> this returns all characters starting from the SourceLocation.
> Is there a way to get a char * of text between 2 SourceLocations?

These should do the trick...

   // converts SourceLocation's into "char *'s"
   const char *startBuf = SM->getCharacterData(LocStart);
   const char *endBuf = SM->getCharacterData(LocEnd);

   // converts a "char *" offset into a SourceLocation
   SourceLocation OptionalLoc = LocStart.getFileLocWithOffset(p- 
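
Once you have the two pointers, getting the text between them is plain pointer arithmetic. A rough sketch follows; the helper names textBetween and offsetFrom are hypothetical, and it assumes both locations live in the same file buffer. Also keep in mind that an end location usually points at the start of the last token, so that token's own characters are not included unless you add its length to the end pointer first:

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical helper (not a clang API): copies the characters in the
// half-open range [startBuf, endBuf). Both pointers are assumed to come
// from getCharacterData() on locations in the same buffer.
std::string textBetween(const char *startBuf, const char *endBuf) {
  // Reject null or reversed ranges instead of reading out of bounds.
  if (!startBuf || !endBuf || std::less<const char *>()(endBuf, startBuf))
    return std::string();
  return std::string(startBuf, endBuf);
}

// Hypothetical helper: the distance from startBuf to a pointer p inside
// the same buffer, i.e. the kind of offset you would hand back when
// turning a "char *" position into a location relative to LocStart.
std::ptrdiff_t offsetFrom(const char *startBuf, const char *p) {
  return p - startBuf;
}
```

The copy is made into a std::string because the source buffer itself is not null-terminated at the end location.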


> Thanks for all the help,
> Bas
> From: steve naroff [mailto:snaroff at apple.com]
> Sent: Tuesday, September 9, 2008 15:02
> To: Burgers, B.J. (Bas, Student EMSYS)
> CC: cfe-dev at cs.uiuc.edu
> Subject: Re: [cfe-dev] changing the lexer or parser
> Hi Bas,
> clang currently implements C integer constants by including the  
> trailing suffix (see C99 for more details).
> Sema::ActOnNumericConstant() is then responsible for determining  
> the type of constant (integer, floating) and size.
> I haven't thought about adapting clang's lexer to generate tokens  
> that don't conform to C.
> That said, you could simply examine the "suffix" by hand (without  
> fiddling with the lexer directly).
> snaroff
> On Sep 9, 2008, at 5:27 PM, b.j.burgers at student.utwente.nl wrote:
> Hello,
> I’m working on a tool that allows time constructs in C. I implemented  
> this tool by adapting Clang.
> In these time constructs I would like to allow arguments like “1000s”,  
> “1000 s”, “1000  s”, “100ms”, “100 ms”, etc.
> The lexer creates 1 token called numerical_token if the argument is  
> “1000s”, even if ‘s’ is added as a keyword or token in TokenKinds.def.  
> I hoped the lexer would have generated two tokens: 1  
> numerical_constant and an identifier (or self-defined token).
> What is the best way to allow these kinds of arguments? Do I have to  
> create a new token that allows some digits followed by an ‘s’?
> Thanks for any help,
> Bas Burgers
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
