[cfe-dev] FW: get the text in a C file between 2 SourceLocations

Wed Sep 10 08:05:51 PDT 2008

On Sep 9, 2008, at 9:48 PM, b.j.burgers at student.utwente.nl wrote:

> Thanks for your quick response Snaroff.
> I chose to change the function Lexer::LexNumericConstant so that it
> generates my own kind of token if I find an 's' at the end of a
> numeric_constant.
> Now I want to separate the number from the s.
>
> Is there an easy way of getting the text of a token in the C file?

Not directly. You need to go through the SourceManager, which gives you a
pointer to the token's first character in the source buffer:

const char *sourceText = SM->getCharacterData(Tok.getLocation());

If you only want the name of the token kind, this will suffice:

const char *tokenName = Tok.getName();
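
To separate the number from the 's', one option is to walk the token's characters by hand. The sketch below is standalone C++ with no Clang dependency: tokStart stands in for the pointer returned by getCharacterData(), and tokLen for the token's length (Token::getLength() in the Clang sources I know, so treat that as an assumption for your version). The helper name splitNumberAndSuffix is made up for illustration, and it only handles plain decimal digits.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

// Splits a token spelling into its leading decimal digits and whatever
// follows them. The pointer is not NUL-terminated at the end of the token,
// so the length has to be passed in explicitly.
// Hypothetical helper: not part of Clang.
std::pair<std::string, std::string>
splitNumberAndSuffix(const char *tokStart, std::size_t tokLen) {
  assert(tokStart != nullptr);
  std::size_t i = 0;
  // Advance past the digits; the cast avoids undefined behaviour in
  // std::isdigit for negative char values.
  while (i < tokLen && std::isdigit(static_cast<unsigned char>(tokStart[i])))
    ++i;
  return {std::string(tokStart, i), std::string(tokStart + i, tokLen - i)};
}
```

For a token spelled "1000s" this yields the pair ("1000", "s"), and for "100ms" it yields ("100", "ms"). Hex, octal, and floating-point spellings would need more care than this.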

> I searched for this a while ago too but couldn’t find it. I found
> the SourceManager::getCharacterData(SourceLocation) function, but
> it returns all characters starting from the SourceLocation.
> Is there a way to get a char * for the text between two SourceLocations?
>

These should do the trick...

   // converts SourceLocation's into "char *'s"
   const char *startBuf = SM->getCharacterData(LocStart);
   const char *endBuf = SM->getCharacterData(LocEnd);

   // converts a "char *" offset into a SourceLocation
   // (p is any pointer into the buffer, between startBuf and endBuf)
   SourceLocation OptionalLoc = LocStart.getFileLocWithOffset(p - startBuf);
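
Once you have the two pointers, the text between them is just the half-open range [startBuf, endBuf). A minimal standalone sketch, assuming both pointers come from the same file buffer (the helper name textBetween is hypothetical, not a Clang function):

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Copies the characters in [startBuf, endBuf) into a std::string.
// Both pointers must point into the same buffer, with startBuf <= endBuf;
// comparing or subtracting pointers from different buffers is undefined.
std::string textBetween(const char *startBuf, const char *endBuf) {
  assert(startBuf != nullptr && endBuf != nullptr && startBuf <= endBuf);
  return std::string(startBuf, static_cast<std::size_t>(endBuf - startBuf));
}
```

Keep in mind that a SourceLocation normally designates the first character of a token. If LocEnd is the location of the last token you want, the range above stops just before it, so you would need to add that token's length to endBuf to include it.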

snaroff

> Thanks for all the help,
>
> Bas
>
> From: steve naroff [mailto:snaroff at apple.com]
> Sent: Tuesday, September 9, 2008 15:02
> To: Burgers, B.J. (Bas, Student EMSYS)
> CC: cfe-dev at cs.uiuc.edu
> Subject: Re: [cfe-dev] changing the lexer or parser
>
> Hi Bas,
>
> clang currently implements C integer constants by including the  
> trailing suffix (see C99 6.4.4.1 for more details).
>
> Sema::ActOnNumericConstant() is then responsible for determining
> the type of the constant (integer, floating) and its size.
>
> I haven't thought about adapting clang's lexer to generate tokens  
> that don't conform to C.
>
> That said, you could simply examine the "suffix" by hand (without  
> fiddling with the lexer directly).
>
> snaroff
>
> On Sep 9, 2008, at 5:27 PM, b.j.burgers at student.utwente.nl wrote:
>
>
> Hello,
>
> I’m working on a tool that allows time constructs in C. I implemented
> this tool by adapting Clang.
> In these time constructs I would like to allow arguments like “1000s”,
> “1000 s”, “1000  s”, “100ms”, “100 ms”, etc.
> The lexer creates one token, a numeric_constant, if the argument is
> “1000s”, even if ‘s’ is added as a keyword or token in TokenKinds.def.
> I had hoped the lexer would generate two tokens: one
> numeric_constant and an identifier (or self-defined token).
> What is the best way to allow these kinds of arguments? Do I have to
> create a new token that allows some digits followed by an ‘s’?
>
> Thanks for any help,
>
> Bas Burgers
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev