[cfe-dev] tabs in the input and their effect on the column position

Dimitri van Heesch dimitri at stack.nl
Tue Mar 12 13:57:18 PDT 2013


Hi Jordan,

Thanks for the info.

It would have been nice if clang_tokenize had included whitespace as tokens (if only as
an option). Then one could reconstruct the output purely by looking at the tokens. Without it, one
needs to extract the whitespace between the tokens from the original file and analyse it to
see how it should be rendered in the output. That sounds neither efficient nor convenient.
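
To make the extra work concrete, here is a minimal sketch in C of locating the text between consecutive tokens from their byte offsets. The TokenRange struct and the gap_before function are hypothetical names used only for illustration; with libclang the offsets would presumably have to be derived from each token's extent (for example via clang_getTokenExtent and clang_getSpellingLocation), which is not shown here.

```c
#include <stddef.h>

/* Hypothetical stand-in for a token's byte range [begin, end) within the
   source buffer. Not a libclang type. */
typedef struct {
    size_t begin;
    size_t end;
} TokenRange;

/* Returns the byte range of the text that precedes token i, i.e. whatever the
   tokenizer did not report (whitespace, tabs, newlines). Passing i == ntoks
   yields the trailing gap after the last token. Assumes the ranges are
   sorted, non-overlapping, and lie within [0, src_len]. */
static TokenRange gap_before(const TokenRange *toks, size_t ntoks,
                             size_t i, size_t src_len)
{
    TokenRange gap;
    gap.begin = (i == 0) ? 0 : toks[i - 1].end;
    gap.end = (i < ntoks) ? toks[i].begin : src_len;
    return gap;
}
```

Every gap found this way still has to be inspected byte by byte to decide how tabs and newlines should be rendered, which is the inconvenient part.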

Meanwhile I found out how to use CXUnsavedFile to pass a de-tabbed representation of the 
original file, which works for me. Note that in my case both input and output are UTF-8 
encoded, so the multi-byte characters are just passed through fine.
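
For illustration, a de-tabbing step along these lines could look like the sketch below. It is not the code used in doxygen; the detab function name and its tab_width parameter are assumptions made for this example. Columns are counted per UTF-8 code point, which is a simplification that ignores wide and combining characters.

```c
#include <stddef.h>
#include <stdlib.h>

/* Expands each tab in src to spaces up to the next multiple of tab_width.
   UTF-8 continuation bytes (10xxxxxx) are copied through without advancing
   the column, so a multi-byte character counts as one column.
   Returns a malloc'ed, NUL-terminated buffer and stores its length in
   *out_len; returns NULL if tab_width is 0 or allocation fails.
   The caller frees the result. */
static char *detab(const char *src, size_t len, unsigned tab_width,
                   size_t *out_len)
{
    if (tab_width == 0)
        return NULL;

    /* Worst case: every input byte is a tab expanding to tab_width spaces. */
    char *out = malloc(len * tab_width + 1);
    if (!out)
        return NULL;

    size_t n = 0;   /* bytes written so far */
    size_t col = 0; /* current column on the current line, 0-based */
    for (size_t i = 0; i < len; ++i) {
        unsigned char c = (unsigned char)src[i];
        if (c == '\t') {
            size_t stop = (col / tab_width + 1) * tab_width;
            while (col < stop) {
                out[n++] = ' ';
                ++col;
            }
        } else {
            out[n++] = (char)c;
            if (c == '\n')
                col = 0;
            else if ((c & 0xC0) != 0x80)
                ++col; /* start of a new code point */
        }
    }
    out[n] = '\0';
    *out_len = n;
    return out;
}
```

The returned buffer and length can then be placed in the Contents and Length fields of a CXUnsavedFile, with Filename set to the original path, so that libclang parses the de-tabbed text without anything being written to disk. The locations libclang reports are then relative to the de-tabbed buffer, and are still expressed in bytes.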

Regards,
  Dimitri

On Mar 10, 2013, at 8:42 , Jordan Rose <jordan_rose at apple.com> wrote:

> I want to jump in before anyone starts suggesting solutions, and point out that all libclang output is in terms of bytes from the start of a line. That means that tabs show up as one byte, but it also means that if the source contains multibyte characters, you may have a range of three bytes referring to a single character (and a single column). 
> 
> Clang and libclang expect you to interpret their output appropriately for your use case. Perhaps you process tabs to mean "new table cell".
> 
> All that said, making the byte/column map machinery in TextDiagnostic more reusable would probably be a good thing all around. LLVM's diagnostics don't handle multibyte characters at all.
> 
> Jordan
> 
> 
> On Mar 9, 2013, at 13:06 , Dimitri van Heesch <dimitri at stack.nl> wrote:
> 
>> Hi All,
>> 
>> I'm currently experimenting with improving doxygen's parsing capabilities by using the information from clang. 
>> I'm using the libclang functions clang_tokenize and clang_annotateTokens to create hyperlinked and syntax highlighted 
>> output for the source files processed by doxygen.
>> 
>> So far it works quite well, but when the source file contains a tab character this seems to be counted as one character, causing
>> the output to be misaligned.
>> 
>> Is there some way to configure the number of spaces in a tab? Or is there a way to replace tabs by spaces before sending the
>> contents of a file to libclang, without first having to write the detabbed file to disk?
>> 
>> Any help is appreciated.
>> 
>> Regards,
>>  Dimitri
>> 
>> 