[cfe-dev] tabs in the input and their effect on the column position

Sat Mar 9 23:42:33 PST 2013

I want to jump in before anyone starts suggesting solutions, and point out that every column libclang reports is a byte count from the start of the line. That means a tab shows up as one byte, but it also means that if the source contains multibyte characters, you may have a range of three bytes referring to a single character (and a single column).

Clang and libclang expect you to interpret their output appropriately for your use case. For example, you might render a tab as a "new table cell" rather than as a fixed number of spaces.
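
As a rough illustration of that kind of interpretation, here is a minimal Python sketch that maps a byte-based column to a display column. It is not part of Clang or libclang; the function name display_column and the tab_width parameter are made up for this example. It assumes the line is UTF-8, that the incoming column is 1-based, that tabs advance to the next multiple of tab_width, and that every code point occupies exactly one column (which is wrong for wide and combining characters).

```python
def display_column(line_bytes: bytes, byte_column: int, tab_width: int = 8) -> int:
    """Map a 1-based byte column to a 1-based display column.

    Hypothetical helper: assumes UTF-8 input and one column per code point.
    """
    # Decode only the bytes that precede the reported position.
    prefix = line_bytes[:byte_column - 1].decode("utf-8", errors="replace")

    col = 0  # 0-based display column reached so far
    for ch in prefix:
        if ch == "\t":
            # Advance to the next tab stop.
            col += tab_width - (col % tab_width)
        else:
            col += 1
    return col + 1


if __name__ == "__main__":
    line = "\tint caf\u00e9 = 1;".encode("utf-8")
    # Byte column 2 is the 'i' right after the leading tab.
    print(display_column(line, 2, tab_width=4))
```

A tool that renders tabs differently (as table cells, say) would replace the tab branch with its own rule; the decoding step stays the same.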

All that said, making the byte/column map machinery in TextDiagnostic more reusable would probably be a good thing all around. LLVM's diagnostics don't handle multibyte characters at all.

Jordan

On Mar 9, 2013, at 13:06 , Dimitri van Heesch <dimitri at stack.nl> wrote:

> Hi All,
> 
> I'm currently experimenting with improving doxygen's parsing capabilities by using the information from clang. 
> I'm using the libclang functions clang_tokenize and clang_annotateTokens to create hyperlinked and syntax highlighted 
> output for the source files processed by doxygen.
> 
> So far it works quite well, but when the source file contains a tab character this seems to be counted as one character, causing
> the output to be misaligned.
> 
> Is there some way to configure the number of spaces in a tab? Or is there a way to replace tabs with spaces before sending the
> contents of a file to libclang, without first having to write the detabbed file to disk?
> 
> Any help is appreciated.
> 
> Regards,
>  Dimitri
