[cfe-dev] Russian characters lead to wrong SourceLocation start/end!

Jordan Rose jordan_rose at apple.com
Tue Jan 7 09:14:05 PST 2014


Source locations are measured in bytes, not in characters, on the assumption that most editors are better equipped to deal with byte offsets. In UTF-8, each Cyrillic letter takes two bytes, so I think this is correct.
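
If a character offset is needed on the client side, one option is to convert the byte offset yourself. The following is only a minimal sketch, assuming the unsaved buffer is UTF-8 encoded; the helper name byte_offset_to_char_offset and the sample source line are hypothetical and are not part of libclang.

```python
def byte_offset_to_char_offset(source: bytes, byte_offset: int) -> int:
    """Count the Unicode code points that precede a UTF-8 byte offset.

    Hypothetical helper, not a libclang API. Assumes `source` is valid
    UTF-8 and that `byte_offset` falls on a character boundary; an offset
    in the middle of a multi-byte character raises UnicodeDecodeError.
    """
    return len(source[:byte_offset].decode("utf-8"))


# Sample buffer with a Cyrillic identifier (each letter is 2 bytes in UTF-8).
text = "int привет = 0;"
data = text.encode("utf-8")

# Byte offsets of the identifier, as a byte-based tokenizer would report them.
ident = "привет".encode("utf-8")
start_byte = data.find(ident)
end_byte = start_byte + len(ident)

# The same positions expressed as character (code point) offsets.
start_char = byte_offset_to_char_offset(data, start_byte)
end_char = byte_offset_to_char_offset(data, end_byte)
```

Here the identifier spans 12 bytes but only 6 characters, which matches the "2 bytes instead of 1" observation in the report. Note that this counts code points, not UTF-16 units or grapheme clusters, so an editor that indexes text differently would need a different conversion.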

Jordan


On Jan 7, 2014, at 0:36 , Anton Smirnov <dev at antonsmirnov.name> wrote:

> I've just extracted, built and installed llvm, clang, compiler-rt etc. from branches/release_34, and this issue affects the 3.4 release too.
> IMHO it's a pretty important issue that should be fixed before the 3.4 release.
> 
> Regards, Anton.
> 
> 
> 2014/1/7 Anton Smirnov <dev at antonsmirnov.name>
> bug report:
> http://llvm.org/bugs/show_bug.cgi?id=18405
> 
> 
> 
> 
> 2014/1/7 Anton Smirnov <dev at antonsmirnov.name>
> Hello.
> 
> I've just found that if an UnsavedFile has Russian characters in its content, the SourceLocation start/end is calculated wrong: each symbol's length is calculated to be 2 bytes instead of 1. This leads to a wrong end location for that token and wrong locations for all following tokens in clang_tokenize().
> 
> I've checked it on 3.3 on Mac and Linux (Android via JNI), since I can't see any release notes or downloads for 3.4 (though it was scheduled to be released in December).
> 
> Any confirmation/suggestions?
> 
> PS. I've also posted a bug report but I can't see it yet.
> 
> Regards, Anton.
