[cfe-dev] UTF-8 vs. UTF-16 code locations
Milian Wolff via cfe-dev
cfe-dev at lists.llvm.org
Mon Jan 25 12:18:47 PST 2016
On Monday, 25 January 2016 20:39:44 CET Joachim Durchholz via cfe-dev wrote:
> On 25.01.2016 at 19:23, Milian Wolff via cfe-dev wrote:
> > This is done without ever loading any file in an editor. But we do run a
> > lot of clang_parseTranslationUnit2 calls which will internally open files
> > from disk. Then we visit the AST and get e.g. the position for a class
> > declaration. In order to convert that position, assuming the file is
> > UTF-8 encoded, I want to translate it to a UTF-16 position.
>
> Can't you convert to UTF-16 during load? Then you don't need to
> translate at all.
> I'm under the impression that you are keeping a UTF-8 data blob in an
> environment that mostly talks UTF-16; in that case, the cleanest
> solution would be to have the data blob in UTF-16, too. Of course I
> don't know how much of your code base you'd have to touch to change
> that; this could be quite nasty or surprisingly easy.
What data blob are you referring to? I have the feeling we are talking past
each other in this discussion ;-)
On one hand I have:
    for every file in given directory
        call clang_parseTranslationUnit
        traverse resulting AST
        for every interesting cursor
            store range of this cursor
The data blob we cache is a range[start(line, column), end(line, column)]. The
large code base expects these to be UTF-16 column offsets. Assuming the file is
encoded in UTF-8 on disk, what I get from clang-c are UTF-8 (byte-based) column
offsets. For that reason I'd like to convert them at this point. An API in
clang-c for efficient access to the underlying UTF-8 buffer of a given CXFile
would help a lot for that purpose (and in other scenarios, where we currently
(ab)use clang_tokenize to stringify a range).
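To illustrate the conversion I have in mind, here is a minimal sketch. It
assumes the caller already has the UTF-8 bytes of the line in question (which is
exactly the part that needs buffer access), that the input is well-formed UTF-8,
and that columns are 1-based byte offsets. The function name
utf8_col_to_utf16_col is made up for this example and is not part of clang-c.

```c
#include <stddef.h>

/* Hypothetical helper, not a clang-c API.
 * Converts a 1-based UTF-8 byte column into a 1-based UTF-16 code-unit column.
 * `line` points at the first byte of the line, `len` is its length in bytes.
 * Assumes well-formed UTF-8. */
unsigned utf8_col_to_utf16_col(const char *line, size_t len, unsigned byte_col)
{
    unsigned units = 0;
    /* Number of bytes that precede the requested column. */
    size_t limit = byte_col > 0 ? (size_t)byte_col - 1 : 0;
    if (limit > len)
        limit = len;

    for (size_t i = 0; i < limit; ++i) {
        unsigned char c = (unsigned char)line[i];
        if ((c & 0xC0) == 0x80)
            continue; /* continuation byte: already counted with its lead byte */
        /* A 4-byte sequence encodes a code point above U+FFFF, which takes
         * a surrogate pair (two code units) in UTF-16; everything else
         * takes one code unit. */
        units += (c >= 0xF0) ? 2 : 1;
    }
    return units + 1;
}
```

For a line consisting of "int ", U+00E4, U+1F600 and "x;", the 'x' sits at
byte column 11, and the function maps that to UTF-16 column 8. The loop only
looks at the bytes before the column, so the cost per position is linear in the
column, not in the file size.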
So what I'm asking, again, is whether an API such as the following would be
acceptable:
CXString clang_getRangeSpelling(CXSourceRange range);
Thanks
--
Milian Wolff
mail at milianw.de
http://milianw.de