[lldb-dev] Editline Rewrite : issues surround wide character handling on different platforms

Shawn Best sbest at blueshiftinc.com
Wed Oct 29 12:31:24 PDT 2014


Libedit internally uses wchar_t to handle wide characters.  For the 
kind of work libedit does, I think a wchar_t is better suited than an 
array of UTF-8 encoded bytes, since each wchar_t typically holds a whole 
character.  The translations in question are only for getting data in 
and out of libedit.
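A minimal sketch of those in/out translations, assuming C++11's <codecvt> 
is available and that wchar_t holds a full code point (as on OS X and 
Linux).  ToWide and ToUtf8 are hypothetical helper names, not existing 
LLDB functions.  Note that std::wstring_convert and std::codecvt_utf8 
were later deprecated in C++17.

```cpp
#include <cassert>
#include <codecvt>
#include <locale>
#include <string>

// Hypothetical helpers: convert between the UTF-8 std::string used by the
// rest of the debugger and the std::wstring a wide-character libedit uses.
// Assumes wchar_t holds a whole code point (32-bit wchar_t on OS X/Linux).
using Utf8Converter = std::wstring_convert<std::codecvt_utf8<wchar_t>, wchar_t>;

// UTF-8 bytes in, wide characters out (throws std::range_error on bad UTF-8).
std::wstring ToWide(const std::string &utf8) {
  Utf8Converter conv;
  return conv.from_bytes(utf8);
}

// Wide characters in, UTF-8 bytes out.
std::string ToUtf8(const std::wstring &wide) {
  Utf8Converter conv;
  return conv.to_bytes(wide);
}
```

For example, ToWide on the UTF-8 bytes of "café" yields a four-element 
std::wstring, and ToUtf8 turns it back into the original five bytes.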

This means that support for extended characters in the command-line 
history will depend on having <codecvt> support and a libedit built 
with wchar_t support, which, AFAIK, is only available on OS X.

Currently, I am reworking the patch so that it works with either 
libedit's char or wchar_t functions.  This is a compile-time decision.
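Roughly, such a compile-time switch could look like the sketch below.  
EDITLINE_USE_WCHAR, EditLineChar, EditLineString, ToEditLineString and 
FromEditLineString are hypothetical names for illustration only; the 
actual patch may organize this differently.  Either way, the rest of 
LLDB keeps seeing UTF-8 std::string.

```cpp
#include <cassert>
#include <string>

// Hypothetical switch: a build would define EDITLINE_USE_WCHAR only where
// libedit was built with wide-character support. Not a real LLDB macro.
#if defined(EDITLINE_USE_WCHAR)
#include <codecvt>
#include <locale>
using EditLineChar = wchar_t;
#else
using EditLineChar = char;
#endif

using EditLineString = std::basic_string<EditLineChar>;

// Convert UTF-8 text from the rest of the debugger into the character
// type expected by whichever libedit entry points were selected.
inline EditLineString ToEditLineString(const std::string &utf8) {
#if defined(EDITLINE_USE_WCHAR)
  return std::wstring_convert<std::codecvt_utf8<wchar_t>>().from_bytes(utf8);
#else
  return utf8;  // narrow build: bytes pass through unchanged
#endif
}

// Convert libedit's text back into UTF-8 for the rest of the debugger.
inline std::string FromEditLineString(const EditLineString &text) {
#if defined(EDITLINE_USE_WCHAR)
  return std::wstring_convert<std::codecvt_utf8<wchar_t>>().to_bytes(text);
#else
  return text;  // narrow build: already a std::string
#endif
}
```

In the narrow build both conversions are identity functions, so 
platforms without a wchar-aware libedit pay no translation cost.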

If we want wide character support on other platforms down the road, it 
would make sense to bring the libedit functions into lldb.  We can add 
custom wchar_t-to-UTF-8 translations if gcc still does not support 
<codecvt>.
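Without <codecvt>, the wchar_t-to-UTF-8 direction is small enough to 
write by hand.  This sketch assumes a 32-bit wchar_t holding one code 
point per element (Linux, OS X); on Windows, where wchar_t is 16 bits, 
surrogate pairs would have to be combined first.  WideToUtf8 is a 
hypothetical name, and the reverse (UTF-8 decoding) direction is not 
shown.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical hand-written wchar_t to UTF-8 encoder for toolchains that
// lack <codecvt>. Assumes each wchar_t is one whole code point; surrogate
// values and values above U+10FFFF are replaced with U+FFFD.
std::string WideToUtf8(const std::wstring &wide) {
  std::string out;
  out.reserve(wide.size());
  for (wchar_t wc : wide) {
    uint32_t cp = static_cast<uint32_t>(wc);
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
      cp = 0xFFFD;  // not a valid Unicode scalar value
    if (cp < 0x80) {  // 1 byte: 0xxxxxxx
      out += static_cast<char>(cp);
    } else if (cp < 0x800) {  // 2 bytes: 110xxxxx 10xxxxxx
      out += static_cast<char>(0xC0 | (cp >> 6));
      out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {  // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
      out += static_cast<char>(0xE0 | (cp >> 12));
      out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
      out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {  // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
      out += static_cast<char>(0xF0 | (cp >> 18));
      out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
      out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
      out += static_cast<char>(0x80 | (cp & 0x3F));
    }
  }
  return out;
}
```

For example, the euro sign U+20AC encodes to the three bytes E2 82 AC.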

On 10/29/2014 11:38 AM, Zachary Turner wrote:
> If you're storing UTF8 anyway, why not just use regular character 
> strings?  Doesn't it defeat the purpose of using UTF8 if you're 
> combining it with a character type that isn't 1 byte?
>
> On Wed Oct 29 2014 at 11:27:29 AM Kate Stone 
> <katherine_stone at apple.com> wrote:
>
>     On Oct 28, 2014, at 1:55 PM, Zachary Turner <zturner at google.com>
>     wrote:
>
>>     On Tue Oct 28 2014 at 1:46:26 PM Vince Harron <vharron at google.com>
>>     wrote:
>>
>>
>>         > - rework the Editline rewrite, so it either uses standard 8
>>         bit chars, or wchar_t/utf8 depending on the platform.  This
>>         would be conditionally built depending on the platform.
>>
>>         This would be my favorite option if possible.  wchar_t never
>>         really took roots in Linux AFAIK.
>>
>>
>>     Also probably the best option for Windows, although it's worth
>>     pointing out that at least for now, most other stuff in LLDB
>>     doesn't really use wide character strings either, so char would
>>     be the path of least resistance for Windows right now.
>
>     With the Editline rewrite I made the explicit decision to insulate
>     the rest of LLDB from wide characters and strings by encoding
>     everything as UTF8.  I agree that reverting to char-only input is
>     a perfectly reasonable solution for platforms that don't yet
>     include wchar-aware libedit implementations.
>
>     Kate Stone k8stone at apple.com
>      Xcode Runtime Analysis Tools
>
