[llvm-dev] [lldb-dev] Adding DWARF5 accelerator table support to llvm

Tue Jan 30 07:49:11 PST 2018

On 30 January 2018 at 15:41, Adrian Prantl <aprantl at apple.com> wrote:
>
>
>> On Jan 30, 2018, at 7:35 AM, Pavel Labath <labath at google.com> wrote:
>>
>> Hello all,
>>
>> I am looking for feedback regarding implementation of the case folding
>> algorithm for .debug_names hashes.
>>
>> Unlike the Apple tables, the .debug_names hashes are computed from
>> case-folded names (to enable case-insensitive lookups for languages
>> where that makes sense). The DWARF5 document specifies that the case
>> folding should be done according to the "Caseless matching" Section
>> of the Unicode standard (whose implementation is basically a long list
>> of special cases). While certainly possible, implementing this would
>> be much more complicated (and would probably make the code a bit
>> slower) than a simple tolower(3) call. And the benefits of this are
>> not really clear to me.
>
> Assuming a UTF-8 encoding, will tolower(3) destroy any non-ASCII characters in the process? In Swift, for example, we allow a wide range of Unicode characters in identifiers and I want to make sure that this doesn't cause any problems.
>

I'm not sure what it will do out-of-the-box, but I could certainly
implement it such that it does not touch the non-ASCII characters.
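
To make that concrete, here is a rough sketch of what I mean by a
folding step that only touches ASCII. The name foldAsciiOnly is made up
for illustration, and this is not the Unicode "Caseless matching"
algorithm, just the simple variant that lower-cases 'A'..'Z' and passes
every other byte through (so the bytes of UTF-8 multi-byte sequences,
which are all >= 0x80, are left alone):

```cpp
#include <string>
#include <string_view>

// Hypothetical helper (name is an assumption, not existing LLVM API).
// Lower-cases only the ASCII letters 'A'..'Z'. Every other byte,
// including all bytes of UTF-8 multi-byte sequences, is copied as-is.
std::string foldAsciiOnly(std::string_view name) {
  std::string out(name);
  for (char &c : out) {
    if (c >= 'A' && c <= 'Z')
      c = static_cast<char>(c - 'A' + 'a');
  }
  return out;
}
```

Unlike a plain tolower(3) call, this does not consult the current
locale, so the result should be the same on every host.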

However, if we already have Unicode characters in the input, then it
may make sense to go all the way and implement the full folding
algorithm. Once we start producing hashes from ASCII-only folding, it
will be hard to switch to being fully standard-compliant, as that would
invalidate the existing hashes.
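
As an illustration of the invalidation problem, below is a small sketch
of the DJB hash (h = h * 33 + c, starting from 5381), which is my
understanding of what .debug_names uses. The function name djbHash and
the choice to feed it the bytes as unsigned values are assumptions for
the sake of the example:

```cpp
#include <cstdint>
#include <string_view>

// Sketch of the DJB hash over the bytes of an (already folded) name.
// Treating each byte as unsigned is an assumption of this example.
std::uint32_t djbHash(std::string_view folded) {
  std::uint32_t h = 5381;
  for (char c : folded)
    h = h * 33 + static_cast<unsigned char>(c);
  return h;
}
```

For a name starting with U+00C4 (capital A with diaeresis), ASCII-only
folding keeps the bytes "\xC3\x84" while a Unicode-aware folding would
produce "\xC3\xA4", so djbHash("\xC3\x84" "rger") and
djbHash("\xC3\xA4" "rger") differ, and a consumer doing full folding
would look in the wrong bucket of a table produced with the simple one.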

But the question then is: can I assume the input names will be Unicode
(with UTF-8 encoding)?