[LLVMdev] Mangling of UTF-8 characters in symbol names

Sean Hunt scshunt at csclub.uwaterloo.ca
Fri Mar 30 18:17:03 PDT 2012


On Fri, Mar 30, 2012 at 15:22, Eli Friedman <eli.friedman at gmail.com> wrote:

> On Fri, Mar 30, 2012 at 12:12 PM, Sean Hunt <scshunt at csclub.uwaterloo.ca>
> wrote:
> > Why is it that high (>127) bytes in symbol names get mangled by LLVM
> > into _XX_, where XX is the hex representation of the character? Is
> > this required by ELF or some similar standard? This behavior is
> > inconsistent with GCC.
>
> I think it's just so that we have a way to actually write out the
> symbol into the assembly file.  What does gcc do?
>
> -Eli
>
>
It emits the high bits literally. The consequence is that UTF-8-encoded
identifiers come out in UTF-8:

scshunt at natural-flavours:~$ gcc -fextended-identifiers -std=c99 -x c -c -o test.o -
int i\u03bb;
scshunt at natural-flavours:~$ nm test.o
00000004 C iλ
scshunt at natural-flavours:~$

As you can see, the nm output includes the literal lambda.
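For comparison, here is a minimal sketch of the _XX_ scheme described at the top of the thread, assuming each byte above 127 is replaced independently and that XX is uppercase hex (the exact case and byte grouping used by LLVM's mangler are assumptions here, not confirmed by this thread):

```python
def mangle(symbol: str) -> str:
    """Illustrative mangling per this thread's description: each byte
    > 127 of the UTF-8 encoding becomes _XX_ (XX = hex of the byte).
    Not LLVM's actual implementation -- a sketch only."""
    out = []
    for byte in symbol.encode("utf-8"):
        if byte > 127:
            # Uppercase hex is an assumption; the thread only says "hex".
            out.append("_%02X_" % byte)
        else:
            out.append(chr(byte))
    return "".join(out)

# The identifier from the example: 'i' followed by U+03BB (lambda),
# which UTF-8-encodes as the two bytes 0xCE 0xBB.
print(mangle("i\u03bb"))  # -> i_CE__BB_
```

Under that scheme the symbol GCC emits literally as iλ would instead appear as something like i_CE__BB_, which is why the two toolchains produce inconsistent symbol names.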

Sean

