[LLVMdev] [cfe-dev] A problem with names that can not be demangled.

Srivastava, Sunil sunil_srivastava at playstation.sony.com
Tue Apr 14 13:51:45 PDT 2015


Hi David,

Thanks for adding Lang and other mailing lists.


Ø  Is there any basis/reason to believe that the .X suffix is a better, more principled one

Ø  than straight X? Is that documented somewhere as a thing the demangling tools

Ø  will ignore?
I don’t know of any documentation but the existing demangling tools, such as c++filt on linux, already stop at ‘.’.  So there is precedence.

Gcc bugzillas refer to this point. See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40831#c10  for an example.

Moreover, the issue exists in C as well, where the debugger will have no way of distinguishing Bye from Bye1. They may still need some work for figuring out Bye.1, but at least that is doable.

Sunil

From: David Blaikie [mailto:dblaikie at gmail.com]
Sent: Tuesday, April 14, 2015 12:44 PM
To: Srivastava, Sunil
Cc: cfe-dev at cs.uiuc.edu; LLVM Developers Mailing List; Lang Hames
Subject: Re: [cfe-dev] A problem with names that can not be demangled.

Adding llvm-dev as that might be a more suitable audience for this discussion.

(& I know Lang's been playing around with the same problem in the Orc JIT, so adding him too)

Is there any basis/reason to believe that the .X suffix is a better, more principled one than straight X? Is that documented somewhere as a thing the demangling tools will ignore?

On Tue, Apr 14, 2015 at 12:06 PM, Srivastava, Sunil <sunil_srivastava at playstation.sony.com<mailto:sunil_srivastava at playstation.sony.com>> wrote:
Hi,

We are running into a problem created by renaming of static symbols by llvm-link.  It first
showed up using LTO, but we can illustrate this by using llvm-link as well.

Say we have two files with the same named static symbol Bye

--------------- t1.cpp ---------
static void Bye(int* ba1) { ba1[0] /= ba1[2] - 2; }
void main_a( int* inB) { void (*func)(int*) = Bye; func(inB); }
--------------- t2.cpp ---------
static void Bye(int* ba1) { ba1[0] *= ba1[2] + 2; }
void main_b( int* inB) { void (*func)(int*) = Bye; func(inB+1); }

--------- cmd sequence -------
$ clang++ -c -emit-llvm t1.cpp -o t1.bc
$ clang++ -c -emit-llvm t1.cpp -o t2.bc
$ llvm-link t1.bc t2.bc -o t23.bc
$ clang -c t23.bc
$ nm t23.o

t1.o and t2.o have the same named function “_ZL3ByePi”. In order to distinguish them,
one gets a ‘1’ appended to it, making it  “_ZL3ByePi1”.

While the code is all correct, the problem is that this modified name cannot be demangled.

That is what I am trying to fix.

In similar situations gcc appends a ‘.’ before appending the discriminating number, making “_ZL3ByePi.1”

The following change in lib/IR/ValueSymbolTable.cpp seems to fix this problem.

------------ start diff -------------------
@@ -54,5 +54,5 @@ void ValueSymbolTable::reinsertValue(Value* V) {
     // Trim any suffix off and append the next number.
     UniqueName.resize(BaseSize);
-    raw_svector_ostream(UniqueName) << ++LastUnique;
+    raw_svector_ostream(UniqueName) <<  "."  << ++LastUnique;

     // Try insert the vmap entry with this suffix.
-------------- end diff ---------------------

However it causes 60 test failures. These are tests where some names that are expecting
to get a plain numeric suffix now have a ‘.’ before it. These are all local symbols, so I think
the generated code will always be correct, but the tests as written do not pass. For
example, take test/CodeGen/ARM/global-merge-addrspace.ll

; RUN: llc < %s -mtriple=thumb-apple-darwin -O3 | FileCheck %s
; Test the GlobalMerge pass. Check that the pass does not crash when using
; multiple address spaces.
; CHECK: _MergedGlobals:
@g1 = internal addrspace(1) global i32 1
@g2 = internal addrspace(1) global i32 2
; CHECK: _MergedGlobals1:
@g3 = internal addrspace(2) global i32 3
@g4 = internal addrspace(2) global i32 4

With my change, the symbol is named MergedGlobals.1, hence it fails this test.

I could change these 60 tests to match the new behavior. That will fix these 60 failures.
However, I do have a concern that there may be other places in llvm that expect the
names to be pure identifiers. Adding a ‘.’ may cause them to fail. No such failure has been
seen in running the whole clang test, but the concern is still there.

I should note that even local symbols are treated similarly, so for example, a parameter
named ‘str’ becomes ‘str.1’ with my change, instead of ‘str1’ currently (an actual
example from a test).

Alternatively, I could try to limit my change to just mangled names.

Any suggestion about how this should be fixed ?

There is another similar change about 40 lines below in ValueSymbolTable::createValueName().
That is not needed to fix this particular problem, but looks similar, so perhaps should be treated
similarly for consistency. It causes 66 more failures of the same nature though.

Thanks

Sunil Srivastava
Sony Computer Entertainment


_______________________________________________
cfe-dev mailing list
cfe-dev at cs.uiuc.edu<mailto:cfe-dev at cs.uiuc.edu>
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150414/808e5df0/attachment.html>


More information about the llvm-dev mailing list