[PATCH] D93747: Rename debug linkage name with -funique-internal-linkage-names

Wed Jan 6 00:08:35 PST 2021

dblaikie added a comment.

In D93747#2481383 <https://reviews.llvm.org/D93747#2481383>, @hoy wrote:

> In D93747#2481304 <https://reviews.llvm.org/D93747#2481304>, @tmsriram wrote:
>
>> In D93747#2481223 <https://reviews.llvm.org/D93747#2481223>, @tmsriram wrote:
>>
>>> In D93747#2481163 <https://reviews.llvm.org/D93747#2481163>, @dblaikie wrote:
>>>
>>>> In D93747#2481095 <https://reviews.llvm.org/D93747#2481095>, @hoy wrote:
>>>>
>>>>> In D93747#2481073 <https://reviews.llvm.org/D93747#2481073>, @dblaikie wrote:
>>>>>
>>>>>> Suggesting that gdb isn't using the DW_AT_name at all for "break <function name>" but instead demangling and stripping off the extras from the linkage name, and since it can't demangle this uniquified name (unlike the mangled name used when using the overloadable attribute) that degrades the debugger user experience? I'd have to test that in more detail/with some hand-hacked DWARF.
>>>>>
>>>>> Yes, I think in your case the linage name can be demangled by the debugger. In my previous experiment, the uniquefied names could not be demangled therefore I was not able to breakpoint.
>>>>
>>>> Ah, did some more testing. Seems it's because it isn't a classically mangled name.
>>
>> The simplest fix I can think of is to emit the base 10 number of the md5hash.  That would preserve all the existing uniqueness and be demangler friendly.  @hoy @dblaikie  WDYT?
>
> I think using the base 10 form of md5hash is a smart workaround. Thanks for the suggestion.

Sure - wouldn't mind having some wording from the Itanium ABI/mangling rules that explain/corroborate what we've seen from testing to ensure we are using it correctly and there aren't any other ways that might be more suitable, or lurking gotchas we haven't tested.

>>>> It might be possible for this uniquifying step to check if the name is mangled (does it start with "_Z") and if it isn't mangled, it could mangle it (check the length of the name, then "_Z<length><name>v") and add the unique suffix. That looks to me like it preserves the original debugging behavior?
>>>>
>>>> (has the problem that we don't actually know the mangling scheme at this point - we do know it up in clang (where it might be Itanium mangling or Microsoft mangling), might point again to the possibility this feature is more suitable to implement in the frontend rather than a middle end pass. And also the 'v' in the mangling is 'void return type', which is a lie - not sure if we could do something better there)
>
> Doing name unification in the frontend sounds like the ultimate solution and since the frontend has all the knowledge about the name mangler. I think for now we can go with the solution @tmsriram suggested, i.e., using the base 10 form of md5 hash.

Any general idea of how long "for now" would be? It doesn't seem like putting this off is going to make things especially better & seems like more bug fixes/changes/etc are being built around the solution as it is at the moment. So I'd be hesitant to put off this kind of restructuring too long.

>>>> I think it's pretty important that we don't break tradeoff debuggability for profile accuracy. It'll make for a difficult tradeoff for our/any users.
>>>
>>> Agreed, I didn't expect this.
>>>
>>>> (side note: using the original source file name seems problematic - I know in google at least, the same source file is sometimes built into more than one library/form with different preprocessor defines, so this may not produce as unique a name as you are expecting?)
>>>
>>> It was a best effort and I think the hashing can be improved by using more signals other than just the module name if needed.  For hard data though, this does significantly improve performance which clearly comes from better profile attribution so it does something.

I'm OK with the idea that this helped the situation - but it doesn't seem very principled/rigorous. Rather than a sliding scale of "better" I think it'd be worth considering in more detail what would be required to make this correct.

Using the object file name may be more robust - also not perfect for all build systems (one could imagine a distributed build system that /might/ be able to get away with having the distributed builders always build a file named x.o - only to have the distribution system rename the file to its canonical name on the main machine before linking occurs - but I think that's not how most distributed build systems work, and most build systems provide a unique object file name across a build?) but maybe moreso than using the input file name? (at least I think for google/blaze the object filename is genuinely unique)

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D93747/new/

https://reviews.llvm.org/D93747