[PATCH] D98799: [UniqueLinkageName] Use consistent checks when mangling symbo linkage name and debug linkage name.

Hongtao Yu via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed May 12 10:00:35 PDT 2021


hoy added inline comments.


================
Comment at: clang/test/CodeGen/unique-internal-linkage-names-dwarf.c:34-39
+static int go(a) int a;
+{
+  return glob + a;
+}
+
+
----------------
aaron.ballman wrote:
> hoy wrote:
> > aaron.ballman wrote:
> > > dblaikie wrote:
> > > > hoy wrote:
> > > > > dblaikie wrote:
> > > > > > hoy wrote:
> > > > > > > dblaikie wrote:
> > > > > > > > aaron.ballman wrote:
> > > > > > > > > dblaikie wrote:
> > > > > > > > > > aaron.ballman wrote:
> > > > > > > > > > > dblaikie wrote:
> > > > > > > > > > > > hoy wrote:
> > > > > > > > > > > > > dblaikie wrote:
> > > > > > > > > > > > > > dblaikie wrote:
> > > > > > > > > > > > > > > hoy wrote:
> > > > > > > > > > > > > > > > dblaikie wrote:
> > > > > > > > > > > > > > > > > hoy wrote:
> > > > > > > > > > > > > > > > > > dblaikie wrote:
> > > > > > > > > > > > > > > > > > > hoy wrote:
> > > > > > > > > > > > > > > > > > > > dblaikie wrote:
> > > > > > > > > > > > > > > > > > > > > hoy wrote:
> > > > > > > > > > > > > > > > > > > > > > dblaikie wrote:
> > > > > > > > > > > > > > > > > > > > > > > hoy wrote:
> > > > > > > > > > > > > > > > > > > > > > > > dblaikie wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > hoy wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > > dblaikie wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > > > Does this need to be down here? Or would the code be a well exercised if it was up next to the go declaration above?
> > > > > > > > > > > > > > > > > > > > > > > > > > Yes, it needs to be here. Otherwise it will just like the function `bar` above that doesn't get a uniquefied name. I think moving the definition up to right after the declaration hides the declaration.
> > > > > > > > > > > > > > > > > > > > > > > > > Not sure I follow - do you mean that if the go declaration and go definition were next to each other, this test would (mechanically speaking) not validate what the patch? Or that it would be less legible, but still mechanically correct?
> > > > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > > > I think it would be (assuming it's still mechanically correct) more legible to put the declaration next to the definition - the comment describes why the declaration is significant/why the definition is weird, and seeing all that together would be clearer to me than spreading it out/having to look further away to see what's going on.
> > > > > > > > > > > > > > > > > > > > > > > > When the `go` declaration and `go` definition were next to each other, the go function won't get a uniqufied name at all. The declaration will be overwritten by the definition. Only when the declaration is seen by others, such the callsite in `baz`, the declaration makes a difference by having the callsite use a uniqufied name.
> > > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > Ah! Interesting, good to know. 
> > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > Is that worth supporting, I wonder? I guess it falls out for free/without significant additional complexity. I worry about the subtlety of the additional declaration changing the behavior here... might be a bit surprising/subtle. But maybe no nice way to avoid it either.
> > > > > > > > > > > > > > > > > > > > > > It would be ideal if user never writes code like that. Unfortunately it exists with legacy code (such as mysql). I think it's worth supporting it from AutoFDO point of view to avoid a silent mismatch between debug linkage name and real linkage name.
> > > > > > > > > > > > > > > > > > > > > Oh, I agree that we shouldn't mismatch debug info and the actual symbol name - what I meant was whether code like this should get mangled or not when using unique-internal-linkage names.
> > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > I'm now more curious about this:
> > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > When the `go` declaration and `go` definition were next to each other, the go function won't get a uniqufied name at all.
> > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > This doesn't seem to happen with the `__attribute__((overloadable))` attribute, for instance - so any idea what's different about uniquification that's working differently than overloadable?
> > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > ```
> > > > > > > > > > > > > > > > > > > > > $ cat test.c
> > > > > > > > > > > > > > > > > > > > > __attribute__((overloadable)) static int go(a) int a; {
> > > > > > > > > > > > > > > > > > > > >   return 3 + a;
> > > > > > > > > > > > > > > > > > > > > }
> > > > > > > > > > > > > > > > > > > > > void baz() {
> > > > > > > > > > > > > > > > > > > > >   go(2);
> > > > > > > > > > > > > > > > > > > > > }
> > > > > > > > > > > > > > > > > > > > > $ clang-tot test.c -emit-llvm -S -o - | grep go
> > > > > > > > > > > > > > > > > > > > >   %call = call i32 @_ZL2goi(i32 2)
> > > > > > > > > > > > > > > > > > > > > define internal i32 @_ZL2goi(i32 %a) #0 {
> > > > > > > > > > > > > > > > > > > > > ```
> > > > > > > > > > > > > > > > > > > > Good question. I'm not sure what's exactly going on but it looks like with the overloadable attribute, the old-style definition is treated as having prototype. But if you do this:
> > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > ```
> > > > > > > > > > > > > > > > > > > > __attribute__((overloadable)) 
> > > > > > > > > > > > > > > > > > > > void baz() {}
> > > > > > > > > > > > > > > > > > > > ```
> > > > > > > > > > > > > > > > > > > > then there's the error:
> > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > ```
> > > > > > > > > > > > > > > > > > > > error: 'overloadable' function 'baz' must have a prototype
> > > > > > > > > > > > > > > > > > > > void baz() {
> > > > > > > > > > > > > > > > > > > > ```
> > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > `void baz() {` does not come with a prototype. That's for sure.  Sounds like `int go(a) int a {;` can have a prototype when it is loadable. I'm wondering why it's not always treated as having prototype, since the parameter type is there.
> > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > Yeah, that seems like that divergence be worth understanding (& if possible fixing/avoiding/merging). Ensuring these features don't have subtle divergence I think will be valuable to having a model that's easy to explain/understand/modify/etc.
> > > > > > > > > > > > > > > > > > I took another look. I think the divergence comes from `getAs<FunctionProtoType>` vs `hasPrototype`. The debug data generation uses `hasPrototype` while `getAs<FunctionProtoType>` is used as overloadable attribute processing as long as unique linkage name processing before this change. More specifically, the following function definition is represented by `FunctionProtoType`  while it does not `hasPrototype`.
> > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > ```
> > > > > > > > > > > > > > > > > > static int go(a) int a; {
> > > > > > > > > > > > > > > > > >   return 3 + a;
> > > > > > > > > > > > > > > > > > }
> > > > > > > > > > > > > > > > > > ```
> > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > I was trying to have `CGDebugInfo` to check `FunctionProtoType`  instead of `hasPrototype`. While it works for the code pattern in discussion, it also breaks other tests including objectC tests. More investigation is needed to understand what each term really means.
> > > > > > > > > > > > > > > > > Are you undertaking that investigation? It'd be good to address this divergence if possible.
> > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > (@aprantl or @rsmith maybe you know something about this ObjC thing? )
> > > > > > > > > > > > > > > > Haven't figured out anything useful yet. As far as I can tell, the debug info generation code is shared between C++ and ObjC. Using `getAs<FunctionProtoType>` works for C++ but not for ObjectC where it crashes when computing a mangled name for something like 
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > ```
> > > > > > > > > > > > > > > > void test() {
> > > > > > > > > > > > > > > >   __block A a;
> > > > > > > > > > > > > > > >   ^{ (void)a; };
> > > > > > > > > > > > > > > > }
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > ```
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > Below are the failing tests which are all like that:
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > >   Clang :: CodeGenCXX/cp-blocks-linetables.cpp
> > > > > > > > > > > > > > > >   Clang :: CodeGenCXX/debug-info-block-invocation-linkage-name.cpp
> > > > > > > > > > > > > > > >   Clang :: CodeGenCXX/debug-info-blocks.cpp
> > > > > > > > > > > > > > > >   Clang :: CodeGenObjCXX/nested-ehlocation.mm
> > > > > > > > > > > > > > > >   Clang :: CodeGenObjCXX/property-objects.mm
> > > > > > > > > > > > > > > >   Clang :: CodeGenObjCXX/synthesized-property-cleanup.mm
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > cc @bruno 
> > > > > > > > > > > > > > > Ping on this - anyone got a chance to take a look? It'd be great to avoid this subtle inconsistency.
> > > > > > > > > > > > > > Ping again
> > > > > > > > > > > > > I tried a different route instead of using `getAs<FunctionProtoType>` in debug info generation which beaks the blocks function and objectC cases. Since the problem here is that the old-C function (`bar` in the test case) is not considered `hasPrototype`, I tried to unify `isKNRPrototype` and `hasPrototype` so that old-C functions are considered `hasPrototype`. It works for the name mangler but it breaks other places where `isKNRPrototype` should be excluded from `hasPrototype`.
> > > > > > > > > > > > Hrm - I'd really like to get to the bottom of this, but not sure who to pull in.
> > > > > > > > > > > > 
> > > > > > > > > > > > @aaron.ballman - do you know who might have some idea of how these different old KNR C declarations work, and how this code might be made more robust?
> > > > > > > > > > > Ugh, prototypes. They're not particularly well specified in the C standard IMHO. In C, a function with a prototype is one that declares the types of its parameters (C17 6.2.1p2) which is further clarified to be a function type with a parameter type list explicitly (C17 6.2.7p3, 6.9.1p7). However, the very end of 6.9.1p7 goes on to say that once you see the definition of the function, you know about its parameter type information, but it doesn't clarify whether this means the function now has a prototype or not.
> > > > > > > > > > > 
> > > > > > > > > > > The result of this is that:
> > > > > > > > > > > ```
> > > > > > > > > > > void f();
> > > > > > > > > > > void call_it_once(void) { f(1.2f); }
> > > > > > > > > > > 
> > > > > > > > > > > void f(a) float a; {}
> > > > > > > > > > > void call_if_twice(void) { f(1.2f); }
> > > > > > > > > > > ```
> > > > > > > > > > > in `call_it_once`, the argument is promoted to a double, while in `call_it_twice`, the argument is not. I suspect we're hitting another variation of this confusing behavior here.
> > > > > > > > > > > 
> > > > > > > > > > @aaron.ballman Thanks for taking a look.
> > > > > > > > > > 
> > > > > > > > > > Do you know if/how this code could be phrased so that this code:
> > > > > > > > > > ```
> > > > > > > > > > static int go(int);
> > > > > > > > > > 
> > > > > > > > > > void baz() {
> > > > > > > > > >   foo();
> > > > > > > > > >   bar(1);
> > > > > > > > > >   go(2);
> > > > > > > > > > }
> > > > > > > > > > 
> > > > > > > > > > static int go(a) int a;
> > > > > > > > > > {
> > > > > > > > > >   return glob + a;
> > > > > > > > > > }
> > > > > > > > > > ```
> > > > > > > > > > 
> > > > > > > > > > Could test some property of `go` at the function call site that would be consistent whether the definition of `go` came before or after the call site? It seems currently this code behaves differently depending on that order and I think that's a bit of a sharp corner it'd be good not to have, if there's a tidier/more consistent way to phrase the code.
> > > > > > > > > To be honest, I wasn't aware this code was even valid (where you mix and match between identifier lists and parameter type lists), so that's neat.
> > > > > > > > > 
> > > > > > > > > I might be confused, but in my tests, the behavior of `collectFunctionDeclProps()` is the same regardless of order, but the calls to `isUniqueInternalLinkageDecl()` are different. Given an invocation of:  `-cc1 -triple x86_64-unknown-linux -debug-info-kind=limited -dwarf-version=4 -funique-internal-linkage-names -emit-llvm -o - test.c` with test.c as:
> > > > > > > > > ```
> > > > > > > > > static int go(int);
> > > > > > > > > 
> > > > > > > > > static int go(a) int a;
> > > > > > > > > {
> > > > > > > > >   return 1 + a;
> > > > > > > > > }
> > > > > > > > > 
> > > > > > > > > void baz() {
> > > > > > > > >   go(2);
> > > > > > > > > }
> > > > > > > > > ```
> > > > > > > > > I see `isUniqueInternalLinkageDecl()` gets called:
> > > > > > > > >   * once for `go` with no prototype
> > > > > > > > >   * once for `baz` with no prototype
> > > > > > > > > 
> > > > > > > > > and `collectFunctionDeclProps()` gets called:
> > > > > > > > >   * once for `baz` with no prototype
> > > > > > > > >   * once for `go` with a prototype
> > > > > > > > > 
> > > > > > > > > The end result is that I see `!13 = distinct !DISubprogram(name: "go", scope: !8, file: !8, line: 3, type: !14, scopeLine: 4, flags: DIFlagPrototyped, spFlags: DISPFlagLocalToUnit | DISPFlagDefinition, unit: !0, retainedNodes: !2)` in the output.
> > > > > > > > > 
> > > > > > > > > However, with test.c as:
> > > > > > > > > ```
> > > > > > > > > static int go(int);
> > > > > > > > > 
> > > > > > > > > void baz() {
> > > > > > > > >   go(2);
> > > > > > > > > }
> > > > > > > > > 
> > > > > > > > > static int go(a) int a;
> > > > > > > > > {
> > > > > > > > >   return 1 + a;
> > > > > > > > > }
> > > > > > > > > ```
> > > > > > > > > I see `isUniqueInternalLinkageDecl()` gets called:
> > > > > > > > >   * once for `baz` with no prototype
> > > > > > > > >   * once for `go` with a prototype
> > > > > > > > >   * another one for `go` with a prototype
> > > > > > > > > 
> > > > > > > > > and `collectFunctionDeclProps()` gets called:
> > > > > > > > >   * once for `baz` with no prototype
> > > > > > > > >   * once for `go` with a prototype
> > > > > > > > > 
> > > > > > > > > The end result is that I see `!13 = distinct !DISubprogram(name: "go", linkageName: "_ZL2goi.__uniq.39558841650144213141281977295187289852", scope: !8, file: !8, line: 7, type: !14, scopeLine: 8, flags: DIFlagPrototyped, spFlags: DISPFlagLocalToUnit | DISPFlagDefinition, unit: !0, retainedNodes: !2)`
> > > > > > > > > 
> > > > > > > > > When I change `isUniqueInternalLinkageDecl()` to use `!FD->getCanonicalDecl()->hasPrototype()`, I get the same behavior with either ordering. When I run the full test suite with that change, I get no test failures, so that may be a reasonable fix worth investigating (I'm not super familiar with the ins and outs of name mangling and whether this change would be correct or not).
> > > > > > > > Thanks so much, @aaron.ballman that does sound exactly like it summarizes the situation and the suggestion sounds like it could be the right thing.
> > > > > > > > 
> > > > > > > > @hoy does that all make sense to you/could you try @aaron.ballman's suggested change/fix?
> > > > > > > Thanks for doing the experiment @aaron.ballman . Actually for the test in this diff, we would like the function `bar` to have a unique linkage name. The suggested change doesn't seem to fix that.
> > > > > > > 
> > > > > > > ```
> > > > > > > // bar should not be given a uniquefied name under -funique-internal-linkage-names, 
> > > > > > > // since it doesn't come with valid prototype.
> > > > > > > static int bar(a) int a;
> > > > > > > {
> > > > > > >   return glob + a;
> > > > > > > }
> > > > > > > 
> > > > > > > ```
> > > > > > I assume that comment was more written to describe existing practice than some intentional approach to dealing with a function like this, yeah?
> > > > > > 
> > > > > > The overloadable attribute seems to be able to mangle this function correctly - so I think that was my whole concern - that unique internal linkage names should, ideally, treat things the same as overloadable unless there's a reason these things are really different - which I don't know of any reason that they are.
> > > > > > 
> > > > > > So this looks like it fixes that gap - by enabling unique internal linkage names to mangle this case the same way overloadable does?
> > > > > Sorry for not making it clear. The suggested fix does not mangle this case (function `bar`) while the overloadable attributes is able to mangle it.
> > > > Ah, right - thanks for clarifying/sorry for my misunderstanding!
> > > `bar` is never given a prototype, so I can see why it wouldn't mangle for you. Perhaps it mangles for `overloadable` because of `ItaniumMangleContextImpl::shouldMangleCXXName()` checking for the attribute and returning `true`?
> > Yes, and I think `shouldMangleCXXName` returns true because of `FD->getType()->getAs<FunctionProtoType>()` is true in the `overloadable` case. I guess I still don't fully understand the subtle diversion between `FD->getType()->getAs<FunctionProtoType>()` and `FD->hasPrototype()`.
> > Yes, and I think shouldMangleCXXName returns true because of FD->getType()->getAs<FunctionProtoType>() is true in the overloadable case. 
> 
> I don't think that's correct -- I think it returns true because of this: https://github.com/llvm/llvm-project/blob/main/clang/lib/AST/ItaniumMangle.cpp#L657
> 
> > I guess I still don't fully understand the subtle diversion between FD->getType()->getAs<FunctionProtoType>() and FD->hasPrototype().
> 
> `FD->getType()->getAs<FunctionProtoType>()` is checking whether the given instance of the declaration has a type with a prototype. `FD->hasPrototype()` looks at whether the given instance has *or inherits* a prototype. So `hasPrototype()` is looking in more places for the prototype.
Thanks for pointing it out. I meant to say `overloadable` is only allowed on functions with `getAs<FunctionProtoType>()` but not `hasPrototype`:  https://github.com/llvm/llvm-project/blob/main/clang/lib/Sema/SemaDecl.cpp#L9825

On the contrary, `isUniqueInternalLinkageDecl` and `CGDebugInfo` rely on `hasPrototype`. This subtle diversion causes function `bar` to be mangled differently. Even with `overloadable`, function bar has a different symbol linkage name and debug linkage name:


```
_attribute__((overloadable))
static int bar(a) int a;
{
  return glob + a;
}

results in:

define internal i32 @_ZL3bari.__uniq.95397005697148547017947938379908873441(i32 %a)

but not a debug linkage name
!24 = distinct !DISubprogram(name: "bar", scope: !6, file: !6, line: 19, type: !25, scopeLine: 20, spFlags: DISPFlagLocalToUnit | DISPFlagDefinition, unit: !2, retainedNodes: !4)
```




Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98799/new/

https://reviews.llvm.org/D98799



More information about the cfe-commits mailing list