[clang] [llvm] [CloneFunction][DebugInfo] Avoid cloning DILocalVariables of inlined functions (PR #75385)

Tue Oct 1 11:26:17 PDT 2024

jmorse wrote:

[This keeps slipping to the back of my TODO list.]

I've been enlightened by the comments on #68929 about ODR rules: there isn't a violation in the example, but it does at least exercise the code path of interest. My summary is that the ODR-uniquing of types happens at such a low level that it causes surprises and can't be easily fixed. Here's a better-designed reproducer:

    inline int foo() {
      class bar {
      private:
        int a = 0;
      public:
        int get_a() { return a; }
      };

      static bar baz;
      return baz.get_a();
    }

    int a() {
      return foo();
    }

Compile and link this similarly to the above:

    clang a.cpp  -o a.ll -emit-llvm -S -g -c -O2
    clang b.cpp  -o b.ll -emit-llvm -S -g -c -O2
    llvm-link a.ll b.ll -o c.ll -S
    llc c.ll -o out.o -filetype=obj
    <boom>

Here b.cpp is a copy of the file above with the function renamed from 'a' to 'b', so that there aren't multiple conflicting definitions. In this code, we inline the body of "foo" into the 'a' and 'b' functions, and furthermore we inline the get_a method of foo::bar too. In each of the compilation units, this leads to the following chain of lexical scopes for the most deeply inlined instruction:
 * get_a method,
 * foo::bar class,
 * foo function,
 * 'a' or 'b' function.

The trouble comes when the two modules are linked together: the two collections of DILocations / DILexicalScopes / DISubprograms describing source locations in each module are distinct and kept separate through linking. However, the DICompositeType for foo::bar is uniqued based on its name, and its "scope" field will point into one of the metadata collections. Thus, where we used to have two distinct chains of lexical scopes we've now got a tree of them, joining at the uniqued DICompositeType, and llc is not prepared for this.
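
To make the shape of the problem concrete, here is a toy C++ model of the situation. It is only a sketch and not LLVM code: `Scope`, `Context`, `loadModule` and the identifier string "id-of-foo::bar" are all made up for illustration, and the inlined-at part of the chain is left out. Each module creates its own distinct foo and get_a nodes, but the type node is looked up by identifier, so the second module's get_a ends up parented (through bar) to the first module's foo.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Toy stand-in for a debug-info scope node (NOT the real LLVM classes).
struct Scope {
  std::string kind;    // "subprogram" or "type"
  std::string name;
  std::string module;  // which module's metadata created this node
  const Scope *parent; // enclosing scope, or nullptr at the top
};

// Owns every node. Types are uniqued by identifier; everything else is distinct.
struct Context {
  std::vector<std::unique_ptr<Scope>> nodes;
  std::map<std::string, Scope *> uniquedTypes;

  Scope *makeDistinct(std::string kind, std::string name, std::string module,
                      const Scope *parent) {
    nodes.push_back(std::make_unique<Scope>(Scope{kind, name, module, parent}));
    return nodes.back().get();
  }

  // The first definition wins. A later request with the same identifier gets
  // the existing node back, and its own 'parent' argument is ignored.
  Scope *getOrCreateType(const std::string &identifier, const std::string &name,
                         const std::string &module, const Scope *parent) {
    auto it = uniquedTypes.find(identifier);
    if (it != uniquedTypes.end())
      return it->second;
    Scope *type = makeDistinct("type", name, module, parent);
    uniquedTypes[identifier] = type;
    return type;
  }
};

// Builds foo -> bar -> get_a for one module and returns the get_a scope.
const Scope *loadModule(Context &ctx, const std::string &module) {
  Scope *foo = ctx.makeDistinct("subprogram", "foo", module, nullptr);
  // "id-of-foo::bar" is a placeholder, not a real mangled name.
  Scope *bar = ctx.getOrCreateType("id-of-foo::bar", "bar", module, foo);
  return ctx.makeDistinct("subprogram", "get_a", module, bar);
}

// Walks parent links up to the outermost scope.
const Scope *outermost(const Scope *scope) {
  assert(scope && "expected a non-null scope");
  while (scope->parent)
    scope = scope->parent;
  return scope;
}

// True if two distinct get_a scopes share the same outermost foo.
bool chainsJoinAfterLinking() {
  Context ctx;
  const Scope *getA_a = loadModule(ctx, "a.ll");
  const Scope *getA_b = loadModule(ctx, "b.ll");
  return getA_a != getA_b && outermost(getA_a) == outermost(getA_b);
}
```

In this model `chainsJoinAfterLinking()` returns true: walking up from the get_a that came from b.ll lands on the foo that came from a.ll, while the foo created for b.ll is not on that path at all.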

I don't know that this is a bug, more of a design mismatch: most of the rest of LLVM is probably OK with the lexical-scope chain actually being a tree, given that it only ever walks up it. However, LexicalScopes does a top-down exploration of a function looking for lexical scopes, and is then surprised when it finds different scopes looking from the bottom up. We could adjust it to search upwards for more lexical scopes (it already does that for block scopes), but then I imagine we would produce very confusing DWARF that contained two Subprogram scopes for the same function.

There's also no easy way of working around this in metadata: we can't describe any other metadata relationship because the uniquing is done at such a low level, and we can't selectively not-ODR-unique DICompositeTypes that are inside functions because the lexical scope metadata might not have been loaded yet, so it can't be examined.

An immediate fix would be to not set the "identifier" field for the DICompositeType when it's created inside a function scope, which avoids ODR-uniquing. I've only got a light understanding of what the identifier field is for, so there might be unexpected consequences, plus there'll be a metadata/DWARF size cost to that.
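
A toy C++ sketch of that idea, under the assumption that a type with an empty identifier is simply never uniqued. None of this is LLVM's API: `Scope`, `Context`, `identifierFor` and the placeholder identifier string are hypothetical names used only for illustration.

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

// Toy scope node (NOT an LLVM class).
struct Scope {
  std::string name;
  bool isFunction;
  const Scope *parent;
};

struct Context {
  std::vector<std::unique_ptr<Scope>> nodes;
  std::map<std::string, Scope *> byIdentifier;

  Scope *make(const std::string &name, bool isFunction, const Scope *parent) {
    nodes.push_back(std::make_unique<Scope>(Scope{name, isFunction, parent}));
    return nodes.back().get();
  }

  // Assumption of this sketch: an empty identifier opts out of uniquing.
  Scope *makeType(const std::string &identifier, const std::string &name,
                  const Scope *parent) {
    if (identifier.empty())
      return make(name, false, parent);
    auto it = byIdentifier.find(identifier);
    if (it != byIdentifier.end())
      return it->second;
    Scope *type = make(name, false, parent);
    byIdentifier[identifier] = type;
    return type;
  }
};

// Hypothetical policy: drop the identifier for types nested in a function.
std::string identifierFor(const std::string &proposed, const Scope *parent) {
  return (parent && parent->isFunction) ? std::string() : proposed;
}

// Creates foo and bar for one module; returns the foo that bar hangs off.
const Scope *fooReachedFromBar(Context &ctx, bool applyFix) {
  Scope *foo = ctx.make("foo", true, nullptr);
  std::string id = "id-of-foo::bar"; // placeholder, not a real mangled name
  if (applyFix)
    id = identifierFor(id, foo);
  return ctx.makeType(id, "bar", foo)->parent;
}
```

Calling `fooReachedFromBar` twice on one `Context` with `applyFix` set to false gives the same foo both times, which is the joined tree described earlier. With `applyFix` set to true each call gets its own bar under its own foo, so the chains stay separate, at the price of one extra type node per module.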

https://github.com/llvm/llvm-project/pull/75385