[PATCH] D96045: [llvm-dwarfdump][locstats] Unify handling of inlined vars with no loc
Djordje Todorovic via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Apr 22 00:21:43 PDT 2021
djtodoro added a comment.
In D96045#2705748 <https://reviews.llvm.org/D96045#2705748>, @dblaikie wrote:
> In D96045#2704967 <https://reviews.llvm.org/D96045#2704967>, @djtodoro wrote:
>
>> In D96045#2703754 <https://reviews.llvm.org/D96045#2703754>, @dblaikie wrote:
>>
>>> In D96045#2688579 <https://reviews.llvm.org/D96045#2688579>, @djtodoro wrote:
>>>
>>>> In D96045#2688025 <https://reviews.llvm.org/D96045#2688025>, @dblaikie wrote:
>>>>
>>>>> In D96045#2685000 <https://reviews.llvm.org/D96045#2685000>, @djtodoro wrote:
>>>>>
>>>>>> In D96045#2684570 <https://reviews.llvm.org/D96045#2684570>, @dblaikie wrote:
>>>>>>
>>>>>>> In D96045#2678117 <https://reviews.llvm.org/D96045#2678117>, @rdhindsa wrote:
>>>>>>>
>>>>>>>> After moving the variable InlinedFnsToBeProcessed to inside the for loop, I noticed a difference in two dwarfdump statistics for a quite large binary with fission=yes:
>>>>>>>>
>>>>>>>> #variables processed by location statistics
>>>>>>>> #variables with 0% of parent scope covered by DW_AT_location
>>>>>>>>
>>>>>>>> Since I am not very familiar with this code, I would request if you could investigate and apply the patch.
>>>>>>>
>>>>>>> Hmm, looking at this trying to reproduce the behavior makes me think this wouldn't be the right fix & the change in numbers is probably reflective of a bug in the fix - specifically in LTO situations (where the abstract origin of the function may be in a different CU than the one it's referenced from/inlined into).
>>>>>>>
>>>>>>> But there might be a reasonable alternative fix/improvement - pruning the contents of the list once it has been handled. Possibly sorting the list so it's easy to just visit the elements that need to.
>>>>>>
>>>>>> I have taken a look into this and come up with the similar conclusion.
>>>>>> This could be reproduced with the Split DWARF -- when building e.g. GDB project with -gsplit-dwarf option enabled, it triggers the issue. I was struggling to make a simple test case (from a high level) that meets all the requirements, but it is very hard. That is why I've sent the email: https://lists.llvm.org/pipermail/llvm-dev/2021-April/149735.html.
>>>>>
>>>>> Ah, hmm - why would this come up in Split DWARF, do you think? Split DWARF doesn't have any cross-CU references (the format has no way to encode them) so I'd expect no DIEs to be added to the InlinedFnsToBeProcessed list?
>>>>
>>>> The example I had in my mind is a bit different (I haven't looked into cross-cu support yet, but we should have covered it).
>>>
>>> Hmm, OK. If the intent is that this change isn't meant to work for/fix the LTO case - then changing the scope sounds like something worth doing immediately. (& the discussion below about Split DWARF sounds like it'd be a place where the current buggy code would exhibit different (incorrect) behavior). I can probably try to come up with a test case that demonstrates that buggy behavior.
>>>
>>>> Let me first explain two main variables here:
>>>>
>>>> `GlobalInlinedFnInfo` -- keeps track of `DW_TAG_subprogram` copies with `DW_AT_inline`, so when we face a `DW_TAG_inlined_subroutine` we know for how many variables we have zero % location coverage
>>>
>>> As an aside: This issue can come up for the concrete DW_TAG_subprogram that refers to the abstract origin, not only for the DW_TAG_inlined_subroutines that refer to the abstract origin. ie:
>>>
>>> $ cat test.cpp
>>> void f1();
>>> __attribute__((always_inline)) void f2() {
>>> int i;
>>> f1();
>>> }
>>> void f3() {
>>> f2();
>>> }
>>> blaikie at blaikie-linux2:~/dev/scratch$ clang++-tot -O3 test.cpp -c -g && llvm-dwarfdump-tot test.o | grep "DW_TAG\|abstract_origin\|DW_AT_name"
>>> 0x0000000b: DW_TAG_compile_unit
>>> DW_AT_name ("test.cpp")
>>> 0x0000002a: DW_TAG_subprogram
>>> DW_AT_abstract_origin (0x0000005b "_Z2f2v")
>>> 0x0000003d: DW_TAG_variable
>>> DW_AT_abstract_origin (0x00000067 "i")
>>> 0x00000042: DW_TAG_GNU_call_site
>>> DW_AT_abstract_origin (0x00000050 "_Z2f1v")
>>> 0x00000050: DW_TAG_subprogram
>>> DW_AT_name ("f1")
>>> 0x0000005b: DW_TAG_subprogram
>>> DW_AT_name ("f2")
>>> 0x00000067: DW_TAG_variable
>>> DW_AT_name ("i")
>>> 0x00000073: DW_TAG_base_type
>>> DW_AT_name ("int")
>>> 0x0000007a: DW_TAG_subprogram
>>> DW_AT_name ("f3")
>>> 0x00000093: DW_TAG_inlined_subroutine
>>> DW_AT_abstract_origin (0x0000005b "_Z2f2v")
>>> 0x000000a7: DW_TAG_GNU_call_site
>>> DW_AT_abstract_origin (0x00000050 "_Z2f1v")
>>>
>>> So currently LLVM still has the "emitting a DIE with only an abstract origin for variables that are optimized away" issue in the concrete out of line definition (0x2a/0x3d above) though it's been fixed in the inlined case (0x93 above). And this patch for statistics addresses the latter case (the inlined_subprogram) but doesn't address the former case (if LLVM were improved not to produce that basically dead concrete variable DIE).
>>
>> Oh, yes, I think the D54217 <https://reviews.llvm.org/D54217> have introduced a bug here (since we should consider dw_subprogram with abstract_origin as inlined function) and I guess we should make something like this:
>>
>> diff --git a/llvm/tools/llvm-dwarfdump/Statistics.cpp b/llvm/tools/llvm-dwarfdump/Statistics.cpp
>> index 3758a56da7a2..4d0cf6b5fc8d 100644
>> --- a/llvm/tools/llvm-dwarfdump/Statistics.cpp
>> +++ b/llvm/tools/llvm-dwarfdump/Statistics.cpp
>> @@ -462,9 +462,15 @@ static void collectStatsRecursive(DWARFDie Die, std::string FnPrefix,
>> return;
>>
>> // Handle any kind of lexical scope.
>> - const bool IsFunction = Tag == dwarf::DW_TAG_subprogram;
>> + const bool HasAbstractOriginAttr =
>> + Die.find(dwarf::DW_AT_abstract_origin) != None;
>> + const bool IsFunction =
>> + ((Tag == dwarf::DW_TAG_subprogram) && !HasAbstractOriginAttr);
>> const bool IsBlock = Tag == dwarf::DW_TAG_lexical_block;
>> - const bool IsInlinedFunction = Tag == dwarf::DW_TAG_inlined_subroutine;
>> + const bool IsInlinedFunction =
>> + (Tag == dwarf::DW_TAG_inlined_subroutine ||
>> + (Tag == dwarf::DW_TAG_subprogram && HasAbstractOriginAttr));
>> +
>> InlinedVarsTy InlinedVars;
>> // Get the vars of the inlined fn, so the locstats
>> // reports the missing vars (with coverage 0%).
>
> More or less - I think we'll need to find another name other than "inlined" for these "concrete and inlined instances with abstract origins", otherwise it's liable to be confusing to future readers/maintainers.
Here is the patch -- D101025 <https://reviews.llvm.org/D101025>.
>>>> `InlinedFnsToBeProcessed` -- doing recursive approach if we can firsly meet a `DW_TAG_inlined_subroutine` referencing a `DW_TAG_subprogram` copy with `DW_AT_inline` (since DWARF standard does not propose the order), this variable keeps the `DW_TAG_inlined_subroutine` DIEs that should be processed after we are finished with the dwarf tree
>>>>
>>>> Let us know consider a big project (let it be GDB) compiled with -gsplit-dwarf. Main executable will have debug_info section:
>>>>
>>>> $ llvm-dwarfdump gdb
>>>> 0x0000000b: DW_TAG_compile_unit
>>>> DW_AT_stmt_list (0x00000000)
>>>> DW_AT_comp_dir ("/gdb")
>>>> DW_AT_GNU_pubnames (true)
>>>> DW_AT_GNU_dwo_name ("gdb.dwo")
>>>> DW_AT_GNU_dwo_id (0xb8ed71b1bb823706)
>>>> DW_AT_low_pc (0x0000000000482370)
>>>> DW_AT_high_pc (0x000000000048239a)
>>>> DW_AT_GNU_addr_base (0x00000000)
>>>> 0x00000030: Compile Unit: length = 0x00000030, format = DWARF32, version = 0x0004, abbr_offset = 0x001a, addr_size = 0x08 (next unit at 0x00000064)
>>>>
>>>> 0x0000003b: DW_TAG_compile_unit
>>>> DW_AT_stmt_list (0x00000069)
>>>> DW_AT_comp_dir ("/gdb")
>>>> DW_AT_GNU_pubnames (true)
>>>> DW_AT_GNU_dwo_name ("amd64-tdep.dwo")
>>>> DW_AT_GNU_dwo_id (0x97c5a9e4ce1a43bf)
>>>> DW_AT_GNU_ranges_base (0x00000000)
>>>> DW_AT_low_pc (0x00000000004823a0)
>>>> DW_AT_high_pc (0x000000000048e982)
>>>> DW_AT_GNU_addr_base (0x00000018)
>>>> 0x00000064: Compile Unit: length = 0x0000002c, format = DWARF32, version = 0x0004, abbr_offset = 0x0037, addr_size = 0x08 (next unit at 0x00000094)
>>>>
>>>> 0x0000006f: DW_TAG_compile_unit
>>>> ....
>>>>
>>>> We will keep filling both of these variables (GlobalInlinedFnInfo and InlinedFnsToBeProcessed) for all the CUs, but there could be a situation where we have inlined_subroutine referencing a subprogram, but we accidently use variables info from subprogram from previous CU with the same DIE offset -- that is why we have different numbers for this case.
>>>
>>> Yep, that sounds reasonably buggy & in the absence of LTO/if this code isn't intended to handle the LTO case (which it seems it isn't) then moving the variables to the inner scope for now - pending work to address LTO - sounds right to me.
>>>
>>> Here's a test case for the scope change fix/bug with split-dwarf:
>>> a.cpp:
>>>
>>> __attribute__((optnone)) static void x() {
>>> }
>>> __attribute__((always_inline)) static void y() {
>>> int var;
>>> x();
>>> }
>>> void f1() {
>>> y();
>>> }
>>>
>>> b.cpp:
>>>
>>> __attribute__((optnone)) static void x() {
>>> }
>>> __attribute__((always_inline)) static void y() {
>>> int var;
>>> x();
>>> }
>>> void f2() {
>>> y();
>>> }
>>>
>>> c.cpp:
>>>
>>> void f1();
>>> void f2();
>>> int main() {
>>> f1();
>>> f2();
>>> }
>>>
>>> $ clang++ a.cpp b.cpp c.cpp -g -gsplit-dwarf -O1
>>> $ llvm-dwarfdump --statistics a.out
>>>
>>> (you could compile c.cpp without debug info, or put main in a or b.cpp at the end, I think it'd still work).
>>>
>>> With the current coed, this reports:
>>>
>>> "#variables processed by location statistics": 3,
>>> "#variables with 0% of parent scope covered by DW_AT_location": 3,
>>> ...
>>> "#local vars processed by location statistics": 3,
>>> "#local vars with 0% of parent scope covered by DW_AT_location": 3,
>>>
>>> Moving the variables into the nested scope causes these to all change to `2`, which is correct.
>>
>> Great! Thanks a lot!
>> This is the patch: https://reviews.llvm.org/D100951.
>
> Thanks!
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D96045/new/
https://reviews.llvm.org/D96045
More information about the llvm-commits
mailing list