[llvm-dev] Query/Suggestions on upgrading macro infrastructure.

Sourabh Singh Tomar via llvm-dev llvm-dev at lists.llvm.org
Fri Jan 3 06:47:13 PST 2020


Hi David,

Thanks for sharing your thoughts/suggestions on this.

4 forms that share the same encoding in DWARFv4 - you mean the define,
> define_strp, define_strx, and define_sup?

These are:
 DW_MACINFO_define      = DW_MACRO_define
 DW_MACINFO_undef       = DW_MACRO_undef
 DW_MACINFO_start_file  = DW_MACRO_start_file
 DW_MACINFO_end_file   = DW_MACRO_end_file


I think that the solution here is probably not to change the IR
> representation - IR doesn't have the semantics that necessitate
> strip/strx/sup encodings (there's no indexed string section at the IR level
> - string sharing is a lower level implementation detail at the IR Level
> that's not apparent at the IR Level). These are choices the backend would
> make about how to efficiently encode the strings into the final output.

Agreed. IR doesn't have the semantics that necessitate strp/strx/sup
encodings, and don't have too. However considering the most productive
macro-string representation form /DW_FORM_define_strx/ and taking the
existing infrastructure, emission psuedo code will be :

 if (Version == 4 && getMacinfoType() ==  DW_MACINFO_define)
> emit DW_MACINFO_define
>    if (Version == 5 && getMacinfoType() ==  DW_MACINFO_define)
> emit DW_MACRO_define_strx

 Now here the presence of "DW_MACINFO_define" here seems unfair/odd, may
confuse the user/developer.

With new types, we can have much cleaner code:

> if (getGenericMacroType() == MACRO_definition)  /* IR types doesn't
> show/share any false dependence on how back-end DWARF or other will consume
> this*/
> if (Version == 5) emit "DW_MACRO_define_strx"
> else emit "DW_MACRO_define"


Even at the IR level, user will have confusion. When in .ll file we have
"DW_MACINFO_define" and subsequent binary contains "DW_MACRO_define_strx".
This again, is due to the fact that "DW_MACINFO_define" is defined in the
DWARF spec, resulting in a false dependence/representation. At least
visible to user familiar with DWARF.

Yet another motivating example: Exactly same thing happens with
"DW_MACINFO_start_file". We'll have "DW_MACINFO_start_file" in .ll file and
subsequent binary will contain "DW_MACRO_start_file". Consider this snippet
from llvm:

>     auto *MF = DIMacroFile::get(VMContext,
> dwarf::DW_MACINFO_start_file, TMF->getLine(), TMF->getFile(),
> getOrCreateMacroArray(I.second.getArrayRef()));
>      replaceTemporary(llvm::TempDIMacroNode(TMF), MF);


Now based on this, emission psuedo code will be:

>    if (Version == 4 && getMacinfoType() ==  DW_MACINFO_start_file)
> emit "DW_MACINFO_start_file"
>    if (Version == 5 && getMacinfoType() ==  DW_MACINFO_start_file)
> emit "DW_MACRO_start_file"


All this is to say that I agree with you about the semantics of your
> proposal, but I think the /syntax/ of the IR can remain the same, just with
> the semantics you're describing. (ie: that DW_MACINFO_define in the IR
> doesn't necessarily mean you'll get a DW_MACINFO_define in the output (you
> might get a DW_MACINFO_define, maybe a DW_MACINFO_define_strx, etc, etc... )
>


To summarize: Adding new Generic macro type to metadata IR, and removing
old ones seems a clean and promising solution. This also enforces/depicts
that IR metadata is not dependent on the target consumer DWARF or other.

Thanks,
Sourabh.


On Fri, Jan 3, 2020 at 4:01 AM David Blaikie <dblaikie at gmail.com> wrote:

>
>
> On Thu, Jan 2, 2020 at 6:03 AM Sourabh Singh Tomar <sourav0311 at gmail.com>
> wrote:
>
>> Hello Everyone,
>>
>>
>>
>> I would like to have your thoughts on this.
>>
>>
>>
>> Overview of the problem
>>
>> ===================
>>
>> While implementing support for the DWARFv5 debug_macro section support in
>> LLVM. I came across these holes:
>>
>>    -  The macros infrastructure in CLANG/LLVM is inherently tied to a
>>    particular version(v4 macinfo). For instance, consider this snippet from
>>    CLANG:
>>
>> Gen->getCGDebugInfo()->CreateMacro(getCurrentScope(), llvm::dwarf::DW_MACINFO_define,
>>> location, Name.str(), Value.str()).
>>
>>
>>    - So, let's say you want to add support for a new form
>>       /DW_MACRO_define_strx/, with existing infrastructure, you can't do this
>>       without polluting code with lot of if-else. This is just to highlight the
>>       problem, there are numerous areas where we will have to add if-else.
>>       -  DWARF  forms[DW_MACINFO*] are unnecessarily hard-coded in
>>    CLANG, resulting in a false dependence of CLANG/LLVM on entire generation
>>    phase of debug info for macros, even when they don't have to.
>>    -  To worsen the problem even more, we have some DWARF forms sharing
>>    same encoding. Currently, we have 4-forms that share same encoding with
>>    previous DWARFv4.
>>
>>
> 4 forms that share the same encoding in DWARFv4 - you mean the define,
> define_strp, define_strx, and define_sup?
>
> I think that the solution here is probably not to change the IR
> representation - IR doesn't have the semantics that necessitate
> strip/strx/sup encodings (there's no indexed string section at the IR level
> - string sharing is a lower level implementation detail at the IR Level
> that's not apparent at the IR Level). These are choices the backend would
> make about how to efficiently encode the strings into the final output.
>
> All this is to say that I agree with you about the semantics of your
> proposal, but I think the /syntax/ of the IR can remain the same, just with
> the semantics you're describing. (ie: that DW_MACINFO_define in the IR
> doesn't necessarily mean you'll get a DW_MACINFO_define in the output (you
> might get a DW_MACINFO_define, maybe a DW_MACINFO_define_strx, etc, etc... )
>
>
>>
>>
>>  High Level Summary of Solution
>>
>> ========================
>>
>> To deal with this and to facilitate addition of new future DWARF forms
>> and other stuff, I propose following changes to be done:
>>
>> ยท  Segregate CLANG entirely from macro debug info part. This has been
>> done by introducing a new Generic Macro Types. Since macro information can
>> be  broadly classified into 4 categories:
>>
>> 1.     Definition
>>
>> 2.     Un-definition
>>
>> 3.     File inclusion
>>
>> 4.     File termination
>>
>> So I propose to introduce 4 new Generic Macro Types  to be used as types,
>> with no dependence to various DWARFv5/v4 forms.
>>
>>
>>    1. MACRO_definition
>>    2. MACRO_undef
>>    3. MACRO_start_file
>>    4. MACRO_end_file
>>
>>
>> - Previous infrastructure uses DWARF forms as types in metadata --
>>
>>> !DIMacro(type: DW_MACINFO_define, line: 9, name: "Name", value: "Value")
>>
>> And the production part will query this and emits the form
>> present[hardcoded].
>>
>>
>>
>> Upgraded infrastructure will be using something like --
>>
>>> !DIMacro(type: MACRO_definition, line: 9, name: "Name", value: "Value")
>>
>> Now production part will query the Generic Macro Type present in metadata
>> and based on the version of DWARF specified, it will emit forms accordingly.
>>
>>
>>
>> - Now this should cater to all our needs of LLVM and debug-info
>> production-consumption WRT to macro.
>>
>>
>>
>> This approach solves the problem highlighted above and simplifies the
>> code, increases code/infrastructure reuse for building DWARFv5 macro and
>> future additons.
>>
>>
>> I have a patch[implementing this] ready, currently supporting DWARFv4
>> macro only. Planning to build rest of DWARFv5 on top of this. Could you
>> guys let me know your thoughts and how to proceed forward with this. Should
>> we need an RFC for it have a better discussion?
>>
>>
>>
>> Thanks in anticipation,
>>
>> Sourabh.
>>
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200103/d6a7c3c0/attachment.html>


More information about the llvm-dev mailing list