[llvm-dev] Query/Suggestions on upgrading macro infrastructure.

David Blaikie via llvm-dev llvm-dev at lists.llvm.org
Fri Jan 3 14:45:32 PST 2020


On Fri, Jan 3, 2020 at 6:47 AM Sourabh Singh Tomar <sourav0311 at gmail.com>
wrote:

> Hi David,
>
> Thanks for sharing your thoughts/suggestions on this.
>
> 4 forms that share the same encoding in DWARFv4 - you mean the define,
>> define_strp, define_strx, and define_sup?
>
> These are:
>  DW_MACINFO_define      = DW_MACRO_define
>  DW_MACINFO_undef       = DW_MACRO_undef
>  DW_MACINFO_start_file  = DW_MACRO_start_file
>  DW_MACINFO_end_file   = DW_MACRO_end_file
>
>
> I think that the solution here is probably not to change the IR
>> representation - IR doesn't have the semantics that necessitate
>> strip/strx/sup encodings (there's no indexed string section at the IR level
>> - string sharing is a lower level implementation detail at the IR Level
>> that's not apparent at the IR Level). These are choices the backend would
>> make about how to efficiently encode the strings into the final output.
>
> Agreed. IR doesn't have the semantics that necessitate strp/strx/sup
> encodings, and don't have too. However considering the most productive
> macro-string representation form /DW_FORM_define_strx/ and taking the
> existing infrastructure, emission psuedo code will be :
>
>  if (Version == 4 && getMacinfoType() ==  DW_MACINFO_define)
>> emit DW_MACINFO_define
>>    if (Version == 5 && getMacinfoType() ==  DW_MACINFO_define)
>> emit DW_MACRO_define_strx
>
>  Now here the presence of "DW_MACINFO_define" here seems unfair/odd, may
> confuse the user/developer.
>
> With new types, we can have much cleaner code:
>
>> if (getGenericMacroType() == MACRO_definition)  /* IR types doesn't
>> show/share any false dependence on how back-end DWARF or other will consume
>> this*/
>> if (Version == 5) emit "DW_MACRO_define_strx"
>> else emit "DW_MACRO_define"
>
>
I don't think that adding a non-DWARF name can make the code cleaner than
reusing the DWARF name for both v4 and v5.

Either you have to translate from generic to v4 and generic to v5, or you
have to translate from v4 to v5, and do nothing for v4 to v4. It seems
strictly simpler (you could for symmetry always do a v4 to v4 translation
too, but I doubt that'd be necessary).

I'd probably leave the existing code mostly as-is with just a couple of
'if' statements:

    bool UseMacinfo = getDwarfVersion() >= 5;
    unsigned MacInfoType = M.getMacinfoType();
    assert(MacinfoType == DW_MACINFO_define || MacinfoType ==
DW_MACINFO_undef);
    // alternatively this could use a map lookup or other mechanism for
mapping from DWARFv4 to DWARFv5 encodings
    Asm->EmitULEB128(MacinfoType + (UseMacInfo ? (DW_MACRO_define_strx - 1)
: 0);
    Asm->EmitULEB128(M.getLine());
    StringRef Name = M.getName();
    StringRef Value = M.getValue();
    if (UseMacinfo) {
      DwarfFile &Holder = useSplitDwarf() ? SkeletonHolder : InfoHolder;
      Asm->EmitULEB128(DG.getStringPool().getEntry(*DG.getAsmPrinter(),
(Name + ' ' + Value).str()));
      return;
    }
    Asm->OutStreamer->EmitBytes(Name);
    if (!Value.empty()) {
      // There should be one space between macro name and macro value.
      Asm->emitInt8(' ');
      Asm->OutStreamer->EmitBytes(Value);
    }
    Asm->emitInt8('\0');



> Even at the IR level, user will have confusion. When in .ll file we have
> "DW_MACINFO_define" and subsequent binary contains "DW_MACRO_define_strx".
> This again, is due to the fact that "DW_MACINFO_define" is defined in the
> DWARF spec, resulting in a false dependence/representation. At least
> visible to user familiar with DWARF.
>
> Yet another motivating example: Exactly same thing happens with
> "DW_MACINFO_start_file". We'll have "DW_MACINFO_start_file" in .ll file and
> subsequent binary will contain "DW_MACRO_start_file". Consider this snippet
> from llvm:
>
>>     auto *MF = DIMacroFile::get(VMContext,
>> dwarf::DW_MACINFO_start_file, TMF->getLine(), TMF->getFile(),
>> getOrCreateMacroArray(I.second.getArrayRef()));
>>      replaceTemporary(llvm::TempDIMacroNode(TMF), MF);
>
>
> Now based on this, emission psuedo code will be:
>
>>    if (Version == 4 && getMacinfoType() ==  DW_MACINFO_start_file)
>> emit "DW_MACINFO_start_file"
>>    if (Version == 5 && getMacinfoType() ==  DW_MACINFO_start_file)
>> emit "DW_MACRO_start_file"
>
>
> All this is to say that I agree with you about the semantics of your
>> proposal, but I think the /syntax/ of the IR can remain the same, just with
>> the semantics you're describing. (ie: that DW_MACINFO_define in the IR
>> doesn't necessarily mean you'll get a DW_MACINFO_define in the output (you
>> might get a DW_MACINFO_define, maybe a DW_MACINFO_define_strx, etc, etc... )
>>
>
>
> To summarize: Adding new Generic macro type to metadata IR, and removing
> old ones seems a clean and promising solution. This also enforces/depicts
> that IR metadata is not dependent on the target consumer DWARF or other.
>
> Thanks,
> Sourabh.
>
>
> On Fri, Jan 3, 2020 at 4:01 AM David Blaikie <dblaikie at gmail.com> wrote:
>
>>
>>
>> On Thu, Jan 2, 2020 at 6:03 AM Sourabh Singh Tomar <sourav0311 at gmail.com>
>> wrote:
>>
>>> Hello Everyone,
>>>
>>>
>>>
>>> I would like to have your thoughts on this.
>>>
>>>
>>>
>>> Overview of the problem
>>>
>>> ===================
>>>
>>> While implementing support for the DWARFv5 debug_macro section support
>>> in LLVM. I came across these holes:
>>>
>>>    -  The macros infrastructure in CLANG/LLVM is inherently tied to a
>>>    particular version(v4 macinfo). For instance, consider this snippet from
>>>    CLANG:
>>>
>>> Gen->getCGDebugInfo()->CreateMacro(getCurrentScope(), llvm::dwarf::DW_MACINFO_define,
>>>> location, Name.str(), Value.str()).
>>>
>>>
>>>    - So, let's say you want to add support for a new form
>>>       /DW_MACRO_define_strx/, with existing infrastructure, you can't do this
>>>       without polluting code with lot of if-else. This is just to highlight the
>>>       problem, there are numerous areas where we will have to add if-else.
>>>       -  DWARF  forms[DW_MACINFO*] are unnecessarily hard-coded in
>>>    CLANG, resulting in a false dependence of CLANG/LLVM on entire generation
>>>    phase of debug info for macros, even when they don't have to.
>>>    -  To worsen the problem even more, we have some DWARF forms sharing
>>>    same encoding. Currently, we have 4-forms that share same encoding with
>>>    previous DWARFv4.
>>>
>>>
>> 4 forms that share the same encoding in DWARFv4 - you mean the define,
>> define_strp, define_strx, and define_sup?
>>
>> I think that the solution here is probably not to change the IR
>> representation - IR doesn't have the semantics that necessitate
>> strip/strx/sup encodings (there's no indexed string section at the IR level
>> - string sharing is a lower level implementation detail at the IR Level
>> that's not apparent at the IR Level). These are choices the backend would
>> make about how to efficiently encode the strings into the final output.
>>
>> All this is to say that I agree with you about the semantics of your
>> proposal, but I think the /syntax/ of the IR can remain the same, just with
>> the semantics you're describing. (ie: that DW_MACINFO_define in the IR
>> doesn't necessarily mean you'll get a DW_MACINFO_define in the output (you
>> might get a DW_MACINFO_define, maybe a DW_MACINFO_define_strx, etc, etc... )
>>
>>
>>>
>>>
>>>  High Level Summary of Solution
>>>
>>> ========================
>>>
>>> To deal with this and to facilitate addition of new future DWARF forms
>>> and other stuff, I propose following changes to be done:
>>>
>>> ยท  Segregate CLANG entirely from macro debug info part. This has been
>>> done by introducing a new Generic Macro Types. Since macro information can
>>> be  broadly classified into 4 categories:
>>>
>>> 1.     Definition
>>>
>>> 2.     Un-definition
>>>
>>> 3.     File inclusion
>>>
>>> 4.     File termination
>>>
>>> So I propose to introduce 4 new Generic Macro Types  to be used as
>>> types, with no dependence to various DWARFv5/v4 forms.
>>>
>>>
>>>    1. MACRO_definition
>>>    2. MACRO_undef
>>>    3. MACRO_start_file
>>>    4. MACRO_end_file
>>>
>>>
>>> - Previous infrastructure uses DWARF forms as types in metadata --
>>>
>>>> !DIMacro(type: DW_MACINFO_define, line: 9, name: "Name", value: "Value")
>>>
>>> And the production part will query this and emits the form
>>> present[hardcoded].
>>>
>>>
>>>
>>> Upgraded infrastructure will be using something like --
>>>
>>>> !DIMacro(type: MACRO_definition, line: 9, name: "Name", value: "Value")
>>>
>>> Now production part will query the Generic Macro Type present in
>>> metadata and based on the version of DWARF specified, it will emit forms
>>> accordingly.
>>>
>>>
>>>
>>> - Now this should cater to all our needs of LLVM and debug-info
>>> production-consumption WRT to macro.
>>>
>>>
>>>
>>> This approach solves the problem highlighted above and simplifies the
>>> code, increases code/infrastructure reuse for building DWARFv5 macro and
>>> future additons.
>>>
>>>
>>> I have a patch[implementing this] ready, currently supporting DWARFv4
>>> macro only. Planning to build rest of DWARFv5 on top of this. Could you
>>> guys let me know your thoughts and how to proceed forward with this. Should
>>> we need an RFC for it have a better discussion?
>>>
>>>
>>>
>>> Thanks in anticipation,
>>>
>>> Sourabh.
>>>
>>>
>>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200103/3464ae51/attachment.html>


More information about the llvm-dev mailing list