[llvm-dev] Making library calls for obj2yaml functionalities
James Henderson via llvm-dev
llvm-dev at lists.llvm.org
Mon Sep 21 00:16:43 PDT 2020
On Fri, 18 Sep 2020 at 19:45, Rahman Lavaee <rahmanl at google.com> wrote:
>
>
> On Fri, Sep 18, 2020 at 12:12 AM James Henderson <
> jh7370.2008 at my.bristol.ac.uk> wrote:
>
>>
>> On Fri, 18 Sep 2020 at 07:51, Rahman Lavaee <rahmanl at google.com> wrote:
>>
>>> James,
>>> Thanks for the detailed response. Please see my thoughts inline.
>>>
>>> On Thu, Sep 17, 2020 at 12:33 AM James Henderson <
>>> jh7370.2008 at my.bristol.ac.uk> wrote:
>>>
>>>> Hi Rahman,
>>>>
>>>> Traditionally, the ability to read sections is a feature added to
>>>> llvm-readobj/llvm-readelf. For some sections, it delegates to methods in
>>>> places like the Object library and BinaryFormat, but for the more
>>>> specialised sections, it typically has code local to itself doing the work.
>>>> The same is true for other dumping tools like obj2yaml and llvm-objdump,
>>>> which means in some cases, we have multiple varieties of parsers for the
>>>> same thing. I'm not sure there's necessarily a strong motivation for doing
>>>> so, however, so I'd be happy to support functionality being added elsewhere
>>>> in one of those libraries, which tools like obj2yaml and llvm-readobj can
>>>> hook into. I'm also happy to support refactoring that improves code reuse
>>>> within the tools, though I don't have any further ideas on this.
>>>>
>>>> Can I ask what your motivation for using obj2yaml is in this context?
>>>> If it's just for testing purposes, adding support to
>>>> llvm-readobj/llvm-readelf would be the more normal way, as it allows you to
>>>> dump just that section.
>>>>
>>>
>>> Other than testing, we currently have code in an external tool called
>>> create_llvm_prof for parsing the ".bb_addr_map" section (+Han Shen
>>> <shenhan at google.com> who's the main developer of that tool) and loading
>>> it in memory. It would've been great if we could just link with an LLVM
>>> library which includes the data-structure and parsing support. It looks
>>> like llvm-readelf is more structured around dumping/printing. However, I
>>> can see structures like ELFYAML::StackSizesSection in
>>> lib/ObjectYAML/ELFYAML.h which could be simply passed around.
>>>
>>
>> The structures within the ObjectYAML library are really for
>> serializing/deserializing the various section types. They're not really
>> designed with any other thought in mind. Whilst they could work, the usage
>> of things like Optionals to represent a field that may or may not be
>> present in the YAML, with a default value if it isn't, and the various data
>> types within the yaml namespace (e.g. Hex64 etc for distinguishing between
>> numbers that should be represented in decimal or hex) makes these
>> unsuitable for more general purposes. It's quite likely that a
>> representation could be shared between ObjectYAML and other libraries, but
>> I don't think it would belong in that code, and it would need to be adapted
>> for the serialization process realistically, to maintain the expressiveness
>> of the existing structs.
>>
>
> I see your point that these structures are mainly intended for
> representation purposes. Deserialization is all we need. Optional field is
> OK, but we don't really need the default value. I guess the default value,
> Hex64, etc are introduced for representation and your concern is that we
> might end up mixing representation with semantics.
>
>>
>>>
>>>> Only tangentially relatedly, I've only just seen your previous
>>>> patch/email thread, and I do have one thing I'd like to ask if it can be
>>>> changed. At the moment, the section type is SHT_PROGBITS, but I think it
>>>> would be better, if possible, to define a new SHT_* type for the new
>>>> section? In general, it is bad design to rely on section names to
>>>> distinguish between different kinds of sections - this requires the linker
>>>> and other tools to have to do unnecessary string comparisons, which are
>>>> slower and messier than switching on the sh_type field.
>>>>
>>>
>>> Yes, having a specific ELF section type would be great. Though I am
>>> clueless about how this change will be received given that the
>>> ".stack_sizes" section also uses SHT_PROGBITS. I can definitely look into
>>> it if I get some assurance.
>>>
>>
>> I don't think there would be any pushback at all - if you take a look at
>> the ELF.h in BinaryFormat, you'll see several other existing LLVM-specific
>> section type values. If the section type is something that would be more
>> generically useful, it might actually be best to propose it elsewhere e.g.
>> on the GNU mailing list, or even the ELF gABI list, if it is completely
>> general, in which case a different name and possibly value for the type
>> would be appropriate. I think I'm probably not the only one to consider
>> .stack_sizes being specified by name only to be a mis-step. We might change
>> it in the future, but then we'd probably need to keep support for both
>> variations for a while after that for compatibility reasons.
>>
>
> I can start working on that change right away. I think this might even be
> useful for deserialization (taking SHT_LLVM_CALL_GRAPH_PROFILE as example).
> 1- Add a section type like SHT_LLVM_BB_ADDR_MAP.
> 2- Add appropriate structures in Object/ELF.h to deserialize that section.
> 3- Add a custom function like getBBAddrMapContent to parse that section
> (getSectionContentsAsArray won't work because the parsing is more involved).
>
> We can then call this code for both testing and deserialization in
> external tools. What do you think?
>
That sounds about right to me. You'll want to make sure you've got
something that can test the values and parsing early on - gtest unit tests
might be sufficient. You are probably also best off, if it's not too
involved, adding yaml2obj support early. That way it's easier to create
test inputs for e.g. llvm-readobj and obj2yaml.
>
> James
>>>>
>>>> On Wed, 16 Sep 2020 at 22:00, Rahman Lavaee <rahmanl at google.com> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> Following up on
>>>>> https://lists.llvm.org/pipermail/llvm-dev/2020-July/143512.html, and
>>>>> https://reviews.llvm.org/D85408, we would like to consider a design
>>>>> which allows external tools to read the structured contents of the
>>>>> .bb_addr_map section with library calls into an LLVM library. At the same
>>>>> time, we need to have tools/obj2yaml tests in place for bb_addr_map. So it
>>>>> sounds like the perfect place to do it. However, the current structure does
>>>>> not expose the obj2yaml functionalities under lib/ObjectYAML.
>>>>>
>>>>> In general, there seems to be an inconsistency regarding how
>>>>> obj2yaml and yaml2obj tools are structured. The latter has a nice wrapper
>>>>> which calls environment-dependent functions under lib/ObjectYAML, but the
>>>>> former has environment-dependent source files under
>>>>> tools/obj2yaml (elf2yaml.cc, coff2yaml.cc, etc).
>>>>>
>>>>> I wanted to reach out to ask if there is any alternative or get an
>>>>> idea about the amount of refactoring work that is required to make the
>>>>> structure friendlier.
>>>>>
>>>>> bests,
>>>>>
>>>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200921/01ee1022/attachment.html>
More information about the llvm-dev
mailing list