[PATCH] D58848: [DebugInfo] follow up for "add SectionedAddress to DebugInfo interfaces"

Mon Mar 11 17:48:44 PDT 2019

dblaikie added inline comments.

================
Comment at: llvm/lib/DebugInfo/Symbolize/Symbolize.cpp:146
+  std::pair<std::string, std::string> ParsedModuleName =
+      parseModuleName(ModuleName);
+
----------------
avl wrote:
> dblaikie wrote:
> > avl wrote:
> > > dblaikie wrote:
> > > > avl wrote:
> > > > > avl wrote:
> > > > > > dblaikie wrote:
> > > > > > > avl wrote:
> > > > > > > > dblaikie wrote:
> > > > > > > > > dblaikie wrote:
> > > > > > > > > > I hadn't spotted this code (or looked very closely at it, at least) in the original review.
> > > > > > > > > > 
> > > > > > > > > > I'm surprised/confused why the original change involved adding code that would look at input object file names at all. Why is this?
> > > > > > > > > > 
> > > > > > > > > > Hmm, I see looking at the llvm-symbolizer code that this logic looks up the section name and then passes that in along with the file name to the symbolizing routine.
> > > > > > > > > > 
> > > > > > > > > > But what's the point of that? It doesn't seem like it would change/improve the quality of the symbolizer's output compared to passing in a "I don't know what section" value, right?
> > > > > > > > > I think this follows on from one of the comments I made in the original review - I don't know it's worth plumbing through the section index to any callers except the original one you had in mind (objdump disassembly) as all the other callers will need work to do anytihng other than what they do today (eg: the symbolizer could have support for multiple answers - the same address in multiple sections (in which case it'd need tests for that, etc)- but this bit of code isn't doing that)
> > > > > > > > I think that my original description was not clear enough and there is some misunderstanding of new interface usage. Let`s clarify it now and decide what should it mean. 
> > > > > > > > 
> > > > > > > > I assume that SectionIndex should always be set in the correct value. Setting SectionIndex into Undef state could lead to incorrect work if relocations present. At least that is how it is currently done. That assumes that all callers of new interface need to do extra work to set SectionIndex properly. In some cases I already done that additional work, in some cases left it in Undef state(if it did not brake testing) and for some cases getModuleSectionIndexForAddress could be used for setting SectionIndex(that is the temporarily routine until proper code implemented). These two last cases assumed to be changed into proper implementation.
> > > > > > > > 
> > > > > > > > It differs from the logic - if SectionIndex presented then take it into account, if SectionIndex is not presented then handle it somehow. 
> > > > > > > > 
> > > > > > > > The rationale for that is that DWARFDebugLineTable looks to be improper place to decide how to handle "I don't know what section" case. Different use cases could assume different handling of that situation. So I decided that DWARFDebugLineTable should always receive full info and caller of this routine should decide how to handle this situation.
> > > > > > > > 
> > > > > > > > f.e. case "I don't know what section" could be handled : 
> > > > > > > > 
> > > > > > > > a) find correct section and set it.
> > > > > > > > b) find several sections which match to local offset and use them.
> > > > > > > > c) find last section which match to local offset and use it(behavior before fix).
> > > > > > > > d) ignore that case. 
> > > > > > > > f) display error ... something else...
> > > > > > > > 
> > > > > > > > Do you think we need to change it and  allow setting SectionIndex into Undef state and specify how  DWARFDebugLineTable would handle it ?
> > > > > > > > 
> > > > > > > > 
> > > > > > > Ah.
> > > > > > > 
> > > > > > > But most (all?) executables only have one executable section, right?  That's why the way this worked before your change has worked OK for most/all users historically?
> > > > > > > 
> > > > > > > On that basis I think it makes sense to let the exsiting/previous APIs (witohut specifying a section index (ie: have the original functions, without any section index parameter - then overloaded with the new versions that allow the section index to be specified) - or perhaps having to specify an Undef one) to continue to work, while letting those callers who need to, specify it.
> > > > > > right, most executables only have one executable section. That is why previous implementation worked correctly for them using only address as point locator. But for not-linked objects this is not right. Previous implementation did not work for them. i.e. if we allow previous implementation by setting SectionIndex = UndefSection then such a call would work for only linked executable. That means that DWARFDebugLine or LLVMSymbolizer would imply concrete state.  
> > > > > > 
> > > > > > following calls could be done for only linked executables : 
> > > > > > 
> > > > > > DWARFDebugLine.getFileLineInfoForAddress ( SectionedAddress.SectionIndex = UndefSection
> > > > > > LLVMSymbolizer.symbolizeInlinedCode ( SectionedAddress.SectionIndex = UndefSection
> > > > > > 
> > > > > > it would be error to call them  for non-linked object file. That means that all places where we could expect non-linked object file needs to have something like this : 
> > > > > > 
> > > > > > if (linked executable) {
> > > > > >      LLVMSymbolizer.symbolizeInlinedCode ( SectionedAddress.SectionIndex = UndefSection
> > > > > > } else {
> > > > > >      LLVMSymbolizer.symbolizeInlinedCode ( SectionedAddress.SectionIndex = getSectionIndex()
> > > > > > 
> > > > > > }
> > > > > > 
> > > > > > llvm-dwardump, llvm-objectdump, lld, llvm-symbolizer - could work not only with linked executable but with not-linked object files also. So it looks like most usages would expect both cases - linked executables and not-linked object files. Probably in that case stateless interface would be more convenient ? 
> > > > > > 
> > > > > > Current stateless interface does not assume whether it works for linked or not-linked object. It always receives correct SectionIndex and works equally for both types : 
> > > > > > 
> > > > > > LLVMSymbolizer.symbolizeInlinedCode ( SectionedAddress.SectionIndex = getSectionIndex()
> > > > > > 
> > > > > > The places where it becomes less convenient  is the code assumed to work with *only* linked-executable. In such places there would be neccessary to add code which would detect SectionIndex for new interface. But this looks to be quite natural task and written once it could be used later for free. I created such a routine - getModuleSectionIndexForAddress(though I agree this is quite naive implementation) .
> > > > > > 
> > > > > > The advantage of the new stateless interface is that it could not be used incorrectly. i.e. it would not be possible to use UndefSection for non-linked object files. 
> > > > > > 
> > > > > > Probably it would make sense to not keep old behavior without SectionIndex ?  
> > > > > > 
> > > > > > currently UndefSection used in following modules : 
> > > > > > 
> > > > > > 1. sanstats
> > > > > > 2. sancov
> > > > > > 3. llvm-xray
> > > > > > 4. llvm-dwarfdump
> > > > > > 
> > > > > > 
> > > > > David, do you think it makes sense to assume that SectionIndex should always be set in the correct value(not equal to undef) ? Or your preferred choice is to allow SectionIndex to be set into UndefState and assume that in that case  linked executables should be processed ?
> > > > I don't think it makes sense to open the file separately, search for a valid section index, then pass that into the API - compared to the API doing this for you. If you know the section you want, you pass it in, if not, you don't & you get the best effort (first instance, or perhaps APIs could be updated to return multiple responses when it's ambiguous).
> > > > 
> > > > I take it your idea was that this "open the file, search for some section index which covers the address, then use that section index when calling the underlying API" was meant as a stop-gap until some longer term solution where section indexes would always be passed in?
> > > > 
> > > > I don't think that's a future we'd really have - most of these uses/users aren't going to pass in section indexes (users of llvm-dwarfdump/objdump/symbolizeretc are unlikely to specify a section index on the command line too), so I think it makes sense to have the built-in functionality be what's currently grafted on top using getModuleSectionIndexForAddress - leave that as the built-in functionality as it was before and let users override/be specific if they want to be.
> > > >I take it your idea was that this "open the file, search for some section index which 
> > > >covers the address, then use that section index when calling the underlying API" was 
> > > >meant as a stop-gap until some longer term solution where section indexes would 
> > > >always be passed in?
> > > 
> > > yes. that`s right.
> > > 
> > > >I don't think that's a future we'd really have - most of these uses/users aren't 
> > > >going to pass in section indexes (users of llvm-dwarfdump/objdump
> > > >/symbolizeretc are unlikely to specify a section index on the command line too), so 
> > > >I think it makes sense to have the built-in functionality be what's currently grafted 
> > > >on top using getModuleSectionIndexForAddress - leave that as the built-in 
> > > >functionality as it was before and let users override/be specific if they want to be.
> > > 
> > > That is also thing which I am agree. i.e. users should have a  built-in functionality for setting proper SectionIndex. They should not do it manually. 
> > > 
> > > Thing which seems to me questionable is that - Is it OK to hide that builtin functionality under old symbolizer interface : 
> > > 
> > > symbolizeCode(..., object::SectionedAddress ModuleOffset)
> > > symbolizeInlinedCode(..., object::SectionedAddress ModuleOffset)
> > > symbolizeData(..., object::SectionedAddress ModuleOffset
> > > 
> > > i.e. if ModuleOffset.SectionIndex = UndefSection and if we will keep old behavior in this case. That looks questionable because different use cases could require different behaviour. But in this case some concrete behavior would already be done.
> > > 
> > > Instead we can specify interface so that it always receives correct unambitious information. i.e. these methods would always receive correct SectionIndex : 
> > > 
> > > symbolizeCode(..., object::SectionedAddress ModuleOffset)
> > > symbolizeInlinedCode(..., object::SectionedAddress ModuleOffset)
> > > symbolizeData(..., object::SectionedAddress ModuleOffset
> > > 
> > > so that SectionIndex either set into proper index of section, either set into Undef state and it means an absolute address.
> > > 
> > > That allows to implement several builtin behaviors separately. for example : we could implement this method for symbolizer : 
> > > 
> > > enumerateSections (uint64_t address, std::function<void(uint_64_t SectionIndex)> on_section); 
> > > 
> > > then llvm-dwarfdump/objdump/symbolizer would read an address from command line ask symbolizer for sections which specified address relates to and call symbolizeCode for the received info. Then this enumerateSections method would be builtin solution for them. 
> > > 
> > > If we will need another behavior(find first of, find last of/display error) then it would be possible to create another helper builtin routine(as part of symbolizer interface or as separate implementation). 
> > > 
> > > This patch follows that idea : it requires SectionIndex to be properly specified for symbolizeCode/InlinedCode/Data methods and provides helper getModuleSectionIndexForAddress() routine so that all existing tools would be able to call it and work the same way as before.  This getModuleSectionIndexForAddress routine assumed to be temporarily and should be replaced later with proper solution(s). Something like above enumerateSections() suggestion. 
> > > 
> > > So it looks like this solution allows API to help users and at the same time to implement several use cases. 
> > > 
> > > 
> > > 
> > I'm still not sure I follow the harm of having these APIs default to their old behavior - if the current solution is to have them all use this extra API to more awkwardly get the old behavior anyway.
> > 
> > "then llvm-dwarfdump/objdump/symbolizer would read an address from command line ask symbolizer for sections which specified address relates to and call symbolizeCode for the received info. Then this enumerateSections method would be builtin solution for them."
> > 
> > What's the new behavior you're describing here - how would dwarfdump/objdump/symbolizer behave differently if they enumerated the sections? What sort of user-facing behavior change are you picturing here?
> > 
> > I think the existing behavior's really not too problematic & fine to preserve (what do GNU tools do - eg: addr2line on an object file?) & API users can override the Undef section index to some specific section if/when they want to.
> >I'm still not sure I follow the harm of having these APIs default to their old behavior - 
> >if the current solution is to have them all use this extra API to more awkwardly get 
> >the old behavior anyway.
> 
> My main concern about "old behavior for Undef section index" is that it is implicit solution and this solution does not have cancel option. If we will need to implement another behavior for :
> 
> symbolizeCode(SectionedAddress.SectionIndex = Undef)
> 
> How would it be possible to cancel "old behavior for Undef section index" ?
> 
> >"then llvm-dwarfdump/objdump/symbolizer would read an address from command 
> >line ask symbolizer for sections which specified address relates to and call 
> >symbolizeCode for the received info. Then this enumerateSections method would 
> >be builtin solution for them."
> 
> >What's the new behavior you're describing here - how would dwarfdump/objdump
> >/symbolizer behave differently if they enumerated the sections? What sort of 
> >user-facing behavior change are you picturing here?
> 
> Actually I do not suggest to change their behavior here. 
> I just show that they could require different behavior than current one: 
> 
> llvm-symbolizer -obj=test.o 0x10  
> func1()
> test.c:3
> 
> above usage will show only one entry from some.o. If some.o has more sections 
> which match to the specified address then they would not be shown. It is possible that symbolizer could be extended to show entries from all sections matched to specified address.  
> 
> llvm-symbolizer -obj=test.o 0x10  
> func1()
> test.c:3
> func2()
> test.c:20
> 
> In this case there would be necessary to implement some sort of sections enumeration. "old behavior for Undef section index" would not help here.
> 
> >I think the existing behavior's really not too problematic & fine to preserve (what do 
> >GNU tools do - eg: addr2line on an object file?) & API users can override the Undef 
> >section index to some specific section if/when they want to.
> 
> I do not suggest to change behavior. If current behavior is OK then let`s preserve it. I suggest to not hide specific behavior under API. 
> 
> My concern is that let`s support it in an explicit way. It does not look awkwardly : 
> 
> object::SectionedAddress ModuleOffset = {
>       Offset, Symbolizer.getModuleSectionIndexForAddress(ModuleName, Offset)};
> 
> Symbolizer.symbolizeCode(ModuleName, ModuleOffset)
> 
> It explicitly tells how Section Index received. getModuleSectionIndexForAddress()
> preserve old behavior.
> 
> It could easily be extended for future needs(if necessary)
> The advantage is that it would be explicitly seen which behavior would be used:
> 
> object::SectionedAddress ModuleOffset = {
>       Offset, Symbolizer.getSectionIndexForNewBehavior(ModuleName, Offset)};
> 
> Symbolizer.symbolizeCode(ModuleName, ModuleOffset)
> 
> >API users can override the Undef 
> >section index to some specific section if/when they want to
> 
> API users can not override behavior for Undef section index if 
> Undef section index would mean something specific. If the idea would be to 
> get symbol information for only absolute addresses - then call to 
> 
> symbolizeCode(SectionedAddress.SectionIndex = Undef)
> 
> would return wrong results, since "old behavior" does extra work.
> 
> 
> My main concern about "old behavior for Undef section index" is that it is implicit solution and this solution does not have cancel option. If we will need to implement another behavior for :
> 
> symbolizeCode(SectionedAddress.SectionIndex = Undef)
> 
> How would it be possible to cancel "old behavior for Undef section index" ?

I'm not sure what you mean by "cancel the old behavior" - if a caller never passes Undef, they don't get the old behavior, right?

So it seems clear to me how the caller can choose to get the old behavior, or choose not to.

> Actually I do not suggest to change their behavior here. 
> I just show that they could require different behavior than current one:
> 
> llvm-symbolizer -obj=test.o 0x10 
> func1()
> test.c:3
> 
> above usage will show only one entry from some.o. If some.o has more sections 
> which match to the specified address then they would not be shown. It is possible > that symbolizer could be extended to show entries from all sections matched to specified address.
> 
> llvm-symbolizer -obj=test.o 0x10 
> func1()
> test.c:3
> func2()
> test.c:20

This ^ is what I would mean by a change in behavior. That's a change in what llvm-symbolizer does/how it behaves.

> In this case there would be necessary to implement some sort of sections enumeration. "old behavior for Undef section index" would not help here.

Maybe - alternatively, the API could be changed to respond with multiple results when the request is ambiguous (when there's more than one section with that address in it).

But the particular thing about this API is that it separately opens the file and searches through the sections - which seems like a fairly awkward API (to have to open the file, processes it, then call an API that will open the file again & parse it completely separately).

Keeping the previous behavior as a fallback when Undef is given seems like a better option to me (avoids the "open the same file again and do your own search for the section index before calling the API which will independently reopen/reparse the file, etc"). In the future, the API could be improved to, instead of returning the first (or some arbitrary) result for the address - instead it could be split into two APIs - one that takes a SectionedAddress and returns zero or one results, and one that takes a raw address and returns zero or more results. But I don't think there's any need to rush to implimenting that in any callers - they were mostly doing fine with the old behavior.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D58848/new/

https://reviews.llvm.org/D58848