[llvm-dev] RFC: [DebugInfo] Improving Debug Information in LLVM to Recover Optimized-out Function Parameters
Quentin Colombet via llvm-dev
llvm-dev at lists.llvm.org
Mon Mar 18 10:49:11 PDT 2019
Hi Nikola,
This is great, the caller is even simpler to address!
Instead of adding debug metadata, I would rather we make the ABI lowering “queryable”, i.e., something like: “here is my prototype, where are the arguments mapped?”
Like Adrian said, this kind of API would be beneficial for other tools as well.
Cheers,
Quentin
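
A minimal sketch of what such a query could look like, given only the argument sizes of a prototype. Everything here is an assumption made for illustration: the names ArgLoc and queryArgLocations are hypothetical and do not exist in LLVM, and the convention is a toy one (32-bit registers r0..r3, 4-byte pieces, overflow to the stack), not the lowering of any real target.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical types for illustration only; none of these exist in LLVM.
struct ArgLoc {
  enum Kind { Register, Stack } kind;
  unsigned reg;         // register index (r0..r3) when kind == Register
  int64_t stackOffset;  // byte offset into the outgoing-argument area when kind == Stack
  unsigned sizeInBytes; // size of this piece
};

// One source-level argument may occupy several locations, e.g. a 64-bit
// value split across two 32-bit registers.
using ArgAssignment = std::vector<ArgLoc>;

// Toy convention (an assumption): arguments are cut into 4-byte pieces, the
// first four pieces go to r0..r3, the remaining pieces go to the stack.
std::vector<ArgAssignment>
queryArgLocations(const std::vector<unsigned> &argSizesInBytes) {
  constexpr unsigned NumArgRegs = 4, PieceSize = 4;
  unsigned nextReg = 0;
  int64_t nextStack = 0;
  std::vector<ArgAssignment> result;
  for (unsigned size : argSizesInBytes) {
    assert(size > 0 && size % PieceSize == 0 &&
           "toy model handles whole 4-byte pieces only");
    ArgAssignment pieces;
    for (unsigned done = 0; done < size; done += PieceSize) {
      if (nextReg < NumArgRegs) {
        pieces.push_back({ArgLoc::Register, nextReg++, 0, PieceSize});
      } else {
        pieces.push_back({ArgLoc::Stack, 0, nextStack, PieceSize});
        nextStack += PieceSize;
      }
    }
    result.push_back(pieces);
  }
  return result;
}
```

For a prototype with a 4-byte, an 8-byte and another 8-byte argument, queryArgLocations({4, 8, 8}) places the first in r0, the second in r1 and r2, and splits the third between r3 and the first stack slot.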
> On 18 March 2019, at 07:46, Nikola Prica <nikola.prica at rt-rk.com> wrote:
>
> Hi,
>
> My comments are inline. Please find them below.
>
>> On 6.3.19. 02:20, Quentin Colombet wrote:
>> Hi,
>>
>> TL;DR I realize my comments are not super helpful and in a nutshell I
>> think we would be better off defining a good API for describing how
>> function arguments are lowered than adding new dbg instructions for
>> that, so that other tools can benefit from it. Now, I am so far from
>> debug information generation that I wouldn’t be upset if you chose to
>> just ignore me :).
>
>>> On Feb 25, 2019, at 3:51 PM, Adrian Prantl via llvm-dev
>>> <llvm-dev at lists.llvm.org> wrote:
>>>
>>>
>>>
>>>> On Feb 22, 2019, at 2:49 AM, Nikola Prica <nikola.prica at rt-rk.com>
>>>> wrote:
>>>>
>>>> Hi,
>>>>
>>>> We have done some investigation. Please find my comments inline below.
>>>>
>>>>>
>>>>>> On 14.02.2019. 20:20, Quentin Colombet wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> As much as possible I would rather we avoid any kind of metadata in MIR
>>>>>> to express the semantics of instructions.
>>>>>> Instead I would prefer that each backend provides a way to interpret what
>>>>>> an instruction is doing. What I have in mind is something that would
>>>>>> generalize what we do in the peephole optimizer for instance (look for
>>>>>> isRegSequenceLike/getRegSequenceInputs and co.) or what we have for
>>>>>> analyzing branches.
>>>>>> One way we could do that, which was discussed in the past, would be to
>>>>>> describe each instruction in terms of the generic MIR operations.
>>>>>>
>>>>>> Ultimately we could get a lot of this semantic information
>>>>>> automatically populated by TableGen using the ISel patterns, like
>>>>>> dagger does (https://github.com/repzret/dagger).
>>>>>> Anyway, for the most part, I believe we could implement the
>>>>>> “interpreter” for just a handful of instructions and get 90% of the
>>>>>> information right.
>>>>>>
>>>>>
>>>
>>> [...]
>>>
>>>>>>> Here's a proposal for how we could proceed:
>>>>>>> 1. Decide whether to add (a) DBG_CALLSITEPARAM vs. (b) augment MIR to
>>>>>>> recognize LEA semantics and implement an analysis
>>>>>>> 2. Land above MIR support for call site parameters
>>>>>>> 3. if (a), land support for introducing DBG_CALLSITEPARAM either in
>>>>>>> calling convention lowering or post-ISEL
>>>>>>> 4. if that isn't good enough discuss whether IR call site parameters
>>>>>>> are the best solution
>>>>>>>
>>>>>>> let me know if that makes sense.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> adrian
>>>>>>
>>>>
>>>>
>>>> In order to use calling convention lowering at the MIR pass level, for
>>>> recognizing instructions that forward function arguments, we would need
>>>> to implement a calling convention interpreter. Only recognizing
>>>> instructions and trying to see whether they are part of the calling
>>>> sequence would not be enough. For example, we would not be able to
>>>> properly handle cases where one 64-bit argument is split across two
>>>> 32-bit registers. This could be handled if we knew the number and sizes
>>>> of the arguments of the called function, but then we would end up
>>>> running a process similar to the one from the ISEL phase. We can only
>>>> know the number and sizes of the arguments for direct calls, since we
>>>> can find the IR function declaration and extract that information from
>>>> it. For indirect calls, we would not be able to perform such an analysis
>>>> since we cannot fetch the function’s declaration. This means that we
>>>> would not be able to support indirect calls (not without some trickery).
>>
>> Technically, you can guess what is laid down as function parameters by
>> looking at which registers are live-ins of your function, which stack
>> locations are used, and so on. That wouldn’t help you with the number and
>> size of the arguments indeed, but you know that at compile time, so I
>> don’t know why we would need to make those explicit.
>>
>> Anyway, what I am saying is that, to me, DBG_CALLSITEPARAM is redundant
>> with what the backend already knows about the function call. The way I
>> see it, this pseudo is a kind of cached information that can otherwise
>> be computed, and what worries me is what happens when we make changes
>> that break this “cache”.
>>
>
> It looks like you are looking at this from the callee side, but this should
> be observed from the caller side. DBG_CALLSITEPARAM is related to the call
> instruction and it references the virtual/physical register, constant or
> stack object that is forwarded as an argument. It is handled similarly to
> the DBG_VALUE machine operand that tracks a value's location.
>
>>>>
>>>> If everybody agrees with the above, this might be the technical reason
>>>> to give up on a MIR pass that would collect call site parameter debug
>>>> info. If we are wrong in our analysis, please advise us. Otherwise, we
>>>> can go with the approach of introducing DBG_CALLSITEPARAM and producing
>>>> it from the ISEL phase (with dispatched IR part).
>>>>
>>>> Thanks,
>>>> Nikola
>>>
>>> If we want to avoid adding new MIR metadata as Quentin suggests, it
>>> sounds like we have really two problems to solve here:
>>>
>>> 1. At the call site determine which registers / stack slots contain
>>> (source-level) function arguments. The really interesting case to
>>> handle here is that of a small struct whose elements are passed in
>>> registers, or a struct return value.
>>
>> I am really ignorant of how LLVM’s debug information works, but I would
>> have expected we could generate this information directly when we lower
>> the ABI, then refine, in particular until we executed the prologue.
>> My DWARF is rusty but I would expect we can describe the location of the
>> arguments as registers and CFA at the function entry (fct_symbol+0).
>> Since this information must be correct at the ABI boundaries, what’s
>> left to describe is what happens next, and for that I don’t see how we
>> can get away without an interpreter of MIR at this point.
>>
>> E.g., let’s say we have:
>> void foo(int a)
>>
>> At foo+0: a is in say r0
>> foo+4: r3 = copy r0
>> ...
>> foo+0x30 store r3, fp
>>
>> In foo, maybe r0 will be optimized out, but at foo+0, a has to be here.
>> Then you would describe a’s location as being available in r3 from [4,
>> 0x30], then stored at some CFA offsets from (0x30,onward).
>>
>> I feel the DBG_CALLSITEPARAM stuff only captures the foo+0 location and
>> essentially I don’t see why we need to have it around in MIR. Now, I
>> agree that a pass interpreting how a value is moved around would need to
>> query where the information is at the beginning of the function but that
>> doesn’t need to be materialized in MIR.
>>
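
The foo example above can be sketched as a small location table that maps a function-relative offset to where the value of a lives. This is only an illustration under stated assumptions: the type and function names are hypothetical, instructions are assumed to be 4 bytes wide (so the first offset strictly after 0x30 is 0x34), and the CFA offset of -8 is made up, since the original only says "some CFA offsets".

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of where a variable lives; not an LLVM type.
struct Location {
  enum Kind { Register, CFAOffset } kind;
  int64_t value; // register number, or byte offset from the CFA
};

struct LocationEntry {
  uint64_t startOffset; // first function-relative offset where loc applies
  Location loc;
};

// Entries must be sorted by startOffset; each entry applies until the next
// one begins. Returns nullptr if pcOffset precedes every entry.
const Location *lookup(const std::vector<LocationEntry> &entries,
                       uint64_t pcOffset) {
  const Location *found = nullptr;
  for (const LocationEntry &e : entries) {
    if (e.startOffset > pcOffset)
      break;
    found = &e.loc;
  }
  return found;
}

// Locations of parameter `a` in the foo example.
std::vector<LocationEntry> locationsOfA() {
  return {
      {0x00, {Location::Register, 0}},   // foo+0: a is in r0 (ABI boundary)
      {0x04, {Location::Register, 3}},   // [4, 0x30]: a is in r3
      {0x34, {Location::CFAOffset, -8}}, // after 0x30: spilled (offset assumed)
  };
}
```

Only the first entry is fixed by the ABI; the later entries are what an interpreter of MIR would have to derive by following the copy and the store.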
>
> You are right that DBG_CALLSITEPARAM captures only foo+0 and that is its
> purpose. DBG_CALLSITEPARAM is generated at the place where foo is being
> called. Since foo might be called from multiple places in the caller
> function, its CFA is used to identify the dbg call site info in the caller.
> That dbg call site info contains, for each callee argument, a pair of the
> forwarding register and the non-clobberable value that is loaded into
> that register.
>
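
As a rough illustration of the data described above, the call site info can be pictured as a table keyed by call site, where each entry pairs a forwarding register with the value loaded into it. This is a sketch only: the type names, the integer key and the payload encoding are all hypothetical and are not the representation used by the proposed patches.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical model of a value that survives the call (not clobbered).
struct ParamValue {
  enum Kind { Constant, PreservedRegister, StackSlot } kind;
  int64_t payload; // constant, register number, or stack offset
};

// One pair per callee argument: the register that forwards the argument
// and the value that was loaded into it at the call site.
struct CallSiteParam {
  unsigned forwardingReg;
  ParamValue value;
};

// Keyed by whatever identifies the call site in the caller (assumed here
// to be a plain integer).
using CallSiteTable = std::map<uint64_t, std::vector<CallSiteParam>>;

// Returns the value forwarded in forwardingReg at the given call site, or
// nullptr if the call site or the register is not described.
const ParamValue *findForwardedValue(const CallSiteTable &table,
                                     uint64_t callSiteKey,
                                     unsigned forwardingReg) {
  auto it = table.find(callSiteKey);
  if (it == table.end())
    return nullptr;
  for (const CallSiteParam &p : it->second)
    if (p.forwardingReg == forwardingReg)
      return &p.value;
  return nullptr;
}
```

A consumer stopped inside the callee, with the parameter in r0 already overwritten, would look up the entry for its call site and r0 to recover the value the caller passed.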
>>>
>>> The information about the callee's function signature is only
>>> available at the IR level. If we can match up a call site in MIR with
>>> the call site in IR (not sure if that is generally possible) we could
>>> introduce a new API that returns the calling convention's location for
>>> each source-level function argument. Such an API would be good
>>> to have; LLDB for example would really like to know this, too.
>>> That said, I would not want to lose the ability to model indirect
>>> function calls. In Swift, for example, indirect function calls are
>>> extremely common, as are virtual methods in C++.
>>>
>>> 2. Backwards-analyze a safe location (caller-saved register, stack
>>> slot, constant) for those function arguments.
>>>
>>> This is where additional semantic information would be necessary.
>>>
>>>
>>> Quentin, do you see a way of making (1) work without having the
>>> instruction selector lower function argument information into MIR as
>>> extra debug info metadata?
>>
>> In theory, yes, I believe we could directly generate it while lowering
>> the ABI. That said, this information is not super useful.
>> Now, IIRC, the debug info generation all happens at the end, so in that
>> respect we need a way to convey the information down to that pass and
>> there is probably not a way around some information attached somewhere.
>> Ideally, we don’t have to store this information anywhere, but instead,
>> like you said, we could have a proper API that tells you what goes where
>> and you would have your “foo+0” location without carrying extra
>> information around.
>>
>
> Lowering of a call instruction and producing its call sequence according
> to its ABI is performed in the ISEL phase. So we introduced DBG_CALLSITE
> info as a way of conveying the information down to the backend. The new
> API that we are talking about could be built upon DBG_CALLSITE instructions.
>
> Best regards,
> Nikola
>
>>> I'm asking because if we have to add extra debug info metadata to
>>> deliver (1) anyway then we might as well use it instead of
>>> implementing an analysis for (2).
>>>
>>> what do you think?
>>> -- adrian
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>