[PATCH] D45878: [DEBUG INFO] Fixing cases where debug info (-g) causes changes in the program.

Mon Apr 30 07:23:28 PDT 2018

gberry added a comment.

In https://reviews.llvm.org/D45878#1081950, @jonpa wrote:

> In https://reviews.llvm.org/D45878#1079514, @gberry wrote:
>
> >
>
>
> I started to write the utility functions in MachineInstr/MachineBasicBlock, but thinking about it, I wonder if it wouldn't be even better to integrate fully the handling of DBG_VALUEs and DebugLocs into splice()? After all, the DBG_VALUEs should *always* be (re)moved together with the MachineInstr, so why expose this to a particular optimizer pass? If splice() would do this per default, the optimizer would not have to worry about it, and could not even forget about it. Is there a reason to always do this on the side?
>
> If this is guarded by a check to see if the following instruction is a DBG_VALUE, I think builds without debug info would not suffer from increased compile-time. BTW, is there a better way to check if it is a debug build or not, like maybe hasDebugInfo()?
>
> One issue with this is that the caller usually has an iterator, that now gets invalidated if a DBG_VALUE is moved automatically. I think this means splice() has to return an iterator to the next following instruction after the moved one(s). There would also have to be one method that handled reverse iterators. My idea is that splice would only do this if an iterator is passed as an optional argument, so that optimizers could gradually get used to this.
>
> Again, I am not an expert here, just assuming from quick observation that when moving an instruction it is good to
>
> 1. move any connected DBG_VALUEs with the moved MI
> 2. update the DebugLoc of MI for its new position.
>
>   I have so far just reused the code in MachineSink, but I don't know how good this is or if it could be improved. MachineSink is currently checking through all DBG_VALUEs after MI and picking only the right ones, and stopping at first non-debug MI. This makes sense to me, even though I'm not really clear on why the DBG_VALUEs could end up mixed like that. Could it perhaps be that if we started handling this properly everywhere, we could assume that they are never mixed (that DBG_VALUE that express a use of a register is always to be found directly after the defining MI)? Does this relate perhaps to instructions with multiple defs (like an address post-increment), and if so, should we perhaps scan MI for all explicit defs instead of just the one at index 0?
>
>   Of course, Pre-RA this could be done with MRI use-def information instead of the search.
>
>   Then there is the question of what to do when moving a range of instructions. Depending on how mixed up the DBG_VALUEs might get (when all optimizers are doing the right thing), one might want to either handle the whole range of instructions to find out what following DBG_VALUEs to also move, or perhaps just look at the last instruction of the range.
>
>   Similarly, eraseFromParent() could always also erase any connected DBG_VALUes. There is actually eraseFromParentAndMarkDBGValuesForRemoval, but only with virtual registers, sofar. Should the search for DBG_VALUES after MI be added here to be done post-RA?
>
>   I would like to see some clear comment on exactly what is expected from an optimization pass that moves / erases instructions in regards to DBG_VALUE instructions, but can't find one...?
>
>   Do you think this makes sense, or should we stick with separate utility functions?

You should probably bring this issue up on llvm-dev to get more feedback from the community.

https://reviews.llvm.org/D45878