[PATCH] D82545: [Debugify] Make the debugify aware of the original (-g) Debug Info

Fri Sep 25 07:14:13 PDT 2020

djtodoro added inline comments.

================
Comment at: llvm/lib/Transforms/Utils/Debugify.cpp:355
+
+      auto InstrIt = DILocsBefore.find(Instr);
+      if (InstrIt == DILocsBefore.end()) {
----------------
StephenTozer wrote:
> djtodoro wrote:
> > vsk wrote:
> > > djtodoro wrote:
> > > > vsk wrote:
> > > > > Is `Instr` an `llvm::Instruction *`? I wonder whether it is sound to assume that pointer equality for `llvm::Instruction` implies actual equality, if we're comparing pointers before/after a pass runs. It's possible for a pass to delete and then create instructions: that could result in pointer reuse/recycling.
> > > > It is an `llvm::Instruction *`. I see... any idea how to address that?
> > > > (I was trying to avoid having to make a temp object of the instr...)
> > > I dug into this a bit and unfortunately it looks like a thorny issue. It looks like Instructions are (de)allocated via `User::operator new` and `User::operator delete`, and these essentially call into the system allocator. Relying on pointer equality to identify a unique instruction before/after a pass modifies the IR might necessitate e.g. changing the `delete` implementation to never free memory. That might not be practical -- it's certainly a big hammer.
> > > 
> > > It's possible that there's some alternative approach, but I'm not sure whether they'd be effective. One idea might be to use metadata to uniquely identify instructions (not !dbg metadata, of course :), and to teach the `dropUnknownNonDebugMetadata` utility to preserve that "inst ID" metadata. (I'd be curious to get some other opinions on whether that's workable.)
> > Thanks! I have also taken a look, and I agree that the gymnastic with the `new/delete` operators won't be practical choice.
> > 
> > Using some special kind of metadata can help us for this purpose. The "inst ID" metadata sounds good to me, since we are going to use it for this purpose **only**. We can call it `!debug-instr-number`, similar to the LiveDebugValues/InstrRef (D85741) work.
> > 
> > Let me try to implement a prototype for it, so we can see what are the pros && cons.
> The metadata approach should work, although as discussed it may require changes to `dropUnknownNonDebugMetadata`. It might be a little messy though; I think it would also be viable to take the following approach: In the "before" step, populate a map of `Instruction* Instr -> WeakVH(Instr)`. With this map, you can check `Map[Instr] != nullptr` to determine whether `Instr` was deleted at any point in the pass; if it wasn't deleted, then you can simply use raw pointer comparison. This avoids the need to fiddle with the internals of `Instruction` that could potentially affect performance or the behaviour of other code.
> 
> I think it worth adding that there will be some noise with this approach; the DebugLoc metadata for any new generated instructions will be dependent on the DebugLoc metadata for the instruction they're based on. For example, cloning an instruction , replacing a conditional branch with an unconditional one, reassociating arithmetic, etc. In each of these cases, we will get a warning from the preservation pass that a new instruction did not receive a DILocation, even though the pass that generated it had no responsibility for it. It's technically correct, but it may result in a continuously increasing amount of noise from pass-to-pass, with the worst case being that there are no DebugLocs left at the start of a pass so that we get an error for every instruction created even if the pass itself is handling DebugLocs correctly.
> 
> Unfortunately I don't think there's a way around this issue; if we had a way of programmatically determining which Instruction(s) that a new instruction //should// derive its DebugLoc from, then we wouldn't have as much need for debugify to begin with.
@StephenTozer thanks for the comment.

> The metadata approach should work, although as discussed it may require changes to dropUnknownNonDebugMetadata. It might be a little messy though; I think it would also be viable to take the following approach: In the "before" step, populate a map of Instruction* Instr -> WeakVH(Instr). With this map, you can check Map[Instr] != nullptr to determine whether Instr was deleted at any point in the pass; if it wasn't deleted, then you can simply use raw pointer comparison. This avoids the need to fiddle with the internals of Instruction that could potentially affect performance or the behaviour of other code.
> 
Interesting, but I think we'd still have the problem with the pointer reusing?

I still believe that we can use the approach from D85741, by using just simple unsigned IDs (I am making a prototype patch for it).

> I think it worth adding that there will be some noise with this approach; the DebugLoc metadata for any new generated instructions will be dependent on the DebugLoc metadata for the instruction they're based on. For example, cloning an instruction , replacing a conditional branch with an unconditional one, reassociating arithmetic, etc. In each of these cases, we will get a warning from the preservation pass that a new instruction did not receive a DILocation, even though the pass that generated it had no responsibility for it. It's technically correct, but it may result in a continuously increasing amount of noise from pass-to-pass, with the worst case being that there are no DebugLocs left at the start of a pass so that we get an error for every instruction created even if the pass itself is handling DebugLocs correctly.
> 
Actually, in the verify-each-pass-mode, it will point to the first pass that "caused" the issue with the instruction.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82545/new/

https://reviews.llvm.org/D82545