[PATCH] D82545: [Debugify] Make the debugify aware of the original (-g) Debug Info

Fri Sep 25 07:36:56 PDT 2020

StephenTozer added inline comments.

================
Comment at: llvm/lib/Transforms/Utils/Debugify.cpp:355
+
+      auto InstrIt = DILocsBefore.find(Instr);
+      if (InstrIt == DILocsBefore.end()) {
----------------
djtodoro wrote:
> StephenTozer wrote:
> > djtodoro wrote:
> > > vsk wrote:
> > > > djtodoro wrote:
> > > > > vsk wrote:
> > > > > > Is `Instr` an `llvm::Instruction *`? I wonder whether it is sound to assume that pointer equality for `llvm::Instruction` implies actual equality, if we're comparing pointers before/after a pass runs. It's possible for a pass to delete and then create instructions: that could result in pointer reuse/recycling.
> > > > > It is an `llvm::Instruction *`. I see... any idea how to address that?
> > > > > (I was trying to avoid having to make a temp object of the instr...)
> > > > I dug into this a bit and unfortunately it looks like a thorny issue. It looks like Instructions are (de)allocated via `User::operator new` and `User::operator delete`, and these essentially call into the system allocator. Relying on pointer equality to identify a unique instruction before/after a pass modifies the IR might necessitate e.g. changing the `delete` implementation to never free memory. That might not be practical -- it's certainly a big hammer.
> > > > 
> > > > It's possible that there's some alternative approach, but I'm not sure whether they'd be effective. One idea might be to use metadata to uniquely identify instructions (not !dbg metadata, of course :), and to teach the `dropUnknownNonDebugMetadata` utility to preserve that "inst ID" metadata. (I'd be curious to get some other opinions on whether that's workable.)
> > > Thanks! I have also taken a look, and I agree that the gymnastic with the `new/delete` operators won't be practical choice.
> > > 
> > > Using some special kind of metadata can help us for this purpose. The "inst ID" metadata sounds good to me, since we are going to use it for this purpose **only**. We can call it `!debug-instr-number`, similar to the LiveDebugValues/InstrRef (D85741) work.
> > > 
> > > Let me try to implement a prototype for it, so we can see what are the pros && cons.
> > The metadata approach should work, although as discussed it may require changes to `dropUnknownNonDebugMetadata`. It might be a little messy though; I think it would also be viable to take the following approach: In the "before" step, populate a map of `Instruction* Instr -> WeakVH(Instr)`. With this map, you can check `Map[Instr] != nullptr` to determine whether `Instr` was deleted at any point in the pass; if it wasn't deleted, then you can simply use raw pointer comparison. This avoids the need to fiddle with the internals of `Instruction` that could potentially affect performance or the behaviour of other code.
> > 
> > I think it worth adding that there will be some noise with this approach; the DebugLoc metadata for any new generated instructions will be dependent on the DebugLoc metadata for the instruction they're based on. For example, cloning an instruction , replacing a conditional branch with an unconditional one, reassociating arithmetic, etc. In each of these cases, we will get a warning from the preservation pass that a new instruction did not receive a DILocation, even though the pass that generated it had no responsibility for it. It's technically correct, but it may result in a continuously increasing amount of noise from pass-to-pass, with the worst case being that there are no DebugLocs left at the start of a pass so that we get an error for every instruction created even if the pass itself is handling DebugLocs correctly.
> > 
> > Unfortunately I don't think there's a way around this issue; if we had a way of programmatically determining which Instruction(s) that a new instruction //should// derive its DebugLoc from, then we wouldn't have as much need for debugify to begin with.
> @StephenTozer thanks for the comment.
> 
> > The metadata approach should work, although as discussed it may require changes to dropUnknownNonDebugMetadata. It might be a little messy though; I think it would also be viable to take the following approach: In the "before" step, populate a map of Instruction* Instr -> WeakVH(Instr). With this map, you can check Map[Instr] != nullptr to determine whether Instr was deleted at any point in the pass; if it wasn't deleted, then you can simply use raw pointer comparison. This avoids the need to fiddle with the internals of Instruction that could potentially affect performance or the behaviour of other code.
> > 
> Interesting, but I think we'd still have the problem with the pointer reusing?
> 
> I still believe that we can use the approach from D85741, by using just simple unsigned IDs (I am making a prototype patch for it).
> 
> > I think it worth adding that there will be some noise with this approach; the DebugLoc metadata for any new generated instructions will be dependent on the DebugLoc metadata for the instruction they're based on. For example, cloning an instruction , replacing a conditional branch with an unconditional one, reassociating arithmetic, etc. In each of these cases, we will get a warning from the preservation pass that a new instruction did not receive a DILocation, even though the pass that generated it had no responsibility for it. It's technically correct, but it may result in a continuously increasing amount of noise from pass-to-pass, with the worst case being that there are no DebugLocs left at the start of a pass so that we get an error for every instruction created even if the pass itself is handling DebugLocs correctly.
> > 
> Actually, in the verify-each-pass-mode, it will point to the first pass that "caused" the issue with the instruction.
> 
> 
> 
> Interesting, but I think we'd still have the problem with the pointer reusing?

`WeakVH` will be nulled if the value it references is deleted; even if a new instruction is created that uses the same memory, the WeahVH object will still be equal to a `nullptr`.

> Actually, in the verify-each-pass-mode, it will point to the first pass that "caused" the issue with the instruction.

I'm still reading the patch, so I can't be certain that I understand everything yet, but I don't think what you're describing is possible. I do understand that if we drop the DebugLoc for an instruction, then we will get an error in the pass that dropped that DebugLoc, but any subsequent passes will not show an error for that instruction because it had no DebugLoc to lose.

What I'm referring to is when we create a new instruction that should derive its DebugLoc from one or more existing instructions, but those instructions have none. For example:

We have a basic block with a conditional branch instruction with a debug location attached:
```
while.body:                                       ; preds = %while.body, %while.body.lr.ph
    br i1 %tobool, label %while.cond.while.cond1.preheader_crit_edge, label %while.body, !dbg !57 
```

Some broken pass causes the debug location to be dropped; this is an error, and the DIVerifier reports it:
```
while.body:                                       ; preds = %while.body, %while.body.lr.ph
    br i1 %tobool, label %while.cond.while.cond1.preheader_crit_edge, label %while.body
```

The Jump Threading pass then determines that `%tobool` will always be false, and replaces the conditional branch with an unconditional branch that uses the cond branch's DebugLoc:
```
while.body:                                       ; preds = %while.body, %while.body.lr.ph
    br label %while.body
```

What should happen here? The jump threading pass has generated a //new// instruction with no DebugLoc attached, so DIVerifier will report it as an error. However, there is no error in the Jump Threading pass, it just has no DebugLoc to assign to the new instruction (assigning a Line 0 in this case would also be incorrect).

The more DebugLocs we lose, the more often these false errors will appear as every similar generated instruction has no way to generate a DebugLoc. I also believe there is no general method for recognizing the difference between "this new instruction has no DebugLoc because we have no way to generate one", versus "this new instruction has no DebugLoc because we didn't copy/calculate it correctly (i.e. a bug)".

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82545/new/

https://reviews.llvm.org/D82545