[LLVMdev] Proposal: type uniquing of debug info for LTO

Tue Jun 25 12:05:12 PDT 2013

On Jun 25, 2013, at 11:35 AM, David Blaikie wrote:

> On Tue, Jun 25, 2013 at 10:13 AM, Manman Ren <mren at apple.com> wrote:
>> 
>> On Jun 25, 2013, at 9:15 AM, David Blaikie <dblaikie at gmail.com> wrote:
>> 
>> On Tue, Jun 25, 2013 at 8:59 AM, Manman Ren <mren at apple.com> wrote:
>> 
>> 
>> Any suggestion on how to move this forward?
>> My feeling is that we should not call Verify from so many files.
>> If you agree, I can try to clean up the Verify functions first.
>> 
>> 
>> Yes, we probably shouldn't be using Verify much. I haven't looked at
>> all the use cases though.
>> 
>> 
>> Verify called from the following 12 files other than DebugInfo DIBuilder
>> Dwarf
>> Index: tools/opt/opt.cpp
>> Index: lib/Target/NVPTX/NVPTXAsmPrinter.cpp
>> Index: lib/Transforms/Utils/Local.cpp
>> Index: lib/Transforms/Instrumentation/GCOVProfiling.cpp
>> Index: lib/Transforms/IPO/StripSymbols.cpp
>> Index: lib/Transforms/IPO/DeadArgumentElimination.cpp
>> Index: lib/CodeGen/SelectionDAG/FastISel.cpp
>> Index: lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
>> Index: lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp
>> Index: lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
>> Index: lib/CodeGen/MachineInstr.cpp
>> Index: lib/IR/AsmWriter.cpp
>> 
>> A sample patch which catches all uses of Verify outside Dwarfxxx.cpp
>> DebugInfo.cpp DIBuilder.cpp:
> 
> I don't really understand this patch.
> 
> 1) does Verify really need the map at all? Could we look at the cases
> where Verify would use the map & decide whether we could just not
> bother with those cases?
> 2) even if we were going to have such an API, why not have a default
> value for the parameter so we don't have to update existing callers?
> 3) point below \/ is that we should probably remove these Verify calls
> (except for assembly printing) - debug values & debug info metadata
> should be reasonably assumed to be valid if it's accessed through the
> relevant entry points (navigating from debug info for a
> function/variable, from a dbg_value/declare intrinsic, etc... we
> shouldn't be conditionalizing based on the verification of these
> values) - at some point we'd want to add an up-front verifier that
> checks these invariants in one go (& doesn't bother with it when the
> IR was generated from a trusted front end, in the same way don't use
> the IR verifier in those cases).

Sorry for the confusion, I was just trying to list all source locations that use Verify.
It is not a valid patch and I am not proposing to change the API for Verify.

> 
>> 
>> 
>> We are calling Verify to make sure a MDNode is indeed a specific DI node.
>> 
>> 
>> 
>> If we're using it to differentiate nodes (which I think I've done in
>> one or two places) that's probably a bit dodgy & we should do
>> something more explicit (based on tag values, etc).
>> 
>> I think the trickiest use of it (& even then it's not quite
>> sufficient) is in the assembly comment printing stuff - I'm not sure
>> there's a good alternative there. Treating any metadata that happens
>> to have an integer first operand that happens to be a tag value is...
>> unreliable at best. Not sure what to do about that.
>> 
>> Same feeling here. But we can only parse the content of the MDNode to know
>> whether it is a DI node and
>> if yes, what type of DI node it is.
> 
> Sure - but the question is how much parsing do we need to do. When I
> removed the debug info versioning we started getting a lot more
> conflicts with accidentally annotating non-debug metadata as debug
> metadata (you can see this firing on some of the line info debug
> nodes, for example). One possibility is we could restore something
> like that - a 'unique' number to make it sufficiently sane to just
> examine the number to decide if we're looking at a debug info node.
Sounds like a good idea to me. One number to enumerate all possible MD Nodes.

> 
> I'm not sure if that's the right answer.
> 
> In any case this problem should only apply to assembly printing - it
> doesn't explain why other parts of the codebase need to use Verify at
> all.

Totally agree. We should just be able to use isSubprogram in the following example.

--- tools/opt/opt.cpp   (revision 183537)
+++ tools/opt/opt.cpp   (working copy)
@@ -389,7 +389,7 @@
       for (unsigned i = 0, e = NMD->getNumOperands(); i != e; ++i) {
         std::string Name;
         DISubprogram SP(NMD->getOperand(i));
-        if (SP.Verify())
+        if (SP.Verify(0/*MAP*/))
           getContextName(SP.getContext(), Name);
         Name = Name + SP.getDisplayName().str();
         if (!Name.empty() && Processed.insert(Name)) {

It should be enough to just check isSubprogram(), instead of calling Verify, I guess the purpose
of calling Verify is to perform error checking, such as number of operands.

/// Verify - Verify that a subprogram descriptor is well formed.
bool DISubprogram::Verify(int MAP) const {
  if (!isSubprogram())
    return false;
  if (getContext() && !getContext().Verify())
    return false;
  DICompositeType Ty = getType();
  if (!Ty.Verify(MAP))
    return false;
  return DbgNode->getNumOperands() == 20;
}


> Those parts of the codebase actually trying to handle debug info
> should be handling valid debug info to begin with - we shouldn't need
> to call "Verify" all over the place & behave differently (at most it
> should be an assert - but it should probably just move entirely to a
> verifier like the IR verifier we already have).

Agree.
The current code base uses Verify to decide what type of DI node we are handling, as given in the above example.

Thanks,
Manman

> 
>> 
>> 
>> If you prefer to communicate via lists, let me know.
>> 
>> 
>> Generally, yes. Discussion should happen in the open unless there's a
>> need for it to be private.
>> 
>> As to the MDNode interface, I think debug info is the biggest user, other
>> usage is quite simple, such as TBAA.
>> 
>> 
>> I'm not sure what you're referring to here.
>> 
>> 
>> I am referring to the fact nobody else had comments on our proposal of
>> modifying MDNode :]
>> That probably means we are the ones who care the most about MDNode?
> 
> Probably - but it helps to explain to the community what we're doing &
> why we're doing it so it's not a surprise when they start seeing
> commits.
> 
> - David
> 
>> 
>> Thanks,
>> Manman
>> 
>> 
>> 
>> Thanks,
>> Manman
>> 
>> On Jun 21, 2013, at 10:01 PM, Manman Ren wrote:
>> 
>> 
>> Some scoping for option a:
>> 
>> There are a few Verify functions
>> DISubprogram::Verify
>> DICompositeType::Verify
>> DIType::Verify
>> that try to access the context link:
>> if (getContext() && !getContext().Verify())
>>  return false;
>> 
>> And the Verify functions are called from 10+ files.
>> 
>> If we are oaky with not calling getContext().Verify(), option a) appears
>> much better to me.
>> For printing | debugging purpose, we agree that not being able to trace the
>> pointer is okay.
>> 
>> Let me know your thoughts,
>> Manman
>> 
>> On Jun 21, 2013, at 5:01 PM, Eric Christopher wrote:
>> 
>> On Fri, Jun 21, 2013 at 4:59 PM, David Blaikie <dblaikie at gmail.com> wrote:
>> 
>> On Fri, Jun 21, 2013 at 4:50 PM, Manman Ren <mren at apple.com> wrote:
>> 
>> 
>> On Jun 21, 2013, at 4:41 PM, David Blaikie <dblaikie at gmail.com> wrote:
>> 
>> The reason I want a flag is to avoid the need to update the existing testing
>> cases while this is a work in progress.
>> I believe migrating one field means updating the existing testing cases?
>> 
>> 
>> It would, yes - and that's how you'd get coverage to know your change
>> was stable. You'll have to update all the tests eventually anyway &
>> doing so incrementally isn't substantially more expensive in my
>> experience. Perhaps there's something unique to this migration that
>> would make it so?
>> 
>> 
>> I am going to update the existing testing cases locally to have the test
>> coverage, but I don't want to check in the changes often.
>> Once we turn the flag on by default and remove the other path, I will submit
>> the changes on the testing cases.
>> 
>> 
>> I'd rather skip the flag & have the test coverage in the open along
>> with the changes. Having changes without testing is bad. I don't see a
>> benefit to you keeping the test coverage locally.
>> 
>> Another point to have a flag is to check in the patches in steps: DIBuilder
>> changes, changes related to the map, and DwarfDebug changes.
>> Without the flag, when I migrate the first field, I have to make sure it
>> works from frontend all the way to the backend.
>> 
>> 
>> True enough, but I don't think that's a great drawback.
>> 
>> Though technically, if you really wanted, you could start at the
>> backend & make the implementation conditional (check the type of the
>> field, if it's an MDNode - use that, if it's a string, do the lookup)
>> - implement that, then port DIBuilder, then remove the conditionality.
>> Though I don't like it a lot & I think the changes will be small
>> enough that frontend to backend will be still be understandable
>> patches.
>> 
>> 
>> All of this.
>> 
>> -eric
>> 
>> 
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>> 
>> 
>>