[LLVMdev] bit code file incompatibility due to debug info changes

Eric Christopher echristo at gmail.com
Fri Nov 22 09:54:02 PST 2013


On Fri, Nov 22, 2013 at 9:08 AM, David Blaikie <dblaikie at gmail.com> wrote:
>
>
>
> On Thu, Nov 21, 2013 at 4:17 PM, Manman Ren <manman.ren at gmail.com> wrote:
>>
>>
>>
>>
>> On Thu, Nov 21, 2013 at 12:01 PM, David Blaikie <dblaikie at gmail.com>
>> wrote:
>>>
>>>
>>>
>>>
>>> On Thu, Nov 21, 2013 at 11:45 AM, Manman Ren <manman.ren at gmail.com>
>>> wrote:
>>>>
>>>>
>>>>
>>>>
>>>> On Thu, Nov 21, 2013 at 11:26 AM, David Blaikie <dblaikie at gmail.com>
>>>> wrote:
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Nov 21, 2013 at 11:06 AM, Manman Ren <manman.ren at gmail.com>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Nov 21, 2013 at 10:57 AM, David Blaikie <dblaikie at gmail.com>
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Nov 21, 2013 at 10:38 AM, Manman Ren <manman.ren at gmail.com>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Nov 19, 2013 at 8:47 PM, Manman Ren <manman.ren at gmail.com>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Nov 19, 2013 at 5:26 PM, Manman Ren <manman.ren at gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Nov 18, 2013 at 11:00 AM, Chandler Carruth
>>>>>>>>>> <chandlerc at google.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Nov 18, 2013 at 10:55 AM, David Blaikie
>>>>>>>>>>> <dblaikie at gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> It depends a bit, also, on what kind of guarantees we need to
>>>>>>>>>>>> offer. If the guarantee when reading IR from disk is "will not crash" then
>>>>>>>>>>>> there's nothing for it but to run full debug info verification.
>>>>>>>>>>>>
>>>>>>>>>>>> On the other hand, if we can assume that some specific metadata
>>>>>>>>>>>> implies the correctness of some other metadata, then all we need to do is
>>>>>>>>>>>> check a known debug info version number. If it's the current number, do
>>>>>>>>>>>> nothing (assume the rest of the debug info is up to date and well formed),
>>>>>>>>>>>> otherwise do all the work to find the debug info and dump it (no need to
>>>>>>>>>>>> verify it - we just have to do the work to find it, so we don't go following
>>>>>>>>>>>> bad links later on - that should be as easy as dropping the llvm.dbg.cu
>>>>>>>>>>>> named node and removing all debug intrinsics and the instruction metadata
>>>>>>>>>>>> line references). But this latter scheme isn't robust against arbitrary
>>>>>>>>>>>> metadata (that could, coincidentally, have the right version number and
>>>>>>>>>>>> arbitrary metadata that breaks all our debug info metadata assumptions)
>>>>>>>>>>>>
>>>>>>>>>>>> If the latter is sufficient for everyone's needs/principles,
>>>>>>>>>>>> great.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> This makes sense to me, but I see Eric's fundamental concern with
>>>>>>>>>>> upgrading test cases.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> One other possible idea: we could artificially force coarseness
>>>>>>>>>>> on metadata while things are still iterating rapidly by just having a
>>>>>>>>>>> version number that rotates every "release". (Even non-open-source-releases
>>>>>>>>>>> could be handled by letting anyone, any time they need to rev it, update
>>>>>>>>>>> what the current version is and update every test case to reference that
>>>>>>>>>>> version.) If the version is nicely factored, it should at least be an
>>>>>>>>>>> obvious diff if still huge.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Adding a debug info version number and ignoring the debug info
>>>>>>>>>> MDNodes when version does not match makes sense. And changing the version
>>>>>>>>>> number with every release sounds reasonable.
>>>>>>>>>>
>>>>>>>>>> One implementation is
>>>>>>>>>> 1> completely get rid of version number in the tag
>>>>>>>>>>   LLVMDebugVersion is only used in two places and we should be
>>>>>>>>>> able to remove it.
>>>>>>>>>>     include/llvm/DebugInfo.h:    return getUnsignedField(0) &
>>>>>>>>>> ~LLVMDebugVersionMask;
>>>>>>>>>>     lib/IR/DIBuilder.cpp:  assert((Tag & LLVMDebugVersionMask) ==
>>>>>>>>>> 0 &&
>>>>>>>>>>     lib/IR/DIBuilder.cpp:  return
>>>>>>>>>> ConstantInt::get(Type::getInt32Ty(VMContext), Tag | LLVMDebugVersion);
>>>>>>>>>>   I remember there was something that blocks David from completely
>>>>>>>>>> getting rid of version number, but it seems the blocker is gone now.
>>>>>>>>>>   This can be done separately as well.
>>>>>>>>>>
>>>>>>>>>> 2> introduce a debug info version in module flags similar to
>>>>>>>>>> "dwarf version" module flag
>>>>>>>>>>      LTO can correctly handle the module flag during linking and
>>>>>>>>>> we only need to change one line in the debug info testing cases when we
>>>>>>>>>> update debug info version number.
>>>>>>>>>>      One issue is that we don't have a mode that emits a warning
>>>>>>>>>> and outputs the largest value (we have a mode that emits a warning and
>>>>>>>>>> outputs the value from the first module, but we can definitely add another
>>>>>>>>>> mode if necessary)
>>>>>>>>>>
>>>>>>>>>> 3> when to drop the debug info MDNodes?
>>>>>>>>>>      We could do this in the bit code loader but it seems a
>>>>>>>>>> violation of layers. One possibility is to run a module pass right after
>>>>>>>>>> loading the bit code, to remove debug intrinsics and debug tags on
>>>>>>>>>> instructions.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> We have an existing pass StripSymbols that can strip debug info in
>>>>>>>>> the module.
>>>>>>>>> So it is just a matter of adding that pass after bit code loader if
>>>>>>>>> the debug info version number does not match.
>>>>>>>>
>>>>>>>>
>>>>>>>> Can I assume no objection to the approach? :)
>>>>>>>> Bill, are you okay with adding another mode to module flags? i.e.
>>>>>>>> emits a warning and outputs the largest value.
>>>>>>>
>>>>>>>
>>>>>>> How will this work as a module flag, though? If the flag gets merged
>>>>>>> and maximized, how will we know which compile units have out of date debug
>>>>>>> info metadata and drop them?
>>>>>>
>>>>>>
>>>>>> The dropping part comes from item 3 above: adding StripSymbols pass
>>>>>> right after bit code loader if the debug info version number does not match.
>>>>>
>>>>>
>>>>> (what Eric said, but also)
>>>>
>>>>
>>>>>
>>>>>
>>>>> I'm just not really understanding how you choose which things to strip
>>>>> in (3). If you've already merged IR modules together and the debug info
>>>>> version is bumped to the maximum - if we had an old and a new module, the
>>>>> resulting version would be 'new' and we wouldn't know we still had some old
>>>>> data in there we needed to remove, would we? (and even if we did, we
>>>>> wouldn't know which of the compile units we needed to drop and which
>>>>> intrinsics we needed to strip, etc - since some was new and good and some
>>>>> was old and busted)
>>>>
>>>>
>>>> To David,
>>>>
>>>> I was saying that running StripSymbols pass right after loading a bit
>>>> code file (a source module), then the merging part will come afterwards when
>>>> linking the source module in.
>>>
>>>
>>> OK - but then we'd need another special case for non-LTO, when someone's
>>> just passed a bitcode file to Clang, for example - yes?
>>>
>>> By building it into the loader or as close to that as possible (so that
>>> any LLVM code loading a module never sees out of date debug info metadata)
>>> we ensure we only have to change one place, not every client of bitcode
>>> loading.
>>
>>
>> We can drop the debug info in the auto upgrading path similar to how we
>> upgrade TBAA tags.
>> I am going to move StripDebugInfo from StripSymbols.cpp to DebugInfo.cpp
>> so the implementation can be shared between StripSymbols and AutoUpgrade.
>> After that, a sample patch looks like:
>> Index: include/llvm/AutoUpgrade.h
>> ===================================================================
>> --- include/llvm/AutoUpgrade.h  (revision 195264)
>> +++ include/llvm/AutoUpgrade.h  (working copy)
>> @@ -57,6 +57,10 @@
>>    /// with different address spaces: the instruction is replaced by a
>> pair
>>    /// ptrtoint+inttoptr.
>>    Value *UpgradeBitCastExpr(unsigned Opc, Constant *C, Type *DestTy);
>> +
>> +  /// Check the debug info version number, if it is out-dated, drop the
>> debug
>> +  /// info.
>> +  bool UpgradeDebugInfo(Module &M);
>>  } // End llvm namespace
>>
>>  #endif
>> Index: lib/AsmParser/LLParser.cpp
>> ===================================================================
>> --- lib/AsmParser/LLParser.cpp  (revision 195264)
>> +++ lib/AsmParser/LLParser.cpp  (working copy)
>> @@ -182,6 +182,8 @@
>>    for (Module::iterator FI = M->begin(), FE = M->end(); FI != FE; )
>>      UpgradeCallsToIntrinsic(FI++); // must be post-increment, as we
>> remove
>>
>> +  UpgradeDebugInfo(*M);
>> +
>>    return false;
>>  }
>>
>> Index: lib/Bitcode/Reader/BitcodeReader.cpp
>> ===================================================================
>> --- lib/Bitcode/Reader/BitcodeReader.cpp        (revision 195264)
>> +++ lib/Bitcode/Reader/BitcodeReader.cpp        (working copy)
>> @@ -3152,6 +3152,7 @@
>>    for (unsigned I = 0, E = InstsWithTBAATag.size(); I < E; I++)
>>      UpgradeInstWithTBAATag(InstsWithTBAATag[I]);
>>
>> +  UpgradeDebugInfo(*M);
>>    return error_code::success();
>>  }
>>
>> Index: lib/IR/AutoUpgrade.cpp
>> ===================================================================
>> --- lib/IR/AutoUpgrade.cpp      (revision 195264)
>> +++ lib/IR/AutoUpgrade.cpp      (working copy)
>> @@ -12,6 +12,7 @@
>>
>> //===----------------------------------------------------------------------===//
>>
>>  #include "llvm/AutoUpgrade.h"
>> +#include "llvm/DebugInfo.h"
>>  #include "llvm/IR/Constants.h"
>>  #include "llvm/IR/Function.h"
>>  #include "llvm/IR/IRBuilder.h"
>> @@ -489,3 +490,11 @@
>>
>>    return 0;
>>  }
>> +
>> +bool llvm::UpgradeDebugInfo(Module &M) {
>> +  if (getDebugInfoVersionFromModule(M) == LLVMDebugVersion)
>> +    return false;
>> +
>> +  StripDebugInfo(M);
>> +  return true;
>> +}
>>
>> Of course, testing cases will be added when I actually submit the patch.
>>
>>>
>>>>
>>>>
>>>> To Eric,
>>>> < You'll want to do it at loading time and not worry about merging at
>>>> < all. I think this will need to go just after reading the module and
>>>> < check the version against the hard coded version in the bit code
>>>> < reader/somewhere. I think we should do it there rather than as a later
>>>> < pass as we really don't want to do anything to the metadata other than
>>>> < strip it.
>>>>
>>>> I think I was saying the same thing: right after bit code loader :)
>>>> Eric, are you suggesting to strip the debug info inside the loader?
>>>>
>>>> The merging part is mostly about where to store the debug info version
>>>> number in the bit code file.
>>>> If we store it as a module flag, we need to specify a mode for the
>>>> module flag so the linker knows how to merge the module flag.
>>>
>>>
>>> Except we'd never actually need to merge the flag - since we would drop
>>> it from any module that didn't have the latest, right? I'm not sure this
>>> would need to be a module flag, then, since there would be no need to define
>>> merging behavior. Is there an advantage to using a module flag over just
>>> putting it on the compile unit metadata node somewhere?
>>
>>
>> Agreed. So we have two choices here, use a module flag and remove the
>> module flag when stripping the debug info, so we will never merge modules
>> with different debug info versions. (no new mode is required)
>>
>> The other choice is putting it on the compile unit MDNode as a field.
>> There is no reliable way of checking if an old bc file has a version
>> number, because it requires parsing the CU MDNodes in the old bc files.
>> Also if we are not going to support multiple debug info versions in a
>> single compiler, there is no point of putting a version number in each CU
>> MDNode.
>
>
> Fair point. Any useful tradeoffs to consider between a single named MDNode
> and a module flag?
>

Probably simplicity at the moment. A module flag is pretty easy to
read/parse/etc at load time versus having to put it in the metadata
itself. In the metadata itself though we'd have everything self
contained (i.e. just stick it as a field on the compile unit or
something).

-eric

>>
>>
>> If we decide to go with the module flag, I can start adding the module
>> flag and update all existing testing cases. Let me know your vote :)
>>
>> Manman
>>
>>>
>>>>
>>>> If we decide to save the debug info version number as a module flag,
>>>> then we may need another mode for it.
>>>
>>>
>>> I would imagine we could strip the flag whenever we strip the debug info,
>>> then we would never merge modules that had differing debug info versions. So
>>> we could use any merging (or none at all) definition, since we'd only ever
>>> merge matching ones.
>>>
>>>>
>>>>
>>>> Thanks,
>>>> Manman
>>>>
>>>>>
>>>>>>
>>>>>> Item 3 is shared between non-LTO and LTO. For LTO, we want the debug
>>>>>> info version number gets merged and maximized.
>>>>>>
>>>>>> Thanks
>>>>>> Manman
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> - David
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Manman
>>>>>>>>
>>>>>>>>>
>>>>>>>>> 4> how to report warning?
>>>>>>>>>     Is errs() << "WARNING: " good enough? BTW this is how we emit
>>>>>>>>> warning when linking module flags.
>>>>>>>>>
>>>>>>>>> Any objection to the above? Any other suggestion?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Manman
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> LLVM Developers mailing list
>>>>>>>>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>>>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>



More information about the llvm-dev mailing list