[PATCH] D17321: DIEData, DIEWriter: introduce and begin migration.

Sun Feb 21 16:47:06 PST 2016

On Fri, Feb 19, 2016 at 8:08 PM, Eric Christopher via llvm-commits <
llvm-commits at lists.llvm.org> wrote:

>
>
> On Fri, Feb 19, 2016 at 7:45 PM Robinson, Paul <
> Paul_Robinson at playstation.sony.com> wrote:
>
>> You know I favor the pedantic answer. J  The point remains, type unit
>> signatures need to be content-dependent and dwo_ids really don't,
>> regardless of the details, because they have different consistency
>> requirements.
>>
>>
> Where content dependent can mean a lot of things :)
>
> That said, I really wish I could remember the dwo_id reasoning for using
> the hash for type units rather than a simple hash of the section.
>

One reason that sprung to my mind is type units themselves - the hash of
the debug_info section would be insufficient to capture the change in the
content of a type. Eg:

struct foo { int i; };
foo f;

With type units enabled, would produce the same hash if you make the int a
float instead.

So you'd at least have to hash the content of the types as well (but
perhaps you could hash the whole debug_types section instead of the
content-aware hashing)

So that's a best-guess right now: we didn't have the bytes in memory to
just hash, and we didn't want to use a frontend-provided identifier because
that would be needlessly volatile (comment changes would purturb it, etc -
you only want a hash of what ends up in the debug info, not more than
that). So it's possible that we can hash the bytes before we emit them and
patch them up (having the bytes in memory first isn't the end of the world
- LLVM's MC API currently does that anyway, so we can't avoid that by
streaming it out - but it dose double that requirement unless/until we
improve LLVM's MC APIs)

But I thought there were other reasons too - would have to look at what
things the type unit hash /doesn't/ hash and see why...

- Dave

>
> -eric
>
>
>> --paulr
>>
>>
>>
>> *From:* Eric Christopher [mailto:echristo at gmail.com]
>> *Sent:* Friday, February 19, 2016 7:07 PM
>> *To:* Robinson, Paul; Peter Collingbourne; Adrian Prantl
>>
>>
>> *Cc:* llvm-commits at lists.llvm.org
>> *Subject:* Re: [PATCH] D17321: DIEData, DIEWriter: introduce and begin
>> migration.
>>
>>
>>
>>
>>
>> On Fri, Feb 19, 2016 at 6:52 PM Robinson, Paul <
>> Paul_Robinson at playstation.sony.com> wrote:
>>
>> Type units are COMDATs produced across (potentially) many compilations,
>> so the key has to correspond to the content in a predictable/consistent way
>> in order for the linker to throw away duplicates. Therefore the type-unit
>> hash is specified the way it is.
>>
>>
>>
>> Doesn't matter. The way that we're doing it in clang is perfectly
>> acceptable for this.
>>
>>
>>
>> dwo_ids are just identifiers matching two lumps of data for the same CU,
>> produced by the same compilation, so they aren't sensitive to content (at
>> least not in the same way as type units).  But it would seem prudent for
>> two compilations of the "same" source to have different IDs, thus hashing
>> the .debug_info content might not be the best choice; a more random-number
>> kind of ID would be more appropriate.
>>
>>
>>
>> I don't see this either, but I know where you're coming from.
>>
>>
>>
>> -eric
>>
>>
>>
>> --paulr
>>
>>
>>
>> *From:* llvm-commits [mailto:llvm-commits-bounces at lists.llvm.org] *On
>> Behalf Of *Eric Christopher via llvm-commits
>> *Sent:* Friday, February 19, 2016 6:39 PM
>> *To:* Peter Collingbourne; Adrian Prantl
>> *Cc:* llvm-commits at lists.llvm.org
>> *Subject:* Re: [PATCH] D17321: DIEData, DIEWriter: introduce and begin
>> migration.
>>
>>
>>
>> On Fri, Feb 19, 2016 at 5:45 PM Peter Collingbourne via llvm-commits <
>> llvm-commits at lists.llvm.org> wrote:
>>
>> On Fri, Feb 19, 2016 at 03:47:22PM -0800, Adrian Prantl wrote:
>> >
>> > > On Feb 19, 2016, at 12:31 PM, Peter Collingbourne via llvm-commits <
>> llvm-commits at lists.llvm.org> wrote:
>> > >
>> > > them incorrectly, see DwarfDebug::makeTypeSignature). Since the page
>> at
>> > > https://gcc.gnu.org/wiki/DebugFission does not specify how the hash
>> is to be
>> > > computed, could we maybe do something simpler (e.g. MD5 hash the DIE
>> bytes
>> > > themselves) and avoid needing to use something like the DWARF type
>> hashing
>> > > algorithm for this?
>> >
>> > Where it can LLVM uses an MD5 sum over the mangled name of the type to
>> avoid the expensive DWARF type hashing.
>> >
>> > Just a side node: Debug info fission is in the process of being
>> standardized in DWARF 5. The progress can be tracked here:
>> >   http://dwarfstd.org/Issues.php?type=closed4 (search for “fission”
>> and “split DWARF”)
>> > The DWARF specification may deviate from the description on the GCC
>> wiki a little, but we probably want to follow the DWARF spec where that
>> makes sense.
>>
>> I see. From http://dwarfstd.org/ShowIssue.php?issue=130313.4 :
>>
>> >   5.  A DW_AT_dwo_id attribute whose value is an 8-byte
>> >       unsigned hash of the full compilation unit.  This hash
>> >       value is computed by the method described in Section 7.27
>> >       ("Type Signature Computation").
>>
>> That's unfortunate. Maybe I'm missing something, but I don't really see
>> why
>> the attribute value needs to be specified like that, as the hash is only
>> used by consumers to match skeleton units with full units, and it should
>> be
>> up to the producer to use a good enough hash. Is there any possibility of
>> changing the specification before DWARF 5 is standardised?
>>
>>
>>
>> It has been, but it's not really important as any two compilers aren't
>> going to agree on debug info output anyhow so they're unlikely to match. I
>> raised this in committee at the time as a "may" rather than "is". That
>> said, recent text (I don't have an absolutely updated version in front of
>> me) reads:
>>
>>
>>
>> A DW_AT_dwo_id attribute whose implementation-defined integer constant
>> value, known as the compilation unit ID, provides unique identification of
>> this compilation unit as well as the associated split compilation unit in
>> the object file named in the DW_AT_dwo_name attribute. For simplicity, the
>> DW_AT_dwo_id attributes in the skeleton compilation unit and the
>> corresponding split full compilation unit (see Section 3.1.3 on the
>> following page) must use the same form to encode this identification value.
>>
>>
>>
>> The means of determining a compilation unit ID does not need to be
>> similar or related to the means of determining a type unit signature.
>>
>>
>>
>> That said, ISTR at the time having a good reason why a hash of the
>> debug_info section wasn't a sufficient method for unique identifiers, but
>> my brain isn't coming up with it right now. Might have been related to not
>> having a finalized section before wanting to splat in the hash, but I'm not
>> sure.
>>
>>
>>
>> For the type ids we haven't worried because we're only supporting type
>> units for C++ types at the moment, if we wanted to support C types we'd
>> want to come up with another identifier so we could merge similar types
>> across compilation units.
>>
>>
>>
>> Hope this helps.
>>
>>
>>
>> -eric
>>
>>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20160221/4d02309c/attachment.html>