[PATCH] D49678: [llvm-objcopy] Adding support for compressed DWARF debug sections.

Mon Aug 27 13:41:15 PDT 2018

jakehehrlich added inline comments.

================
Comment at: tools/llvm-objcopy/Object.h:336
+protected:
+  std::string OwnedName;
   std::vector<uint8_t> Data;
----------------
plotfi wrote:
> jakehehrlich wrote:
> > plotfi wrote:
> > > jakehehrlich wrote:
> > > > So it turns out that we want to make names std::string instead of StringRefs anyway so this is no longer needed. (two other changes also needed this). That will likely be landing soon.
> > > ETA?
> > I think that's happening in this change: https://reviews.llvm.org/D50463
> > 
> > I think Paul is finished with GSoC so it's up to Paul. You can go ahead and use code from that change in this change so that the rebase will be minimal.
> As far as I can tell, this is not comitted yet. Should I wait for this to land or take it from the diferential you sent?
Given that Paul hasn't worked out that and how close this patch is to getting an LGTM from me I think you should be the one to make that change in this patch.

================
Comment at: tools/llvm-objcopy/Object.h:328

 class OwnedDataSection : public SectionBase {
   MAKE_SEC_WRITER_FRIEND
----------------
plotfi wrote:
> jakehehrlich wrote:
> > I think you should just not change OwnedDataSection. It has a very small amount of functionality that isn't worth trying to inherit.
> I was actually encouraged to inherit from OwnedDataSection in a previous review. What exactly should I do here? I need more clarity here in-order to move forward. 
I'll leave it to your judgement then. I don't have strong feelings about it. It is my estimation that using OwnedDataSection will make it so that it is more difficult to avoid unnecessary copies in data. Ideally I'd like to just have the input data as an ArrayRef that comes from the mapped file and then have the compression library write directly into the storage we use for CompressedSection. If that turns out not to be true then I'm more than happy for OwnedDataSection to be used. It might even be helpful to rework OwnedDataSection if turns out that I'm right. This is not the sort of issue that should block you.

================
Comment at: tools/llvm-objcopy/Object.h:369
+public:
+  CompressedSection(const SectionBase &Sec,
+                    DebugCompressionType CompressionType)
----------------
plotfi wrote:
> jakehehrlich wrote:
> > The Size needs to be set in the constructor. Prior to all the fancy changes that people are making now. Most Sizes could be quite trivially determined. Size is used in layout. Because I didn't foresee Size needing to be determined dynamically for different systems, it's just a single variable when it really needs to be a virtual function that takes some magical argument that will allow it to know the proper size for a given output format. The issues here haven't been fully fleshed out but I think a solution that Alex and James proposed to solve other issues and not just this one)could be made to solve this. For the time being, the best option is to use an upper bound. It sucks; I dislike it, but I don't have a better recommendation.
> > 
> > I should finally get the time I need to refactor this in a couple weeks. I should be able to solve a lot of these overarching issues then.
> > 
> > In the meantime, I think this needs to compress the data within the section so that it can get a reasonable upper bound on the size.
> Oh you're saying that CompressedSection::Size should be the compressed size, not the decompressed size??? At the moment I am doing compression in the visitor. 
> 
> I still dont understand why size needs to be passed as a constructor parameter. Isn't that going to come from the Section I am passing in anyways?
Yep. To be precise. It's the field that becomes st_size in the section header so it has to be correct for that. Unfortunately what st_size is is not simply dependent on the size of compressed data but also on the size of of Chdr which you can't actually know the way things are. The current contract states that Size is the one field which must be correct at all times. This is because layout has to happen prior to finalization but layout needs to know the size. Thinking about it now, alignment actually must obey the same contract but that was less obvious since alignment is normally just unmodified.

There are two sizes that matter here, the size of the original data and the size after compression plus the header and magic string. Overestimating is acceptable, but not ideal.

You don't need to pass in the size of the original data since you already pass in the section and the OriginalData field contains that information. You *can't* pass in the compressed size because you don't yet know that information. Since Size must be set correctly on the exit of each method that also means that size must be correct after construction. So I don't really see how to get around compressing the data in the constructor. Plus I think that helps solve the problem of minimizing the number of copies that have to be made (namely, you can avoid ever copying the data)

Repository:
  rL LLVM

https://reviews.llvm.org/D49678