[PATCH] D70930: [llvm-objcopy][WebAssembly] Initial support for wasm in llvm-objcopy

Mon Dec 16 10:25:36 PST 2019

dschuff added inline comments.

================
Comment at: llvm/tools/llvm-objcopy/wasm/Object.h:21
+
+class Section {
+public:
----------------
alexshap wrote:
> Sorry for the delays with code review, I'll try to catch up on all the diffs in my backlog soon.
> I'm not an expert in WASM format, so please take my comments with a grain of salt.
> 
> Okay, what would you say to the following approach ?
> 
>    struct Section {
>       uint8_t Type;
>       StringRef Name;
>       ArrayRef<uint8_t> Content;
>    };
> 
> If in the future we need to add new sections with some new content - we can use a StringSaver NewSectionsContents
> (i.e. it can be a field of the class Object) which would be responsible for holding the corresponding buffers (which are not owned by the input object)).
> 
> Looks like in this case the code would be a tiny bit cleaner / less verbose.
> 
This interface (owning the contents) is actually used in D70970, but not here; I'll move it to the next review.
For now I kept the naming ("contentsRef") the same. in the next review we can discuss the current approach (which is meant for moving the contents) vs. your suggested approach which would copy them.

================
Comment at: llvm/tools/llvm-objcopy/wasm/Writer.cpp:30
+    raw_svector_ostream OS(Header);
+    encodeULEB128(S.SectionType, OS);
+    bool HasName = S.SectionType == WASM_SEC_CUSTOM;
----------------
sbc100 wrote:
> Its section type an LEB, or a single byte?   You use +1 below we means this should probably be a single byte write, no?
You're right, it's a single byte in the spec; it doesn't make any difference if it's less than 128 and I think we are not 100% consistent in the tools code. But I fixed it here.

================
Comment at: llvm/tools/llvm-objcopy/wasm/Writer.h:32
+
+  size_t finalize();
+};
----------------
alexshap wrote:
> sbc100 wrote:
> > alexshap wrote:
> > > if I'm not mistaken this method is not really "finalize", 
> > > the main thing what it does - it properly encodes Sections + computes the total size.
> > > 
> > > What would you say to the following reorganization of this code:
> > > 
> > > Writer.h:
> > > 
> > >    classWriter {
> > >         ....
> > >         Error write(); 
> > > 
> > >     private:   
> > >         uint64_t totalSize() const { ... }
> > >     };
> > > 
> > > Writer.cpp:
> > >     uint64_t calculateSerializedSectionSize(const Section &S) {
> > >          uint64_t Size = S.Content.size();
> > >          if (S.SectionType == WASM_SEC_CUSTOM)
> > >               Size += (getULEB128Size(S.Name.size()) + S.Name.size());
> > >          return Size;
> > >      }
> > > 
> > >      void writeSection(const Section &S, uint8_t* Out) {
> > >           ....
> > >      }
> > > 
> > >      uint64_t Writer::totalSize() const {
> > >          ....
> > >      }
> > > 
> > >      Error Writer::write() {
> > >          ....
> > >      }
> > > 
> > > Side note: it looks like in this case we eliminate SectionData - decrease the number of copies/ memory consumption.
> > In the lld codebase we use "finalize" on section to mean exactly that: "properly encodes Sections + computes the total size".
> > 
> > Do you have other other meaning of the work finalize in mind?   Perhaps `finalizeContents()` would be more explicit?
> it feels like it's unnecessary to keep around one extra copy (SectionData)  (especially if it doesn't contain any fields which need to be computed from the final layout (i.e. global offsets, indices, etc)) + that way it looks simpler to me unless I'm missing smth.
Actually the root problem here is that `SectionData` is totally misnamed. It's not actually the section payload at all, it's just the header, which does need to be computed from the final layout. (that's why it's a SmallVector). I think I originally had both headers and data and I must have renamed/eliminated the wrong name when I change it.
I just renamed it `SectionHeader` and it all hopefully makes more sense now.

In light of that, `finalize` does more than just compute the size, it generates the header as well, and as @sbc100 says, matches the way we use the term in the wasm backend and lld.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70930/new/

https://reviews.llvm.org/D70930