[PATCH] D70930: [llvm-objcopy][WebAssembly] Initial support for wasm in llvm-objcopy

Fri Dec 13 09:29:00 PST 2019

sbc100 accepted this revision.
sbc100 added a comment.
This revision is now accepted and ready to land.

Looks fine from a wasm perspective.

This is just the very first step towards a full wasm objcopy but its look like a good start to me.   Modulo a few comments inline.

================
Comment at: llvm/test/tools/llvm-objcopy/wasm/basic-archive-copy.test:42
+FileHeader:
+  Version:         0x00000001
+Sections:
----------------
jhenderson wrote:
> dschuff wrote:
> > jhenderson wrote:
> > > As mentioned in the now-deleted test, please remove the excessive whitespace, so that the Values in any given block line up neatly, but as close as possible to their tags, i.e. 
> > > ```
> > > --- !WASM
> > > FileHeader:
> > >   Version: 0x00000001
> > > Sections:
> > >   - Type: TYPE
> > >     Signatures:
> > >       - Index: 0
> > >         ParamTypes:
> > >           - I32
> > >         ReturnTypes:
> > >           - F32
> > >       - Index: 1
> > >         ParamTypes:
> > >           - I32
> > >           - I64
> > >         ReturnTypes: []
> > > # etc
> > >       - Type:   R_WASM_TABLE_INDEX_SLEB
> > >         Index:  0
> > >         Offset: 0x00000006
> > > ```
> > OK, but is it really worthwhile to deviate from the default output format of obj2yaml and hand-edit everything just so there's less whitespace? It doesn't look any more readable to me.
> Excessive whitespace for me at least makes it harder to read.
> 
> An obj2yaml-generated document is not much better than a canned-object file in most test cases. The vast majority (all?) of our ELF tests use YAML that has been hand-written to make sure it covers the things we care about and nothing more. obj2yaml may well generate significantly more output than is needed to test the different bits that are really being implemented, so whilst it may be useful to provide a basis, the output really should be cut down.
I feel like for yaml inputs like this should be the verbatim result of compiling some simple input file (either from .ll or .s) and as such should not be modified in place.  Instead we should include instructions on how to regenerate it periodically if/when the yaml format changes.

For yaml output checking I agree there could be argument made for trimming since we often only check for a tiny subset.

================
Comment at: llvm/test/tools/llvm-objcopy/wasm/basic-copy.test:13
+  Version: 0x00000001
+Sections:
+  - Type: TYPE
----------------
How was this yaml file generated?  Is it the same object as in the other test?  Should we instead commit this is a separate input file under: `llvm/test/tools/llvm-objcopy/wasm/Input/`.   At least you should document how to regenerate these inputs.

(I you do you can remove one level of `#` from these test files too)

================
Comment at: llvm/tools/llvm-objcopy/wasm/Writer.cpp:30
+    raw_svector_ostream OS(Header);
+    encodeULEB128(S.SectionType, OS);
+    bool HasName = S.SectionType == WASM_SEC_CUSTOM;
----------------
Its section type an LEB, or a single byte?   You use +1 below we means this should probably be a single byte write, no?

================
Comment at: llvm/tools/llvm-objcopy/wasm/Writer.h:32
+
+  size_t finalize();
+};
----------------
alexshap wrote:
> if I'm not mistaken this method is not really "finalize", 
> the main thing what it does - it properly encodes Sections + computes the total size.
> 
> What would you say to the following reorganization of this code:
> 
> Writer.h:
> 
>    classWriter {
>         ....
>         Error write(); 
> 
>     private:   
>         uint64_t totalSize() const { ... }
>     };
> 
> Writer.cpp:
>     uint64_t calculateSerializedSectionSize(const Section &S) {
>          uint64_t Size = S.Content.size();
>          if (S.SectionType == WASM_SEC_CUSTOM)
>               Size += (getULEB128Size(S.Name.size()) + S.Name.size());
>          return Size;
>      }
> 
>      void writeSection(const Section &S, uint8_t* Out) {
>           ....
>      }
> 
>      uint64_t Writer::totalSize() const {
>          ....
>      }
> 
>      Error Writer::write() {
>          ....
>      }
> 
> Side note: it looks like in this case we eliminate SectionData - decrease the number of copies/ memory consumption.
In the lld codebase we use "finalize" on section to mean exactly that: "properly encodes Sections + computes the total size".

Do you have other other meaning of the work finalize in mind?   Perhaps `finalizeContents()` would be more explicit?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70930/new/

https://reviews.llvm.org/D70930