[PATCH] D44329: [WebAssembly] Added initial AsmParser implementation.

Mon Mar 19 10:22:11 PDT 2018

aardappel marked 13 inline comments as done.
aardappel added inline comments.

================
Comment at: lib/Target/WebAssembly/AsmParser/WebAssemblyAsmParser.cpp:215
+        .Case("f64", WebAssembly::F64_0)
+        .Case("i8x16", WebAssembly::V16I8_0)
+        .Case("i16x8", WebAssembly::V8I16_0)
----------------
aardappel wrote:
> dschuff wrote:
> > There are a couple of confusing things to me about this. First, the SIMD proposal has only one value type ([[ https://github.com/WebAssembly/simd/blob/master/proposals/simd/SIMD.md#simd-value-type | v128 ]]) and the wasm backend has a corresponding register class ([[ https://github.com/llvm-mirror/llvm/blob/master/lib/Target/WebAssembly/WebAssemblyRegisterInfo.td#L61 | V128 ]]) into which all the SIMD MVTs (v4f32 et al) can go. But in the assembly language the scalar instructions use the value type name in their mnemonic and the SIMD instructions use the lane division interpretation names. So that's a bit of inherent weirdness that will have to be reflected in this code somewhere. But the other thing is that this function is named `ParseRegType` which to me would indicate the type of register (i.e. register class which in C++ code would be e.g. `WebAssembly::I32RegClass`), but what it returns is actually a physical register number. What we maybe actually want is to reflect either a wasm value type (although it looks like that's maybe not actually even reflected directly in the MIR, just in the binary format) or an MVT (e.g. `MVT::i32` or `MVT::v4f32`). 
> > 
> > 
> > ... ok actually looking at the way these reg types are used I now actually understand the comment you wrote in `WebAssemblyOperand::RegOp`.  Not sure I have a better idea right now. I guess it's because the generated matcher expects the reg operand to be one of the physical registers and validates that?
> Yes, this was originally to please the generated matcher. Though looking at the code, it seems the different v128 layouts are only needed for `emitParam` / `emitLocal`, whereas the matcher would be fine with generic registers. I can try splitting these up and reverting to a single physical register type, and see if it still works.
Refactored how register types and layout are handled, and reverted to a single SIMD register in the latest patch.

================
Comment at: lib/Target/WebAssembly/AsmParser/WebAssemblyAsmParser.cpp:249
+        return Error("Cannot parse stack register index: ", StackRegTok);
+      unsigned RegNo = static_cast<unsigned>(ParsedRegNo);
+      if (StackOp == "push") {
----------------
dschuff wrote:
> jgravelle-google wrote:
> > aardappel wrote:
> > > dschuff wrote:
> > > > aardappel wrote:
> > > > > dschuff wrote:
> > > > > > Up until now, the stack "reg" numbers on the "$push" and "$pop" tokens were really just annotations for readability, (because of course the operands push and pop on the stack in a defined order). So far nothing has actually depended on those notations, and this seems to be adding that dependence. Not that we absolutely can't do that; but it would be nice if we didn't have to. It looks like it might make this code more awkward though, because it appears that we are just getting the lexer tokens in order, and we'd maybe have to buffer them or something before processing them in right-to-left order. Am I understanding this correctly?
> > > > > The generated matcher does need registers in pretty much the same locations as these, so thats why it seemed natural to follow along. Certainly the number part is non-essential, since I am essentially re-using the assignment from the disassembler, though to accept code that does not have these numbers, I presume I would have to track a stack to able to assign them myself. That's not hard, but also not entirely trivial, so that may be a nice thing to do for a followup. In fact, I think it be nice to be able to not require these at all, and completely skip them on parse, and have a mode (initially for the non-elf path) that makes the dissassembler not output them either. Opinions welcome.
> > > > Yeah, that was sort of what I was getting at. s2wasm ignores the numbers entirely and tracks the stack itself. I do think it would be nice to not need them at all.
> > > > 
> > > > I think It sort of comes down to what we want the assembly language to look like. The non-ELF path will soon be the only path, so we don't need to worry about it for the long term. I think my preference for the language would be to keep it simpler and not require the numbers. This would require the assembler to track the stack, but even if it used the numbers, it would probably need to validate them anyway so overall it wouldn't be much different. I do actually like having the numbers as an annotation though (it makes the asm easier to read), and there is a way to attach annotations to assembly output, so maybe we should use it for that.
> > > agreed. I think it should be an option for the dissassembler (I personally would prefer to read the code without them), and the assemble should permit but ignore them.
> > > 
> > > Tracking the stack myself may be a bit more interesting than usual, since the current assembler code has no idea of instruction semantics, it leaves all of that to the table-gen matcher. So I'd need to at least dip into the table-gen to extract information on instruction signature if I  want to avoid hard-coding any of that.
> > To throw my own opinions in the mix,
> > Design goals I think are important:
> > 1) the .s syntax should be easy to read
> > 2) it should support inline asm fairly easily
> > 
> > To that end we could take and existing (wasm-elf) .s:
> > 
> > ```
> > i32.const $push3=, 0
> > i32.load  $push2=, __stack_pointer($pop3)
> > i32.const $push4=, 16
> > i32.sub   $1=, $pop2, $pop4
> > ```
> > 
> > and write it as
> > ```
> > i32.const 0
> > i32.load  offset=__stack_pointer
> > i32.const 16
> > i32.sub
> > set_local 1
> > ```
> > which is generally identical to .wat. We (in principle) can do this because the push/pop annotations are essentially cosmetic: we know them statically on a per-opcode basis. (This is almost certainly less the case for function calls.)
> > This should be the easiest thing to write inline asm for, and map most directly to the official text format, which should help the learning curve.
> > 
> > As a secondary idea, we keep the numbers, but observe that we can reuse virtual registers / stack slots after popping. Then the number refers to the depth of the stack at any given point. This also serves as a proxy for register pressure in the engines.
> > ```
> > i32.const $push0=, 0
> > i32.load  $push0=, __stack_pointer($pop0)
> > i32.const $push1=, 16
> > i32.sub   $1=, $pop0, $pop1
> > ```
> > 
> Yeah, when formatted as an annotation the push/pop labels would probably end up looking something like this, when using explicit locals:
> ```
> i32.const 0   #  push3
> i32.load  offset=__stack_pointer    #  pop3
> i32.const 16  #  push4
> i32.sub    # pop4, pop3, push4
> set_local 1 # pop5
> ```
> 
> Without explicit locals it gets more interesting because the actual parsed syntax is different from the spec (and of course we have to parse it). In that case it might make more sense to also explicitly print/parse pop operations.
> 
> But anyway we shouldn't be having this discussion here, an issue on tool-conventions probably makes more sense.
Agreed. I'm fine either making them optional and ignored (in wasm mode, not elf), or comments, or omitted all-together. I am assuming this will be in a follow-up though. I can start a conversation on tool-conventions to include everyone.

https://reviews.llvm.org/D44329