<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Nov 19, 2019 at 5:26 PM Quentin Colombet <<a href="mailto:qcolombet@apple.com">qcolombet@apple.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="overflow-wrap: break-word;"><div class="gmail_quote"><div><div><div><div><div><div><div dir="ltr"><div>Disclaimer: I haven’t checked how feasible that is.</div><div><br></div><div>In InstrEmitter, when we go through all the definitions for variadic instructions, we should map them either to a physreg or virtual register on a value per value basis. The thing I am not sure is how to check what mapping (phys or virt) we want for a specific value.</div><div><br></div><div>We may need the new bit you are suggesting, but I am afraid it may be too coarse grain. Indeed, I imagine it could be possible to have a variadic instruction with both implicit physreg def and variadic (in the sense not described in MCDesc yet not implicit) virtual reg. Thus, I would rather we explore other ways first.</div></div></div></div></div></div></div></div></div></div></blockquote><div><br></div><div>I'm not sure we need to support instructions that mix physical and virtual registers. As you pointed out before, it doesn't make sense to use virtual registers on machines that have physical registers. For machines that don't have physical registers (like WebAssembly), it similarly doesn't make sense to use physical registers. I guess this is more a property of a machine than of individual instructions. This agrees with your alternative solution below.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="overflow-wrap: break-word;"><div class="gmail_quote"><div><div><div><div><div><div><div dir="ltr"><div><side problem></div><div>Also related to this variadic (implicit or not) virtual register thing, there is another problem down the line. If we end up splitting those live-ranges, we may not be able to compute a register class for them, since we usually get them from the MCDesc.</div><div>For instance:</div><div>V1, V2 = call</div><div>…</div><div>During regalloc we split say V2:</div><div>V1, V3 = call</div><div>V2 = copy V3 <== now we don’t know how to constrain V3, because the MCDesc of `call` doesn’t describe the second definition.</div><div>…</div><div><br></div><div>I am guessing WebAssembly wouldn’t have this problem since there is probably no live-range splitting going on, but if we add this feature, some other targets may run into it.</div><div></side problem></div></div></div></div></div></div></div></div></div></div></blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="overflow-wrap: break-word;"><div class="gmail_quote"><div><div><div><div><div><div><div dir="ltr"><div><br></div><div><br></div><div>* Alternative solution *</div><div><br></div><div>Correct me if I am wrong, but WebAssembly does not have any physreg, right?</div><div>If so, SDISel code does the right thing by assuming this additional implicit registers need to be mapped to physreg. The problem here is in WebAssembly case, a phys reg is just a virtual register (IIRC, regalloc for WebAssembly just packs virtual regs) but SDISel doesn’t know about that. Therefore, maybe what we want, instead of a bit on each instruction is a target lowering hook that says, this target uses virtual reg as phys reg (I believe we may even already have that for PEI to work properly for WebAssembly) and adapt SDISel code accordingly.</div><div> </div><div>What do you think?</div></div></div></div></div></div></div></div></div></div></blockquote><div><br></div><div>I like this idea a lot. We do already have the `usesPhysRegsForPEI` hook, and WebAssembly is the only upstream user. What do you think about renaming this hook to something more general like `usesPhysicalRegs` and reusing it to solve the current problem? I expect there will be other issues for which that hook would come in handy as well.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="overflow-wrap: break-word;"><div class="gmail_quote"><div><div><div><div><div><div><div dir="ltr"><div><br></div><div>Cheers,</div><div>-Quentin</div><div><br></div><div><br></div></div></div></div></div></div></div></div></div><div><br><blockquote type="cite"><div>On Nov 19, 2019, at 4:22 PM, Thomas Lively <<a href="mailto:tlively@google.com" target="_blank">tlively@google.com</a>> wrote:</div><br><div><div dir="ltr">Can you elaborate on the fix you are thinking of? I'm not sure what you're thinking should change.</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Nov 19, 2019 at 3:51 PM Quentin Colombet <<a href="mailto:qcolombet@apple.com" target="_blank">qcolombet@apple.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div>It sounds to me that we should fix SDISel to accept both physical and virtual definitions on variadic instructions. Though I wouldn’t bother adding the support for implicit virtual definition.<div><br></div><div>I don’t think we need a bit to allow if we go that direction.<br><div><div><div><br><blockquote type="cite"><div>On Nov 19, 2019, at 10:29 AM, Thomas Lively <<a href="mailto:tlively@google.com" target="_blank">tlively@google.com</a>> wrote:</div><br><div><div dir="ltr">To get into more detail, I'm trying to update WebAssembly's `call` instruction. `call` is currently constrained to return one or zero arguments, so in TableGen we have a separate call Instruction for each possible return type. But I need to update calls to return arbitrarily many values of any combination of WebAssembly types, so even if we imposed some reasonable artificial limit like 8 return values, this approach would require generating billions of Instructions in TableGen. So that won't work.<div><br></div><div>The other approach is to have just one call instruction that has variadic outputs as well as inputs. This is the approach I am pursuing now, and I have it working through the lowering to MachineSDNodes, but the approach is breaking down in the InstrEmitter because it is trying to use physical registers.</div><div><br></div><div>WebAssembly is a stack machine, so all instructions including calls push their results onto the value stack as opposed to placing them in registers. The value stack is not in memory and not addressable, so it is not the same as the normal stack of call frames, which we also have. Instead, the value stack takes the place of registers. The way we model the value stack is by using virtual registers exclusively in the MachineInst layer, then rearranging (i.e. "stackifying") the instructions and removing all registers entirely right before lowering to MC. We skip register allocation entirely, since WebAssembly does not have registers to allocate. We use virtual registers instead of physical registers because the value stack can contain an unbounded number of values.</div><div><br></div><div>In order to fit this modeling, I need a new virtual register to be allocated for each def of the generalized call instruction. The stackification passes in the WebAssembly backend will correctly interpret this as the call's return values being pushed onto the value stack.</div><div><br></div><div>The other option I am considering would be to lower calls to contain a chain of glued pseudo nodes that each explicitly defs a single register, the collapsing that chain of nodes back into a single MachineInstruction using a custom inserter hook. That seems similar to your suggestion of using a pseudo, but I do still need to get just a single call MachineInstruction after instruction selection. Using an ephemeral chain of pseudo nodes seems like a hack, since it is working around a modeling mismatch between the InstrEmitter and the WebAssembly backend.</div><div><br></div><div>To avoid that workaround, I would prefer to add a bit named something like "implicitDefsAreVirtual" that I could just set for all WebAssembly instructions. WebAssembly is the only upstream stack machine target, but other downstream stack machine targets such as TON Labs' TVM have adopted our approach of using virtual registers and would also benefit from having such a bit.</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Nov 19, 2019 at 9:04 AM Quentin Colombet <<a href="mailto:qcolombet@apple.com" target="_blank">qcolombet@apple.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="auto">Hi Thomas,<div><br></div><div>If you know a specific instruction pushes a value on the stack and want to represent that as a virtual register (as opposed to a frame index for instance), you could use an explicit def with a pseudo that you later lower to your actual instruction with the stack access.</div><div><br></div><div>I am not sold on the bit approach for now. It seems to me that we are working around a proper modeling. If you want to pursue with that approach, I would need more information on the semantic of that bit, how it is going to be used, what problem it solves and so on, to give a better opinion.</div><div><br></div><div>Cheers,</div><div>Q<br><div dir="ltr"><br><blockquote type="cite">Le 18 nov. 2019 à 19:35, Thomas Lively <<a href="mailto:tlively@google.com" target="_blank">tlively@google.com</a>> a écrit :<br><br></blockquote></div><blockquote type="cite"><div dir="ltr"><div dir="ltr">Hi Quentin,<div><br></div><div>Thanks, that explanation makes sense. I can see that in a normal register machine, implicitly defs must be physical registers. In a stack machine like WebAssembly, though, implicit defs are known to be pushed onto the value stack just like any other defs. Slots on the value stack are represented by virtual registers until stackification, so for WebAssembly we do need the implicit defs to be stored in virtual registers. I guess the best thing to do for now would be to add a bit to the MCInstrDesc carrying this information.</div><div><br></div><div>Thomas</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Nov 18, 2019 at 6:53 PM Quentin Colombet <<a href="mailto:qcolombet@apple.com" target="_blank">qcolombet@apple.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div>Hi Thomas,<div><br></div><div>The reason is simply because an implicit def of a virtual register doesn’t make sense in theory.</div><div><br></div><div>The rationale is let say that an instruction implicitly defines a virtual register. The implicit aspect of it means that this information cannot be “seen”. Thus, when we finally emit the instruction, there is nothing in the encoding or anywhere that is going to tell where the value has been physically defined.</div><div><br></div><div>The only way something is implicitly defined is when everybody knows where the value is going to end up and that’s only achievable in a physical register.</div><div><br></div><div>Cheers,</div><div>-Quentin<br><div><br><blockquote type="cite"><div>On Nov 18, 2019, at 6:21 PM, Thomas Lively via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>> wrote:</div><br><div><div dir="ltr">Hi all,<div><br></div><div>I need to figure out why InstrEmitter::EmitMachineNode assumes that when the number of outputs of a MachineSDNode is greater than the number of defs in the corresponding MCInstrDesc, the outputs in the difference will be placed into physical registers as opposed to virtual registers.<br></div><div><br></div><div>The specific line in question is:</div><div>    bool HasPhysRegOuts = NumResults > NumDefs && II.getImplicitDefs()!=nullptr;</div><div><br></div><div>Where NumResults is the number of outputs in the MachineSDNode and NumDefs comes from the MCInstrDesc and ultimately the TableGen definition of the instruction. I do not know why this assumption is made or what code depends on it, but it is over 12 years old: <a href="https://github.com/llvm/llvm-project/commit/c5549fc3a055710a3958a5b3164db3fa167c720c#diff-9dfdb934cb2b01ba001da4bbf3c5cb3cR632" target="_blank">https://github.com/llvm/llvm-project/commit/c5549fc3a055710a3958a5b3164db3fa167c720c#diff-9dfdb934cb2b01ba001da4bbf3c5cb3cR632</a>. </div><div><br></div><div>The context for this question is that I am trying to implement an instruction for the WebAssembly backend that returns a variable number of operands. I am following the example of ARM's load multiple instructions, but this assertion in the instruction emitter is causing problems because WebAssembly, unlike ARM, does not use physical registers at all. Also, almost all WebAssembly instructions have an implicit def of the register that represents the argument stack, so II.getImplicitDefs() is necessarily non-null.</div><div><br></div><div>I am open to ideas about the best way forward. I am currently thinking that if this assumption is important for most targets, I might need to add a new bit to MCInstrDesc to disable this assumption.</div></div>

_______________________________________________<br>LLVM Developers mailing list<br><a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br><a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><br></div></blockquote></div><br></div></div></blockquote></div>

</div></blockquote></div></div></blockquote></div>

</div></blockquote></div><br></div></div></div></div></blockquote></div>

</div></blockquote></div><br></div></blockquote></div></div>