<HTML><BODY style="word-wrap: break-word; -khtml-nbsp-mode: space; -khtml-line-break: after-white-space; "><BR><DIV><DIV>On Jul 30, 2007, at 12:42 PM, Christopher Lamb wrote:</DIV><BR class="Apple-interchange-newline"><BLOCKQUOTE type="cite"><DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV>I contend that insert_subreg is target specific already. It currently requires a target specific subreg index, which is a kind of target specific hook that tells coalescing how to deal with it. A two operand insert_subreg (or an insert_subreg from undef) is a move between target register classes that have a subreg relationship. insert_subreg defines the entire superreg value, and I don't see why it's so bad to allow targets to specify their own semantics for what happens to the register being inserted into. This is essentially what the subreg index is already; we could even put semantics in the SubRegSet in RegisterInfo.td, allowing the semantics to be checked by the compiler.</DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV>A parameter of the set would indicate that an insert into subreg i either leaves the rest of the superreg value untouched (insert_subreg reg, reg, i), or implicitly sets the rest of the register to a known value (insert_subreg constant_value, reg, i), or to undef (insert_subreg undef, reg, i). Only the semantics specified for that SubRegSet of the register class of the result of the insert_subreg would be valid, and this could be enforced.</DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV>This seems to me to allow insert_subreg to capture many useful cases, and it captures the register set semantics in the RegisterInfo.td file, where I think it belongs.</DIV><BR></DIV></BLOCKQUOTE><DIV><BR class="khtml-block-placeholder"></DIV>No. I am sorry, that cannot be allowed. insert_subreg must mean the same for all targets.
We cannot allow x86-64 insert_subreg (and only the 32-bit variant of this, not the 16-bit or 8-bit ones) to mean inserting the lower 32 bits while zeroing the upper 32 bits. That deviates from the LLVM philosophy.</DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV><BR><BLOCKQUOTE type="cite"><DIV><BLOCKQUOTE type="cite"><DIV><DIV></DIV></DIV><DIV><BLOCKQUOTE type="cite"><DIV><BLOCKQUOTE type="cite"><DIV><BLOCKQUOTE type="cite"><DIV><BLOCKQUOTE type="cite"><DIV>2: the two operand variant of insert_subreg should mean the superreg is undef. If you insert a value into a low part, the rest of the superreg is still undef.</DIV></BLOCKQUOTE><DIV><BR class="khtml-block-placeholder"></DIV><DIV>I think the insert_subreg instruction (both 2 and 3 operand versions) must have semantics specific to the target. For example, on x86-64 there is no valid 3 operand insert_subreg for a 32-bit value into 64 bits, because the 32-bit result is always going to be zero extended and overwrite the upper 32 bits.</DIV></DIV></BLOCKQUOTE><DIV><BR class="khtml-block-placeholder"></DIV><DIV>It just means there is no way to implement an insert_subreg with a single instruction under x86-64. But that is perfectly ok. Apart from anyext, x86-64 just isn't going to benefit from it. It's also impossible to read or modify the higher 32 bits.</DIV></DIV></BLOCKQUOTE><DIV><BR class="khtml-block-placeholder"></DIV><DIV>Currently the move that's generated isn't handled by coalescing because the source and destination belong to different register classes. The insert_subreg is meant to be a means to move values implicitly between register classes that have a subreg relationship. So if insert_subreg semantics must be target independent, then I think you would isel the zero-extending move to be:</DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV>(i64 (INSERT_SUBREG (i64 0), GR32:$src, 3))</DIV></DIV></BLOCKQUOTE><DIV><BR class="khtml-block-placeholder"></DIV>But that's wrong.
Remember the superreg argument is a read / modify / write operand. That is, the first operand is a use and the def is the LHS, but we are forcing the allocator to target the same physical register.</DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV>v1 = some existing value</DIV><DIV>v1 = insert_subreg v1, GR32:$src, 3</DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV>But zext zeroes out the top part, i.e. zext is equivalent to</DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV>mov v1, 0</DIV><DIV>v1 = insert_subreg v1, GR32:$src, 3</DIV></BLOCKQUOTE><DIV><BR class="khtml-block-placeholder"></DIV><DIV>I'm suggesting expanding the semantics of insert_subreg as described above.</DIV><BR><BLOCKQUOTE type="cite"><DIV><BLOCKQUOTE type="cite"><DIV><DIV>The thing is that the general coalescing will be able to determine that the copy from undef is unneeded for (INSERT_SUBREG (i64 undef), GR32:$src, 3), but it would take a target specific hook to know that the constant zero is unneeded on x86-64. A target specific hook for this might be useful, but I think that this is in the realm of future work now.</DIV></DIV></BLOCKQUOTE><DIV><BR class="khtml-block-placeholder"></DIV><DIV>Sorry, I am not following. zext on x86-64, i.e. the 32-bit move, cannot be coalesced away. No need for a target specific hook.</DIV></DIV></BLOCKQUOTE><DIV><BR class="khtml-block-placeholder"></DIV><DIV>I simply disagree here:</DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV><A href="http://www.x86-64.org/documentation/assembly.html">http://www.x86-64.org/documentation/assembly.html</A> see the section 'Implicit Zero Extend'</DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV>EAX = op</DIV><DIV>RAX = mov EAX <= this may be removed</DIV><DIV>... = use RAX</DIV></DIV></BLOCKQUOTE><DIV><BR class="khtml-block-placeholder"></DIV>The mov can only be removed some of the time.
It's different from ordinary moves (which can always be removed if lhs and rhs match), and it's not the same as the moves generated from lowering insert_subreg (which can be removed if rhs is a sub-register of lhs). It's a different problem that should be handled differently; it's not a register coalescing problem.</DIV><DIV><BR><BLOCKQUOTE type="cite"><DIV><DIV><BR class="khtml-block-placeholder"></DIV><BR><BLOCKQUOTE type="cite"><DIV><BLOCKQUOTE type="cite"><DIV><BLOCKQUOTE type="cite"><DIV><BLOCKQUOTE type="cite"><DIV><BLOCKQUOTE type="cite"><DIV>3: why is there a two operand variant in the first place? Why not use undef for the superreg operand?</DIV></BLOCKQUOTE><DIV><BR class="khtml-block-placeholder"></DIV><DIV>To be clear, the two operand variant is a form of the MachineInstr. The DAG form represents the superregister as coming from an undef node, but this gets isel'd to the two operand MachineInstr of insert_subreg.</DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV>The reason is that undef is typically selected to an implicit def of a register. This causes an unnecessary move to be generated later on.
This move can be optimized away later, with more difficulty, during subreg lowering by checking whether the input register is defined by an implicit def pseudo instruction, but instead I decided to perform the optimization on the DAG form during instruction selection.</DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV>With what you're suggesting</DIV><DIV>reg1024 = ...</DIV><DIV>reg1026 = insert_subreg undef, reg1024, 1</DIV><DIV>reg1027 = insert_subreg reg1026, reg1025, 1</DIV><DIV>use reg1027</DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV>would be isel'd and then subreg lowered to:</DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV>R6 = ...</DIV><DIV>implicit def R01 <= this implicit def is unnecessary</DIV></DIV></BLOCKQUOTE><DIV><BR class="khtml-block-placeholder"></DIV>That's a pseudo instruction; it doesn't cost anything.</DIV><DIV><BR><BLOCKQUOTE type="cite"><DIV><DIV>R23 = R01 <= this copy is unnecessary </DIV></DIV></BLOCKQUOTE><DIV><BR class="khtml-block-placeholder"></DIV>It can be coalesced to:</DIV><DIV>R23 = undef</DIV><DIV><BR><BLOCKQUOTE type="cite"><DIV><DIV>R2 = R6</DIV><DIV>R45 = R23</DIV><DIV>R5 = R6</DIV><DIV>use R45</DIV></DIV></BLOCKQUOTE><DIV><BR class="khtml-block-placeholder"></DIV>Using undef explicitly is the right way to go. There is a good reason it's there. Having a two operand version of insert_subreg that implicitly uses an undef value doesn't fit into the overall LLVM philosophy.<BR></DIV></BLOCKQUOTE><DIV><BR class="khtml-block-placeholder"></DIV><DIV>Right now the coalescing that you are describing is happening during isel. Are you simply saying that you'd rather have the coalescing happen during subreg lowering? I can accept that, but would you share your reasons?</DIV></DIV></BLOCKQUOTE><DIV><BR class="khtml-block-placeholder"></DIV>There really isn't a very good argument for having 2 different versions of insert_subreg. An undef use must be explicitly modeled.
I really don't see what you mean by coalescing during isel. isel doesn't have the concept of coalescing. Also don't forget everything must remain in SSA form until register allocation.<BR></DIV></BLOCKQUOTE><DIV><BR class="khtml-block-placeholder"></DIV><DIV>It depends on what you intend the semantics of insert_subreg to be. Under my proposal above, there would only be the 3 operand version. However, there would be a variant where the input superreg operand is an immediate, but the immediate value would then be explicit.</DIV><BR></DIV></BLOCKQUOTE><DIV><BR class="khtml-block-placeholder"></DIV>That doesn't work. Remember insert_subreg is modeled as a read / modify / write instruction. If the superreg is an immediate, what does that mean? Then it's no longer an insert_subreg. Please keep it simple; don't attempt to overload it unnecessarily.</DIV><DIV><BR><BLOCKQUOTE type="cite"><DIV><BLOCKQUOTE type="cite"><DIV><BLOCKQUOTE type="cite"><DIV><BLOCKQUOTE type="cite"><DIV><BLOCKQUOTE type="cite"><DIV><BLOCKQUOTE type="cite"><DIV>4: what's the benefit of isel'ing a zext to insert_subreg and then xform'ing it to a 32-bit move? <BR></DIV></BLOCKQUOTE><DIV><BR class="khtml-block-placeholder"></DIV><DIV>The xform to a 32-bit move is only the conservative behavior. The zext can be implicit if regalloc can coalesce subreg_inserts.</DIV><BR><BLOCKQUOTE type="cite"><DIV>Why not just isel the zext to the move? It's not legal to coalesce it away anyway.</DIV></BLOCKQUOTE><DIV><BR class="khtml-block-placeholder"></DIV><DIV>Actually it is legal to coalesce it. On x86-64 any write to a 32-bit register zero extends the value to 64 bits.
For the insert_subreg under discussion the inserted value is a 32-bit result that has in fact already been zero extended implicitly.</DIV></DIV></BLOCKQUOTE><DIV><BR class="khtml-block-placeholder"></DIV>It's not legal to coalesce away the 32-bit zero extending move.</DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV>Suppose RAX contains some value with its top 32 bits non-zero.</DIV><DIV>mov EAX, EAX (zero extend top bits)</DIV><DIV>use RAX (expecting top bits to be zero)</DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV>Coalescing away the move is a miscompilation.</DIV></BLOCKQUOTE><DIV><BR class="khtml-block-placeholder"></DIV><DIV>Indeed, but what you have described is not a valid insert_subreg either. Insert_subreg would take EAX as its input operand and would only be coalesced into an instruction that defines EAX explicitly (i.e. an instruction that defines RAX defines EAX implicitly, not explicitly, so no coalescing). I think that this coalescing rule is generally required for correctness when coalescing insert_subreg under any architecture.</DIV></DIV></BLOCKQUOTE><DIV><BR class="khtml-block-placeholder"></DIV>That's what I've been saying all along. zero_extend on x86-64 isn't the same as an insert_subreg; don't try to model it that way.</DIV></BLOCKQUOTE></DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV>Your example is (use (zext (i32 (trunc RAX)))), which cannot be done without an explicit mov instruction. What I was contending is that (use (zext EAX)) can be done without an explicit mov instruction.</DIV><BR></BLOCKQUOTE><DIV><BR class="khtml-block-placeholder"></DIV>I appreciate that you're trying to think of ways to expand the use of the subreg work. But x86-64 implicit zero-extension is not the same problem. Trying to solve the x86-64 optimization issue this way is an unacceptable hack. Your subreg pass is a general pass. Please keep it that way.
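<DIV><BR class="khtml-block-placeholder"></DIV><DIV>To make the two cases argued above concrete, here is a minimal sketch in Python (not LLVM code). It models only the x86-64 rule that a write to a 32-bit register zeroes the upper 32 bits of the containing 64-bit register. The helper names write32 and mov_eax_eax are hypothetical and exist only for this illustration; a register is assumed to be a plain 64-bit integer.</DIV>
<PRE>
```python
# Toy model of x86-64 sub-register writes; not LLVM code.
# Assumption: a register is modeled as a plain 64-bit integer.

TWO_32 = 2**32


def write32(value):
    """Return the 64-bit register contents after a 32-bit write.

    On x86-64 a write to a 32-bit register (e.g. EAX) zeroes the
    upper 32 bits of the containing 64-bit register (RAX).
    """
    return value % TWO_32


def mov_eax_eax(rax):
    """Model `mov EAX, EAX`: read the low 32 bits, write them back."""
    return write32(rax % TWO_32)


# Case 1: RAX was defined by a 64-bit operation, upper bits non-zero.
rax_64bit_def = 0xDEADBEEF00000001
# The mov changes the value, so deleting it would be a miscompilation.
mov_needed = mov_eax_eax(rax_64bit_def) != rax_64bit_def

# Case 2: EAX was just defined by a 32-bit operation.
rax_32bit_def = write32(0xFFFFFFFF + 2)  # a 32-bit add that wraps around
# The upper bits are already zero, so the mov is a no-op here.
mov_redundant = mov_eax_eax(rax_32bit_def) == rax_32bit_def

print(mov_needed, mov_redundant)
```
</PRE>
<DIV>Which case a given mov falls under depends on what defines its source register, and that is the point being argued in this thread.</DIV>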
</DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV>Thanks,</DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV>Evan</DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV><BLOCKQUOTE type="cite"><DIV> <SPAN class="Apple-style-span" style="border-collapse: separate; border-spacing: 0px 0px; color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; text-align: auto; -khtml-text-decorations-in-effect: none; text-indent: 0px; -apple-text-size-adjust: auto; text-transform: none; orphans: 2; white-space: normal; widows: 2; word-spacing: 0px; "><SPAN class="Apple-style-span" style="border-collapse: separate; border-spacing: 0px 0px; color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; text-align: auto; -khtml-text-decorations-in-effect: none; text-indent: 0px; -apple-text-size-adjust: auto; text-transform: none; orphans: 2; white-space: normal; widows: 2; word-spacing: 0px; "><DIV>--</DIV><DIV>Christopher Lamb</DIV><DIV><BR class="khtml-block-placeholder"></DIV><BR class="Apple-interchange-newline"></SPAN></SPAN> </DIV><BR><DIV style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">_______________________________________________</DIV><DIV style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">llvm-commits mailing list</DIV><DIV style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><A href="mailto:llvm-commits@cs.uiuc.edu">llvm-commits@cs.uiuc.edu</A></DIV><DIV style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><A href="http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits">http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits</A></DIV> </BLOCKQUOTE></DIV><BR></BODY></HTML>