<html><head><meta http-equiv="Content-Type" content="text/html charset=windows-1252"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;"><br><div><div>On Jul 29, 2013, at 11:47 AM, Tim Northover <<a href="mailto:t.p.northover@gmail.com">t.p.northover@gmail.com</a>> wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div style="font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;">Hi Mihail,<br><br><blockquote type="cite">We'd have to change how the immediate is represented.<br></blockquote><br>Ah, I didn't realise it was a truly unencoded immediate. Still, it<br>would be changing the immediate to be in line with the goals of the MC<br>layer.<br><br><blockquote type="cite">If we just parse this syntax<br>and keep the internal representation of this kind of operand we are<br>loosing the ability to differentiate between equivalent yet different<br>encodings.<br></blockquote><br>Agreed, that would be no good. For a start there would be nothing to<br>forward the information on to the encoder.<br><br><blockquote type="cite">and r0, r1, #1, #0<br>and r0, r1, #1, #2<br>...<br>and r0, r1, #1, #30<br><br>are are equivalent but have different encodings.<br></blockquote><br>The first one actually appears to be different in its behaviour w.r.t.<br>the 'C' flag, to be a pedant.<br><br><blockquote type="cite">The point is to retain the exact form that was parsed.<br></blockquote><br>I think that's a misguided goal. Each bit-pattern really has a<br>canonical textual representation, and we should be trying to print<br>that rather than any non-canonical alias someone wrote.<br><br></div></blockquote><div><br></div>+1. This is very important. The MCInst should not have any memory of what the assembly syntax form was.</div><div><br><blockquote type="cite" dir="auto"><div style="font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;"><blockquote type="cite">This implies that internally MC needs to be changed to hold the [value,<br>modifier] pair rather than the value itself. Isn't this uglier, more difficult and<br>more risky?<br></blockquote><br>It's the preferred MC-layer representation, as I think we've discussed<br>before. But beyond that, I think it's necessary longer-term…</div></blockquote><div><br></div><div>Changing the MC representation should be orthogonal to parsing this alternative syntax. Having the value be an encoded pair (still a single immediate operand with bitfields for the components) seems a reasonable thing to do, but it’s not required for this.</div><br><blockquote type="cite" dir="auto"><div style="font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;"><br><blockquote type="cite">To me it seems more natural to solve an essentially syntactical problem with<br>a asm parsing only solution.<br></blockquote><br>This problem has a disassembly layer manifestation too (which you'll<br>almost certainly meet in one of MC Hammer's other modes):<br><br>$ echo [0x00,0x01,0xb0,0xe3] | bin/llvm-mc -disassemble -arch=arm<br>       movs    r0, #0<br><br>(should be "movs r0, #0, #1" I think). I believe the closest extension<br>of your proposed solution that can handle this would be adding a<br>DecoderMethod to each instruction, which sets the MCInst's opcode to<br>your AsmParserOnly variant if it detected an odd immediate.<br><br>That would be beyond horrific and into nightmare-inducing.<br><br><blockquote type="cite">The other thing is that I have doubts about the asm parser's ability to<br>support this, even via custom parsing methods. In the end what tablegen generates is a<br>parser in name only. It's not a real parser.<br></blockquote><br>If I'm understanding you correctly, that's what comes out if you set<br>just the Name method (the isXXX and addXXX methods). The ParserMethod<br>gets called by TableGen as a short-circuit hook in normal parsing and<br>has direct access to the token stream of the assembler<br>("AsmToken::Hash, AsmToken::Integer, AsmToken::Comma, ...").<br><br><blockquote type="cite">Regarding the duplication, this is a tablegen improvement opportunity. We<br>should  be able to factor out the definitions common to all members of a<br>multiclass; but alas this doesn't work.<br></blockquote><br>That's really a side issue. We'd still be left with two Instruction<br>instances representing the same instruction, which is not a good state<br>to be in.<br></div></blockquote><div><br></div><div>It’s more than that. It's a fundamental problem. We did a lot of work back in the original MC-ization of ARM to get rid of lots of those, and it’s a non-starter to go back the other direction.</div><div><br></div><div>Tim is completely correct. This problem should be handled completely in the AsmParser, not the instruction encoder.</div><div><br></div><div>-Jim</div></div></body></html>