<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40"><head><meta http-equiv=Content-Type content="text/html; charset=us-ascii"><meta name=Generator content="Microsoft Word 14 (filtered medium)"><style><!--
/* Font Definitions */
@font-face
{font-family:Courier;
panose-1:2 7 4 9 2 2 5 2 4 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Tahoma;
panose-1:2 11 6 4 3 5 4 4 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman","serif";}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph
{mso-style-priority:34;
margin-top:0cm;
margin-right:0cm;
margin-bottom:0cm;
margin-left:36.0pt;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman","serif";}
span.E-MailFormatvorlage17
{mso-style-type:personal-reply;
font-family:"Calibri","sans-serif";
color:#1F497D;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:612.0pt 792.0pt;
margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
{page:WordSection1;}
/* List Definitions */
@list l0
{mso-list-id:750783410;
mso-list-type:hybrid;
mso-list-template-ids:944423676 67567631 67567641 67567643 67567631 67567641 67567643 67567631 67567641 67567643;}
@list l0:level1
{mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;}
@list l0:level2
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;}
@list l0:level3
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
text-indent:-9.0pt;}
@list l0:level4
{mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;}
@list l0:level5
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;}
@list l0:level6
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
text-indent:-9.0pt;}
@list l0:level7
{mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;}
@list l0:level8
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;}
@list l0:level9
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
text-indent:-9.0pt;}
@list l1
{mso-list-id:1550191111;
mso-list-type:hybrid;
mso-list-template-ids:398641202 67567631 67567641 67567643 67567631 67567641 67567643 67567631 67567641 67567643;}
@list l1:level1
{mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;}
@list l1:level2
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;}
@list l1:level3
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
text-indent:-9.0pt;}
@list l1:level4
{mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;}
@list l1:level5
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;}
@list l1:level6
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
text-indent:-9.0pt;}
@list l1:level7
{mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;}
@list l1:level8
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;}
@list l1:level9
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
text-indent:-9.0pt;}
ol
{margin-bottom:0cm;}
ul
{margin-bottom:0cm;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]--></head><body lang=DE link=blue vlink=purple><div class=WordSection1><p class=MsoNormal><span lang=EN-US style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>Hi Evan,<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span lang=EN-US style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>I just read your proposal and the ensuing discussion of VLIW support, and I want to share my experience of writing a VLIW back-end for LLVM.<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span lang=EN-US style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span lang=EN-US style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>I would not integrate the packetizer into the register allocator super pass, since that would reduce the back-end developer's flexibility to add optimization passes after the packetizer. Instead, I would add the packetizer as a separate pass. It is true that in that case the packetizer must deal with PHI and COPY nodes, which are later eliminated by the RA. The packetizer can simply put each PHI and COPY instruction into its own bundle consisting of only one instruction.<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span lang=EN-US style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>From my experience, a simple packetizer that groups instructions into bundles (like the old IA-64 back-end did) without changing the order of the instructions produces bad code. 
Instead, a VLIW scheduler that directly outputs bundles produces better code. The current LLVM scheduler (at the end of the instruction selection pass) is not suitable for generating bundled instructions since it operates on scheduling units for glued instructions.<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>However, the post-RA scheduler in combination with a VLIW-aware hazard recognizer can be used before RA to bundle and schedule instructions for VLIW architectures. Only small modifications to the post-RA scheduler classes are necessary to support virtual registers.<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span lang=EN-US style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>I also would not include packet finalization in the register allocator super pass, since the subsequent prolog and epilog code insertion (PECI) pass also adds extra instructions to the instruction list. So I would run packet finalization after prolog and epilog code insertion. Both the RA and PECI can put their instructions into single-instruction bundles, which packet finalization can then merge into larger bundles. 
For packet finalization it also makes sense to perform post-RA VLIW scheduling.<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span lang=EN-US style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>Timo<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span lang=EN-US style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><div><div style='border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0cm 0cm 0cm'><p class=MsoNormal><b><span style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'>From:</span></b><span style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'> llvmdev-bounces@cs.uiuc.edu [mailto:llvmdev-bounces@cs.uiuc.edu] <b>On Behalf Of </b>Evan Cheng<br><b>Sent:</b> Friday, December 2, 2011 21:40<br><b>To:</b> LLVM Dev<br><b>Subject:</b> [LLVMdev] RFC: Machine Instruction Bundle<o:p></o:p></span></p></div></div><p class=MsoNormal><o:p> </o:p></p><div><p class=MsoNormal><b><span style='font-size:9.0pt;font-family:Courier'>Machine Instruction Bundle in LLVM</span></b><span style='font-size:9.0pt;font-family:Courier'><o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'><o:p> </o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>Hi all,<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'><o:p> </o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>There has been quite a bit of discussion about adding machine instruction bundles to support VLIW targets. 
I have been pondering what the right representation should be and what kind of impact it might have on the LLVM code generator. I believe I have a fairly good plan now and would like to share it with the LLVM community.<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'><o:p> </o:p></span></p></div><div><p class=MsoNormal><b><span style='font-size:9.0pt;font-family:Courier'>Design Criteria</span></b><span style='font-size:9.0pt;font-family:Courier'><o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'><o:p> </o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>1. The bundle representation must be lightweight. We cannot afford to add significant memory or compile-time overhead.<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>2. It must be flexible enough to represent more than VLIW bundles. It should be able to represent arbitrary sequences of instructions that must be scheduled as a unit, e.g. an ARM Thumb2 IT block, Intel compare + branch macro-fusion, or random instruction sequences that are currently modeled as pseudo instructions that are expanded late.<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>3. It must minimize the changes required in the LLVM code generator, especially in target-independent passes. It must also minimize code duplication (i.e. we don't want code snippets that search for bundle start / end, like all the code in the backend that skips over DBG_VALUE).<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>4. The representation should make it easy for new code to be oblivious to bundles. 
That is, MI passes should not have to check whether something is a bundle.<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'><o:p> </o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>Given the above, we can rule out a new class (e.g. MachineInstrBundle) right away. We don't want MachineBasicBlock to keep a list of MachineInstrBundles since it would require a massive amount of code change. So what are the choices?<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'><o:p> </o:p></span></p></div><div><p class=MsoNormal><b><span style='font-size:9.0pt;font-family:Courier'>Bundle Representation</span></b><span style='font-size:9.0pt;font-family:Courier'><o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'><o:p> </o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>1. A nested MachineInstr: This is the most natural (meaning it looks most like the real HW bundle) representation. It has the nice property that most passes do not have to check if an MI is a bundle. The concern here is that this can add significant memory overhead if it means adding an ilist or SmallVector field to hold the bundled MIs.<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>2. Add a bit to MachineInstr: The bit means the next MI in the list is part of the same bundle. This is very lightweight. However, it requires many passes to check whether an MI is part of a bundle.<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'><o:p> </o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>The solution is a combination of both #1 and #2. 
Conceptually we want a representation that looks like this:<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'><o:p> </o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>--------------<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>| Bundle | -------<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>-------------- \<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> | ----------------<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> | | MI |<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> | ----------------<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> | |<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> | ----------------<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> | | MI |<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> | ----------------<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> | |<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> | ----------------<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> | | MI |<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> | ----------------<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> |<o:p></o:p></span></p></div><div><p class=MsoNormal><span 
style='font-size:9.0pt;font-family:Courier'>--------------<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>| Bundle | ------<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>-------------- \<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> | ----------------<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> | | MI |<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> | ----------------<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> | |<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> | ----------------<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> | | MI |<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> | ----------------<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> | |<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> | …<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> |<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>--------------<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>| Bundle | ------<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>-------------- \<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> |<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> 
...<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'><o:p> </o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'><o:p> </o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>This is #1, a series of nested MI's. However, we are going to store the instructions in the same way as it's done right now, i.e. a list<MachineInstr> on MachineBasicBlocks. Using #2, we will add a bit to MI that indicates whether it is part of a bundle.<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'><o:p> </o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> ----------------<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> | MI * | (* bit indicates next MI is "glued" to this MI, i.e. in the same bundle)<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> ----------------<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> |<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> ----------------<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> | MI * |<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> ----------------<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> |<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> ----------------<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> | MI | (no bit, this is the end of the bundle)<o:p></o:p></span></p></div><div><p class=MsoNormal><span 
style='font-size:9.0pt;font-family:Courier'> --------------<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> |<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> ----------------<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> | MI * | (* a new bundle)<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> ----------------<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> |<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> ----------------<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> | MI |<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> ----------------<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> |<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> ...<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'><o:p> </o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>We are going to hide the complexity in the MachineBasicBlock::iterator instead. That is, the iterator will be changed to visit only the *top level* instructions (i.e. first instruction in each bundle). 
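<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'><o:p> </o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>As a rough illustration of the bit-based representation, the following C++ sketch walks a block and collects only the top-level instructions. FlaggedInst, gluedToNext and topLevelIndices are hypothetical names made up for this sketch; they are not existing LLVM APIs, and the real change would live inside MachineBasicBlock::iterator.<o:p></o:p></span></p></div>
<pre style='font-size:9.0pt;font-family:Courier'>
```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for MachineInstr, for illustration only.
struct FlaggedInst {
  std::string name;
  bool gluedToNext;  // the proposed bit: the next instruction is in the same bundle
};

// Returns the indices a "top level" iterator would visit: every instruction
// that is not glued to its predecessor (bundle heads and unbundled MIs).
std::vector<std::size_t> topLevelIndices(const std::vector<FlaggedInst> &block) {
  std::vector<std::size_t> heads;
  bool insideBundle = false;  // true while the previous instruction had the bit set
  for (std::size_t i = 0; i < block.size(); ++i) {
    if (!insideBundle)
      heads.push_back(i);
    insideBundle = block[i].gluedToNext;
  }
  // Assumption of this sketch: the last instruction never has the bit set.
  assert(!insideBundle && "last instruction must not be glued to a successor");
  return heads;
}
```
</pre>
<div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>For the sequence in the diagram above (MI *, MI *, MI, MI *, MI) this yields indices 0 and 3, i.e. the first instruction of each bundle. A pass that wants to look into bundles would simply walk every index instead.<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'><o:p> </o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>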
We will add another iterator that allows clients to visit all of the MIs, for those passes that want to look into bundles.<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'><o:p> </o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>We can use the same representation for arbitrary sequences of instructions that cannot be broken up, e.g. Thumb2 IT blocks.<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'><o:p> </o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> ----------------<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> | MI | (just a MI)<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> ----------------<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> |<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> ----------------<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> | MI * | (* Start of Thumb2 IT block)<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> ----------------<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> |<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> ----------------<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> | MI * | <o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> ----------------<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> |<o:p></o:p></span></p></div><div><p 
class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> ----------------<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> | MI | (last MI in the block)<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> ----------------<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> |<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> ----------------<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> | MI | <o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> ----------------<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> |<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> ...<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'><o:p> </o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>This representation can support VLIW (where top level MI's are all start of bundles) or non-VLIW (where there can be mix of MIs and bundles). 
It is also very cheap since the "Flags" field has plenty of free bits available.<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'><o:p> </o:p></span></p></div><div><p class=MsoNormal><b><span style='font-size:9.0pt;font-family:Courier'>Properties of Bundle</span></b><span style='font-size:9.0pt;font-family:Courier'><o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'><o:p> </o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>If MI passes can consider each bundle as a single unit, then how are they going to examine properties (i.e. flags and operands) of an MI bundle? Conceptually, the properties of a bundle are the union of the properties of all the MIs inside the bundle. So a bundle reads all the inputs that the individual MIs read and it defines all the outputs of the individual MIs. However, this is not correct when there are intra-bundle dependencies, e.g.<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'><o:p> </o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>-------------------------<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>| r0 = op1 r1, r2 |<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>| r3 = op2 r0<kill>, #c |<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>-------------------------<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'><o:p> </o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>r0 should not be considered a source of the bundle since it's defined inside the bundle and its live range does not extend beyond it. Instead, r0 is a clobber (i.e. 
dead def).<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'><o:p> </o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>-------------------------<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>| r0 = op1 r1, r2 |<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>| r3 = op2 r0, #c |<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>-------------------------<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> ...<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> = op3 r0, <o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'><o:p> </o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>Here r0 is a def, not a use.<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'><o:p> </o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>What does this mean? It means that for passes to operate on one bundle at a time, they must be able to visit all the defs and uses of a bundle. We have established that computing the defs and uses of a bundle is not as trivial as taking the union. This is certainly not something we want to re-compute every time! 
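<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'><o:p> </o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>The two examples above can be sketched in C++ as follows. This is only an illustration under stated assumptions: Inst, BundleOperands and summarizeBundle are hypothetical names (not LLVM classes or APIs), the instructions of a bundle are examined in order, and a use marked as kill is taken to be the last use of that register.<o:p></o:p></span></p></div>
<pre style='font-size:9.0pt;font-family:Courier'>
```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

// Hypothetical, simplified stand-in for MachineInstr, for illustration only.
struct Inst {
  std::vector<unsigned> defs;   // registers written
  std::vector<unsigned> uses;   // registers read
  std::vector<unsigned> kills;  // subset of uses carrying the kill flag
};

struct BundleOperands {
  std::set<unsigned> uses;      // registers read from outside the bundle
  std::set<unsigned> defs;      // registers defined and still live after the bundle
  std::set<unsigned> deadDefs;  // clobbers: defined and killed inside the bundle
};

// Summarizes a bundle's operands, taking intra-bundle dependencies into
// account instead of taking a plain union.
BundleOperands summarizeBundle(const std::vector<Inst> &bundle) {
  BundleOperands out;
  std::set<unsigned> definedSoFar;
  for (const Inst &mi : bundle) {
    // A read of a register defined earlier in the bundle is not a bundle use.
    for (unsigned r : mi.uses)
      if (!definedSoFar.count(r))
        out.uses.insert(r);
    // A killed register that was defined in the bundle becomes a dead def.
    for (unsigned r : mi.kills) {
      assert(std::find(mi.uses.begin(), mi.uses.end(), r) != mi.uses.end());
      if (definedSoFar.count(r)) {
        out.defs.erase(r);
        out.deadDefs.insert(r);
      }
    }
    // A (re)definition makes the register a live def again.
    for (unsigned r : mi.defs) {
      definedSoFar.insert(r);
      out.deadDefs.erase(r);
      out.defs.insert(r);
    }
  }
  return out;
}
```
</pre>
<div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>For the first example this gives uses {r1, r2}, defs {r3} and dead defs {r0}; for the second, where r0 is not killed inside the bundle, it gives uses {r1, r2} and defs {r0, r3}.<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'><o:p> </o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>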
This requires a slight change to the bundle representation.<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'><o:p> </o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> ----------------<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> | Bundle * | (A MI with special opcode "Bundle")<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> ----------------<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> |<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> ----------------<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> | MI * |<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> ----------------<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> |<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> ----------------<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> | MI * |<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> ----------------<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> |<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> ----------------<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> | MI | (no bit, this is the end of the bundle)<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> --------------<o:p></o:p></span></p></div><div><p 
class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> |<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> ----------------<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> | Bundle * | (a new bundle)<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> ----------------<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> |<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> ----------------<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> | MI * | <o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> ----------------<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> |<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> ----------------<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> | MI |<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> ----------------<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> |<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> ...<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'><o:p> </o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>The pseudo bundle instructions should be used to capture properties of the bundle. When a bundle is finalized the packetizer must add source and def operands to the pseudo bundle instruction. 
More on this later.<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'><o:p> </o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>Other properties, such as mayLoad, mayStore, are static properties associated with opcodes. They cannot be copied. We will add APIs to examine properties of MIs which will do the *right thing* for bundles (i.e. look into MIs in bundles).<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'><o:p> </o:p></span></p></div><div><p class=MsoNormal><b><span style='font-size:9.0pt;font-family:Courier'>Packetizing</span></b><span style='font-size:9.0pt;font-family:Courier'><o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'><o:p> </o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>The current MI flow looks like this:<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'><o:p> </o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>1. DAG to MI lowering (and pre-RA schedule)<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>2. MI optimizations (LICM, CSE, etc.)<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>3. Register allocation super pass<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> 3a. De-ssa (2-address, phi slim)<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> 3b. Coalescing<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> 3c. Actual register allocation<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>4. 
Post-RA optimizations<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>5. PEI<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>6. Post-RA scheduling<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'><o:p> </o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>In the hopefully not very distant future it should look like this:<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'><o:p> </o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>1. DAG to MI lowering (no scheduling!)<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>2. MI optimizations (LICM, CSE, etc.)<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>3. Register allocation super pass<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> 3a. De-ssa (2-address, phi slim)<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> 3b. Coalescing<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> 3c. <b>Pre-RA scheduling</b><o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> 3d. Actual register allocation<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>4. Post-RA optimizations<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>5. PEI<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>6. 
<b>Re-schedule restores, copies</b><o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'><o:p> </o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>The current proposal is for "packetization" to be done as part of the "RA super pass". Early MI optimization passes such as LICM do not benefit from operating on bundles. Furthermore, the packetizer should not have to know how to deal with copies which may later be coalesced, phi nodes, or other copy-like pseudo instructions.<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'><o:p> </o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>Packetization should be done in two phases. The first part decides which MIs should be bundled together and adds the "bits" which glue MIs together. This can be done either before or after pre-RA scheduling. The second part of the packetization should only be done after register allocation is completed. There are two very important reasons for this.<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'><o:p> </o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>1. Packet finalization *must* add source and def operands to the "Bundle" pseudo MI. This allows all later passes to handle them transparently. However, we do not want to do this before register allocation is complete. Otherwise it would introduce new defs and uses of virtual registers, and that would mess up the MachineRegisterInfo def-use chains.<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'><o:p> </o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>e.g. 
Now vr0 has two defs!<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>defs: vr0&lt;dead&gt;, vr3, uses: vr1, vr2<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>----------------------------<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>| vr0 = op1 vr1, vr2 |<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>| vr3 = op2 vr0&lt;kill&gt;, #c |<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>----------------------------<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'><o:p> </o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>2. During register allocation, more identity copies will be eliminated while loads, stores, copies, and re-materialized instructions will be introduced. It makes sense for the second part of packetization to try to fill these new instructions into empty slots (for VLIW-like targets).<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'><o:p> </o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>So the overall flow should look like this:<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'><o:p> </o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>1. DAG to MI lowering (no scheduling!)<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>2. MI optimizations (LICM, CSE, etc.)<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>3. 
Register allocation super pass<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> 3a. De-ssa (2-address, phi slim)<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> 3b. Coalescing<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> 3c. <b>Pre-scheduling packetization (optional)</b><o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> 3d. Pre-RA scheduling (or <b>integrated packetization</b>)<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> 3e. <b>Post-scheduling packetization (optional)</b><o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> 3f. Actual register allocation<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'> 3g. <b>Packet finalization</b><o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>4. Post-RA optimizations<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>5. PEI<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>6. Re-schedule restores, copies<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'><o:p> </o:p></span></p></div><div><p class=MsoNormal><b><span style='font-size:9.0pt;font-family:Courier'>Lowering Bundles to MCInst</span></b><span style='font-size:9.0pt;font-family:Courier'><o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'><o:p> </o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>There is no need to add the equivalent of MI bundle to MCInst. 
An MI bundle should be concatenated into a single MCInst by storing opcodes as integer operands. For example:<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'><o:p> </o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>-------------------------<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>| r0 = op1 r1, r2 |<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>| r3 = op2 r0, #c |<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>-------------------------<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'><o:p> </o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>=><o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'><o:p> </o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>MCInst: op1 r0, r1, r2, op2, r3, r0, #c<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>or<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>MCInst: op1 op2 r0, r1, r2, r3, r0, #c<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'><o:p> </o:p></span></p></div><div><p class=MsoNormal><b><span style='font-size:9.0pt;font-family:Courier'>What's Next?</span></b><span style='font-size:9.0pt;font-family:Courier'><o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'><o:p> </o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>I am hoping to find some time to implement the following in the near future:<o:p></o:p></span></p></div><div><p 
class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>1. Add BUNDLE opcode<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>2. MachineInstr class changes: new bit, changes to methods such as eraseFromParent(), isIdenticalTo().<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>3. Change MachineInstr::iterator to skip over bundled MIs. Rename old iterator.<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>4. Add MachineInstr API to check for instruction properties and switch existing code over.<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>5. Add API to form a bundle. It would compute the proper defs and uses and add MachineOperands to the bundle MI.<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>6. Switch Thumb2 IT blocks to use MI bundles.<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>7. Add interface for targets to register their own packetization passes.<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'><o:p> </o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>I would dearly welcome help on any of these tasks, especially 4, 5, and 6. I also would not cry if someone beats me to #6 (or actually any of the tasks). :-)<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'><o:p> </o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>In the longer term, I would like to see a target independent packetization pass (I believe one is being reviewed). 
I would also like to see a target independent interface for pre-scheduling optimizations that form instruction sequences (e.g. macro-fusion). Patches welcome!<o:p></o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'><o:p> </o:p></span></p></div><div><p class=MsoNormal><span style='font-size:9.0pt;font-family:Courier'>Evan<o:p></o:p></span></p></div></div></body></html>