<html><head><meta http-equiv="Content-Type" content="text/html charset=us-ascii"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><br><div><div>On Sep 29, 2012, at 11:37 AM, Evan Cheng <<a href="mailto:evan.cheng@apple.com">evan.cheng@apple.com</a>> wrote:</div><br><blockquote type="cite"><meta http-equiv="Content-Type" content="text/html charset=iso-8859-1"><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><div><blockquote type="cite"><div style="background-color: rgb(255, 255, 255); font-family: arial, helvetica, sans-serif; font-size: 12pt; position: static; z-index: auto; "><div class="yui_3_2_0_19_1348903864540101" style="font-size: 16px; font-family: arial, helvetica, sans-serif; background-color: transparent; font-style: normal; "><br></div><div class="yui_3_2_0_19_1348903864540103" style="font-size: 16px; font-family: arial, helvetica, sans-serif; background-color: transparent; font-style: normal; ">2.

 The BURR scheduler on x86-32 appears to set all latencies to one (which

 makes it a pure RR scheduler with no ILP), while the ILP scheduler on 

x86-64 appears to set all latencies to 10 except for a few long-latency 

instructions. For the

 sake of documenting this in the paper, does anyone know (or can point 

me to) a precise description of how the scheduler sets latency values? 

In the revised paper, I will add experimental results based on precise 

latency values (see the attached spreadsheet) and would like to clearly 

document how LLVM's rough latencies for x86 are determined.<br></div></div></blockquote><div><br></div>I don't think your information is correct. The ILP scheduler is not setting the latencies to 10. LLVM does not have machine models for x86 (except for atom) so it's using a uniform latency model (one cycle).</div></div></blockquote><div><br></div><div>Evan's description is precise. Everything is one cycle, unless it is 10 cycles ;) But it's easy to reconfigure to use itineraries, as I guess you've done.</div><br><blockquote type="cite"><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><div><blockquote type="cite"><div style="background-color: rgb(255, 255, 255); font-family: arial, helvetica, sans-serif; font-size: 12pt; position: static; z-index: auto; "><div class="yui_3_2_0_19_1348903864540105" style="font-size: 16px; font-family: arial, helvetica, sans-serif; background-color: transparent; font-style: normal; "><br></div><div class="yui_3_2_0_19_1348903864540107" style="font-size: 16px; font-family: arial, helvetica, sans-serif; background-color: transparent; font-style: normal; ">3.

 Was the choice to use rough latency values in the ILP scheduler based 

on the fact that using precise latencies makes it much harder for a 

heuristic non-backtracking scheduler to balance ILP and RP, or is it that the choice

 was made simply because nobody bothered to write an x86 itinerary?  </div></div></blockquote></div></div></blockquote><br><blockquote type="cite"><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><div><blockquote type="cite"><div style="background-color: rgb(255, 255, 255); font-family: arial, helvetica, sans-serif; font-size: 12pt; position: relative; z-index: 0; "></div></blockquote>No one has bothered to write the itinerary.</div></div></blockquote><div><br></div><div>I recently committed infrastructure that allows machine models to be developed incrementally, at the level of detail appropriate for the processor. I have a feeling we will start to see models begin to evolve for x86 processors very soon. The framework is documented in TargetSchedule.td and you're welcome to contribute.</div><div><br></div><div>The only feature that I still plan to implement is the ability of a machine model to specify that it is derived from another. It will be trivial to add though when the time comes.</div><br><blockquote type="cite"><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><div><blockquote type="cite"><div style="background-color: rgb(255, 255, 255); font-family: arial, helvetica, sans-serif; font-size: 12pt; position: static; z-index: auto; "><div class="yui_3_2_0_19_1348903864540111" style="font-size: 16px; font-family: arial, helvetica, sans-serif; background-color: transparent; font-style: normal; ">4.

 Does the ILP scheduler ever consider scheduling a stall (leaving a 

cycle empty) when there are ready instructions? Here is a small hypothetical example that explains what I mean:</div><div class="yui_3_2_0_19_1348903864540113" style="font-size: 16px; font-family: arial, helvetica, sans-serif; background-color: transparent; font-style: normal; "><br>Suppose

 that at Cycle C the register pressure (RP) is equal to the physical 

limit and all ready instructions in that cycle start new live ranges, 

thus increasing the RP above the physical register limit. However, in a 

later cycle C+Delta some instruction X that closes a currently open live

 range will become ready. If the objective is minimizing RP, the right

 choice to make in this case is leaving Cycles C through C+Delta-1 empty

 and scheduling Instruction X in Cycle C+Delta. Otherwise, we will be 

increasing the RP. Does the ILP scheduler ever make such a choice, or will it 

always schedule an instruction when the ready list is not empty?<br></div></div></blockquote></div></div></blockquote></div><br><div>The standard ILP scheduler does not have a "ReadyFilter" so instructions are inserted in the ready queue the moment their predecessors are scheduled. So, yes, it will effectively "impose stalls" to reduce register pressure. Note that things work differently with an itinerary though. And the answer will depend on how you've written the itinerary.</div><div><br></div><div>-Andy</div></body></html>