<html><head><meta http-equiv="Content-Type" content="text/html charset=windows-1252"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">On Jul 14, 2013, at 7:07 PM, Andrew Trick <<a href="mailto:atrick@apple.com">atrick@apple.com</a>> wrote:<br><div><blockquote type="cite"><div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;">The partitioning should be deterministic. It’s just that the linker output now depends on the partitioning heuristics. As long that decision is based on the input (not the host system), then it still meets Eric’s requirements. I just think it’s unfortunate that post-IPO partitioning (or more generally, parallel codegen) affects the output, but may be hard to avoid. It would be nice to be able to tune the partitioning for compile time without worrying about code quality.</div></blockquote><div dir="auto">I also want to chime in on the importance of stable binary outputs. It is not just that the same compiler and the same sources should produce the same binary, but also that minor changes to either should cause only minor changes to the output binary. For software updates, Apple’s updater tries to download only the delta to the binaries, so we want those deltas to be as small as possible. In addition, it often happens late in an OS release cycle that some critical bug is found and the fix is in the compiler. To qualify the fix, we rebuild the whole OS with the new compiler, then compare all the binaries in the OS, making sure that only things related to the bug have changed.
</div><div dir="auto"><br></div><blockquote type="cite"><div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;">Sorry for the tangential thought here... it seems that most of Shuxin’s proposal is actually independent of LTO, even though the prototype and primary goal is enabling LTO.</div></blockquote>This is very insightful, Andrew! Rather than thinking of this (post-IPO parallelization) as an LTO enhancement, the backend should simply have some threshold (e.g. number of functions) past which it starts parallelizing the last steps.</div><div><br></div><div><br></div><div>On Jul 12, 2013, at 3:49 PM, Shuxin Yang <<a href="mailto:shuxin.llvm@gmail.com">shuxin.llvm@gmail.com</a>> wrote:<br></div><div></div><blockquote type="cite"><div>There are two camps: one camp advocate compiling partitions via multi-process,<br>the other one favor multi-thread.<br></div></blockquote><div>There is also a variant of multi-threading that is popular at Apple. Our OSs have libdispatch, which makes it easy to queue up chunks of work. The OS looks at the overall system balance and uses the ideal number of threads to process the work queue.</div><div><br></div><div><br></div><div><blockquote type="cite"> The compiler used to generate a single object file from the merged<br>IR, now it will generate multiple of them, one for each partition.<br></blockquote></div><div>I have not studied the MC interface, but why does each partition need to generate a separate object file? Why can’t the first partition to finish create an object file, and the other partitions, as they finish, just append to that object file?</div><div><br></div><div>-Nick</div><div><br></div></body></html>