<html><head><meta http-equiv="Content-Type" content="text/html charset=windows-1252"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">Hi Nadav and Owen,<div><br></div><div>Here is a patch that fixes a "Cannot select" failure for a broadcast instruction in the X86 backend <<a href="rdar://problem/16074331">rdar://problem/16074331</a>>.</div><div>Although the fix is in the x86 backend (Nadav’s domain), the root cause of the problem is in SelectionDAGISel (Owen’s domain).</div><div><br></div><div>Thanks for your review and feedback.</div><div><br></div><div><br></div><div>** Symptom **</div><div><br></div><div>Compiling the test case in the patch for x86_64 with avx2 produces:</div><div><div style="margin: 0px; font-size: 11px; font-family: Menlo;">--</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;">LLVM ERROR: Cannot select: 0x7f9d02059128: v8i16 = X86ISD::VBROADCAST 0x7f9d02037178 [ORD=9] [ID=13]</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;"> 0x7f9d02037178: i16,ch = load 0x7f9d0205ab18, 0x7f9d02059230, 0x7f9d02036b48<LD2[%cV_R.addr](align=4)> [ORD=7] [ID=11]</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;"> 0x7f9d02059230: i64,ch = CopyFromReg 0x7f9d01512a30, 0x7f9d02036a40 [ORD=1] [ID=8]</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;"> 0x7f9d02036a40: i64 = Register %vreg0 [ID=1]</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;"> 0x7f9d02036b48: i64 = undef [ID=3]</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;">In function: isel_crash_broadcast</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;">FileCheck error: '-' is empty.</div><div style="margin: 0px; font-size: 11px; font-family: Menlo; min-height: 13px;"><br></div><div style="margin: 0px; font-size: 11px; font-family: Menlo;">--</div></div><div><br></div><div><br></div><div>** Problem **</div><div><br></div><div>The 
instruction that should be matched is a broadcast of a value coming from memory, i.e., a load folded into a broadcast.</div><div>The load is used only once and the broadcast does not have any other input dependencies. Thus, the folding is possible modulo a proper scheduling of the broadcast node.</div><div><br></div><div>The problem here is that during the select phase of the isel process, SelectionDAGISel performs a check to see if the load is foldable into the broadcast (HandleMergeInputChains). This test is overly conservative and fails to detect that the folding is okay.</div><div><br></div><div><br></div><div>** Cause **</div><div><br></div><div>HandleMergeInputChains does not actually check whether a cycle will be created if something is folded into something else.</div><div>In particular, it does not check for reachability but instead relies on heuristics to give a quick answer. This answer is conservatively correct:</div><div>- No: no cycle will be created.</div><div>- Yes: a cycle *may* be created. </div><div><br></div><div>In this example we have something like this:</div><div><div>a = ld @a</div><div>| \</div><div>| st b, @b</div><div>C = vbroadcast a</div><div><br></div><div>Here, we are trying to fold a into C and there is a chain between the load of a and the store of b.</div><div>Since this chain is not part of the pattern, HandleMergeInputChains assumes that the folding will create a cycle:</div><div> st b, @b</div><div> ^</div><div> |</div><div> v </div><div>C = vbroadcast ld @a</div><div><br></div><div>This is wrong, because the chain goes in one direction only, C -> st b, i.e., we can schedule C before st b.</div><div>I believe this limitation is intended for three reasons:</div><div>1. to avoid costly reachability checks.</div><div>2. to handle only the rewriting of token factor from the not-yet-matched nodes to the matched node.</div></div><div>3. 
everything that has been matched is ready to schedule.</div><div><br></div><div>Interestingly, inverting the selection order of the st and vbroadcast nodes in this DAG solves the issue, because the store is then considered as scheduled and therefore cannot create a cycle (condition #3).</div><div><br></div><div><br></div><div><span style="orphans: 2; text-align: -webkit-auto; widows: 2;">** Proposed Solution **</span></div><div><span style="orphans: 2; text-align: -webkit-auto; widows: 2;"><br></span></div><div><span style="orphans: 2; text-align: -webkit-auto; widows: 2;">Fixing the root cause of the problem would require changing a lot of assumptions in SelectionDAGISel, in particular #3. Moreover, giving up #1 (i.e., performing real reachability checks) may be very harmful to compile time.</span></div><div><span style="orphans: 2; text-align: -webkit-auto; widows: 2;">Therefore, the proposed fix works around the problem by forcing a valid schedule for the broadcast instruction to please the select phase. In parallel, I will file a PR for the general issue.</span></div><div><span style="orphans: 2; text-align: -webkit-auto; widows: 2;"><br></span></div><div style="orphans: 2; widows: 2;">Other ideas are welcome!</div><div style="orphans: 2; widows: 2;"><br></div><div style="orphans: 2; widows: 2;">In detail, the patch transforms this:</div><div style="orphans: 2; widows: 2;"><div>load -- chain --> someNode -- more deps -></div><div> \</div><div> +-- use --> broadcast -- more deps -></div><div><br></div><div>Into this:</div><div>load +-- chain --> someNode -- more deps -></div><div> \ /</div><div> +-- use --> broadcast -- more deps -></div><div><br></div><div>Thus, the broadcast will be scheduled in place of the load. 
By construction, this is valid, because the load has only one use and the broadcast has only one input dependency (the load).</div></div><div><span style="orphans: 2; text-align: -webkit-auto; widows: 2;"><br></span></div><div><span style="orphans: 2; text-align: -webkit-auto; widows: 2;">Thanks,</span></div><div><span style="orphans: 2; text-align: -webkit-auto; widows: 2;">-Quentin</span></div><div>
</div></body></html>