[PATCH][X86] Fix for cannot select broadcast
qcolombet at apple.com
Fri Mar 21 13:25:30 PDT 2014
Talked with Nadav off-line.
In fact X86 backend has fallback patterns for such cases. We may just miss a few of them. Looking into it.
On Mar 21, 2014, at 12:49 PM, Quentin Colombet <qcolombet at apple.com> wrote:
> Hi Nadav and Owen,
> Here is a patch that fixes a cannot select failure for broadcast instruction in the X86 backend <rdar://problem/16074331>.
> Although the fix is in the x86 backend (Nadav’s domain), the root cause of the problem is in SelectionDAGISel (Owen’s domain).
> Thanks for your review and feedbacks.
> ** Symptom **
> Compiling the test case in the patch with x86_64 and avx2 would produce:
> LLVM ERROR: Cannot select: 0x7f9d02059128: v8i16 = X86ISD::VBROADCAST 0x7f9d02037178 [ORD=9] [ID=13]
> 0x7f9d02037178: i16,ch = load 0x7f9d0205ab18, 0x7f9d02059230, 0x7f9d02036b48<LD2[%cV_R.addr](align=4)> [ORD=7] [ID=11]
> 0x7f9d02059230: i64,ch = CopyFromReg 0x7f9d01512a30, 0x7f9d02036a40 [ORD=1] [ID=8]
> 0x7f9d02036a40: i64 = Register %vreg0 [ID=1]
> 0x7f9d02036b48: i64 = undef [ID=3]
> In function: isel_crash_broadcast
> FileCheck error: '-' is empty.
> ** Problem **
> The instruction supposed to match is a broadcast of a value coming from the memory, i.e., a load folded into a broadcast.
> The load is used only once and the broadcast does not have any other input dependencies. Thus, the folding is possible modulo a proper scheduling of the broadcast node.
> The problem here is that during the select phase of the isel process, SelectionDAGISel performs a check to see if the load is foldable into the broadcast (HandleMergeInputChains). This test is overly conservative and fails to detect that the folding is okay.
> ** Cause **
> HandleMergeInputChains is not actually checking that a cycle will be created if something is folded into something else.
> In particular, it is not checking for reachability but instead relies on heuristics to give a quick answer. This answer is conservatively correct:
> - No: no cycle will be created.
> - Yes: a cycle *may* be created.
> In this example we have something like this:
> a = ld @a
> | \
> | st b, @b
> C = vbroadcast a
> Here, we are trying to fold a into C and there is a chain between the load of a and the store of b.
> Since this chain is not part of the pattern it assumes it will create a cycle:
> st b, @b
> C = vbroadcast ld @b
> This is wrong, because the chain is in one direction C -> st b, i.e., we can schedule C before st b.
> I beleive this limitation is intended for three reasons:
> 1. to avoid costly reachability checks.
> 2. to handle only the rewriting of token factor from the not-yet-matched nodes to the matched node.
> 3. everything that has been matched is ready to schedule.
> Interestingly, inverting the selection order of the st and vbroadcast nodes in this dag, solves the issue, because now the store is considered as scheduled and therefore cannot create a cycle (condition #3).
> ** Proposed Solution **
> Fixing the root cause of the problem requires to change a lot of assumption in SelectionDAGISel, in particular #3. Moreover, #1 may be very harmful for the compile time.
> Therefore, the proposed fix works around the problem by forcing a valid schedule for the broadcast instruction to please the select phase. In parallel, I will file a PR for the general issue.
> Other ideas are welcome!
> For the details, the patch is transforming this:
> load -- chain --> someNode -- more deps ->
> +-- use --> broadcast -- more deps ->
> Into this:
> load +-- chain --> someNode -- more deps ->
> \ /
> +-- use --> broadcast -- more deps ->
> Thus, the broadcast will be schedule in place of the load. By construction, this is valid, because load has only one use and broadcast only one input dependency (the load).
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the llvm-commits