[PATCH] Improve DAG combine pass on certain IR vector patterns
Mehdi Amini
mehdi.amini at apple.com
Fri Jan 16 09:50:13 PST 2015
Hi Fiona,
> On Jan 16, 2015, at 7:56 AM, Fiona Glaser <fglaser at apple.com> wrote:
>
> Loading 2 2x32-bit float vectors into the bottom half of a 256-bit vector
> produced suboptimal code in AVX2 mode with certain IR combinations.
>
> In particular, the IR optimizer folded 2f32 + 2f32 -> 4f32, 4f32 + 4f32
> (undef) -> 8f32 into a 2f32 + 2f32 -> 8f32, which seems more canonical,
> but then mysteriously generated rather bad code; the movq/movhpd combination
> didn't match.
>
> The problem lay in the BUILD_VECTOR optimization path. The 2f32 inputs
> would get promoted to 4f32 by the type legalizer, eventually resulting
> in a BUILD_VECTOR on two 4f32 into an 8f32. The BUILD_VECTOR then, recognizing
> these were both half the output size, concatted them and then produced
> a shuffle. However, the resulting concat + shuffle was more complex than
> it should be; in the case where the upper half of the output is undef, we
> probably want to generate shuffle + concat instead.
>
Are we sure that BUILD_VECTOR is the only place that can lead to your pathological case?
I suspect this pattern can arise from other combines/folds as well, where you end up with concat_vectors followed by a shuffle with an undef upper half, instead of a shuffle followed by a concat with undef.
It seems to me that this canonicalization belongs to visitVECTOR_SHUFFLE instead.
Some comments on the patch itself:
+ bool ShuffleUpperUndef = true;
+ for (unsigned i = NumInScalars / 2; i != NumInScalars; ++i)
+ if (Mask[i] >= 0)
+ ShuffleUpperUndef = false;
bool ShuffleUpperUndef = std::all_of(&Mask[NumInScalars / 2], &Mask[NumInScalars], [](int M) { return M < 0; });
(Note the predicate must be `M < 0`, matching the `Mask[i] >= 0` check in the loop: undef mask elements are negative, and zero is a valid defined index.)
+ for(unsigned i = 0; i < NumInScalars / 2; i++)
+ Mask.pop_back();
Mask.resize(NumInScalars / 2);
+ return DAG.getNode(ISD::CONCAT_VECTORS, dl, VT, VecIn1, VecIn2);
+ }
+ else {
+ VecIn1 = DAG.getNode(ISD::CONCAT_VECTORS, dl, VT, VecIn1, VecIn2);
+ VecIn2 = SDValue(nullptr, 0);
+ }
The else is unnecessary since the true branch ends with a return; you can drop it and unindent the code.
+; CHECK: foo
Use CHECK-LABEL instead, so FileCheck anchors the checks to the right function.
Mehdi
> This enhancement results in the optimizer correctly producing the optimal
> movq + movhpd sequence for all three variations on this IR, even with AVX2.
>
> I've included a test case.
>
> Radar link: rdar://problem/13326338 <rdar://problem/13326338>
>
> <patch.diff>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits