[PATCH] Improve DAG combine pass on certain IR vector patterns
Mehdi Amini
mehdi.amini at apple.com
Fri Jan 16 09:50:13 PST 2015
Hi Fiona,
> On Jan 16, 2015, at 7:56 AM, Fiona Glaser <fglaser at apple.com> wrote:
>
> Loading 2 2x32-bit float vectors into the bottom half of a 256-bit vector
> produced suboptimal code in AVX2 mode with certain IR combinations.
>
> In particular, the IR optimizer folded 2f32 + 2f32 -> 4f32, 4f32 + 4f32
> (undef) -> 8f32 into a 2f32 + 2f32 -> 8f32, which seems more canonical,
> but then mysteriously generated rather bad code; the movq/movhpd combination
> didn't match.
>
> The problem lay in the BUILD_VECTOR optimization path. The 2f32 inputs
> would get promoted to 4f32 by the type legalizer, eventually resulting
> in a BUILD_VECTOR on two 4f32 into an 8f32. The BUILD_VECTOR then, recognizing
> these were both half the output size, concatted them and then produced
> a shuffle. However, the resulting concat + shuffle was more complex than
> it should be; in the case where the upper half of the output is undef, we
> probably want to generate shuffle + concat instead.
>
Are we sure that BUILD_VECTOR is the only place that can lead to your pathological case?
I suspect this pattern can arise from other combines/folds as well, where you end up with concat_vectors followed by a shuffle with an undef upper half, instead of a shuffle followed by a concat with undef.
It seems to me that this canonicalization belongs to visitVECTOR_SHUFFLE instead.
Some comments on the patch itself:
+ bool ShuffleUpperUndef = true;
+ for (unsigned i = NumInScalars / 2; i != NumInScalars; ++i)
+ if (Mask[i] >= 0)
+ ShuffleUpperUndef = false;
bool ShuffleUpperUndef = std::all_of(&Mask[NumInScalars / 2], &Mask[NumInScalars], [](int M) { return M < 0; });
(Note the predicate must be `M < 0`, matching the `Mask[i] >= 0` check in the loop: undef mask elements are negative, and zero is a valid defined index.)
+ for(unsigned i = 0; i < NumInScalars / 2; i++)
+ Mask.pop_back();
Mask.resize(NumInScalars / 2);
+ return DAG.getNode(ISD::CONCAT_VECTORS, dl, VT, VecIn1, VecIn2);
+ }
+ else {
+ VecIn1 = DAG.getNode(ISD::CONCAT_VECTORS, dl, VT, VecIn1, VecIn2);
+ VecIn2 = SDValue(nullptr, 0);
+ }
The else is unnecessary since the true branch ends with a return; you can drop it and unindent the code.
+; CHECK: foo
Use CHECK-LABEL instead, so FileCheck anchors the checks to the right function.
Mehdi
> This enhancement results in the optimizer correctly producing the optimal
> movq + movhpd sequence for all three variations on this IR, even with AVX2.
>
> I've included a test case.
>
> Radar link: rdar://problem/13326338 <rdar://problem/13326338>
>
> <patch.diff>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits