[PATCH] Teach the DAGCombiner how to fold an OR of two shufflevectors into a single shufflevector node

Nadav Rotem nrotem at apple.com
Wed Mar 5 13:31:08 PST 2014

>> And that having been said, maybe the specifics of what you're doing could
>> be a useful canonicalization -- you'd have to provide additional details.
> It's basically the function "CollectShuffleElements" in
> lib/Transforms/InstCombine/InstCombineVectorOps.cpp. Its job appears to
> be to hunt backwards from an insertelement/extractelement pair and
> construct a shuffle by any means necessary. I think it can produce
> arbitrary shuffles at the moment, provided the type doesn't change
> half-way through[1].

It is okay to generate new shuffles from a collection of insert/extracts because the new shuffle instructions can’t be worse than the collection of inserts and extracts they replace. We already have code in SelectionDAG for turning BUILD_VECTOR into shuffle nodes for the same reason.

In the loop vectorizer we are only introducing two kinds of shuffles: broadcasts and reverses. We have specific target hooks in TTI for estimating their costs.
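
For concreteness, here is a hand-written sketch of what those two shuffle kinds look like in IR. The function names @broadcast and @reverse are made up for illustration, and this is not actual vectorizer output. As far as I recall, the corresponding TTI shuffle kinds are SK_Broadcast and SK_Reverse, passed to getShuffleCost.

```llvm
; Broadcast (splat): put the scalar in lane 0, then replicate lane 0
; into every lane with an all-zero shuffle mask.
define <4 x i32> @broadcast(i32 %x) {
  %ins = insertelement <4 x i32> undef, i32 %x, i32 0
  %splat = shufflevector <4 x i32> %ins, <4 x i32> undef, <4 x i32> zeroinitializer
  ret <4 x i32> %splat
}

; Reverse: result lane i takes source lane (3 - i).
define <4 x i32> @reverse(<4 x i32> %v) {
  %rev = shufflevector <4 x i32> %v, <4 x i32> undef, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
  ret <4 x i32> %rev
}
```

Both masks are easy for a target to recognize and price, which is the point of having dedicated cost hooks for them.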

> I'm trying to extend this so that the eventual type *can* be different
> from the inputs (to avoid "(scalar_to_vector (extract_vector_elt
> ...))" sequences in the backends, primarily). Perhaps this makes sense
> because (de facto) the only cases considered are insert/extract, which
> are probably trying to build a vector anyway. But Nadav's comments
> gave me pause.
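
To make sure we are talking about the same thing, here is my reading of the type-changing case as a hand-written sketch. The function names @narrow and @narrow_shuffle are hypothetical, and I am assuming the extension would produce something equivalent to the second function rather than claiming that it does.

```llvm
; Insert/extract chain that builds a <2 x i32> out of lanes of a <4 x i32>.
define <2 x i32> @narrow(<4 x i32> %in) {
  %e0 = extractelement <4 x i32> %in, i32 2
  %e1 = extractelement <4 x i32> %in, i32 0
  %v0 = insertelement <2 x i32> undef, i32 %e0, i32 0
  %v1 = insertelement <2 x i32> %v0, i32 %e1, i32 1
  ret <2 x i32> %v1
}

; The same value as one shuffle. The result type follows the mask length,
; so it may differ from the operand type.
define <2 x i32> @narrow_shuffle(<4 x i32> %in) {
  %v = shufflevector <4 x i32> %in, <4 x i32> undef, <2 x i32> <i32 2, i32 0>
  ret <2 x i32> %v
}
```
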
> I realise my initial problem could be solved with a target-specific
> DAGCombine, but if the consensus is that's the best path then the
> existing code needs a serious look because it's almost certainly too
> general as well.
> Cheers.
> Tim.
> [1]. For example (indices picked pretty much randomly and it Just
> Worked), try "opt -instcombine" on this:
> define <4 x i32> @foo(<4 x i32> %in1, <4 x i32> %in2) {
>  %e0 = extractelement <4 x i32> %in1, i32 3
>  %e1 = extractelement <4 x i32> %in1, i32 1
>  %e2 = extractelement <4 x i32> %in1, i32 3
>  %e3 = extractelement <4 x i32> %in2, i32 0
>  %vec.0 = insertelement <4 x i32> undef, i32 %e0, i32 0
>  %vec.1 = insertelement <4 x i32> %vec.0, i32 %e1, i32 1
>  %vec.2 = insertelement <4 x i32> %vec.1, i32 %e2, i32 2
>  %vec.3 = insertelement <4 x i32> %vec.2, i32 %e3, i32 3
>  ret <4 x i32> %vec.3
> }
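
For reference, the footnote example is equivalent to a single two-input shuffle. The version below is written by hand, not copied from opt output, and the name @foo_shuffle is made up. Mask indices 0-3 select lanes of %in1 and indices 4-7 select lanes of %in2.

```llvm
define <4 x i32> @foo_shuffle(<4 x i32> %in1, <4 x i32> %in2) {
  ; lane 0 = %in1[3], lane 1 = %in1[1], lane 2 = %in1[3], lane 3 = %in2[0]
  %vec = shufflevector <4 x i32> %in1, <4 x i32> %in2, <4 x i32> <i32 3, i32 1, i32 3, i32 4>
  ret <4 x i32> %vec
}
```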

More information about the llvm-commits mailing list