[llvm-commits] [PATCH] [InstCombine] Convoluted Shuffle Splat -> Splat

Nadav Rotem nrotem at apple.com
Tue Oct 16 13:20:18 PDT 2012


LGTM. 


On Oct 16, 2012, at 12:11 PM, Michael Gottesman <mgottesman at apple.com> wrote:

> While working on various projects I noticed that in certain situations the frontend outputs poor code of the following form:
> 
> define void @test1(<4 x float> *%in_ptr, <4 x float> *%out_ptr) {
>  %A = load <4 x float>* %in_ptr, align 16
>  %B = shufflevector <4 x float> %A, <4 x float> undef, <4 x i32> <i32 0, i32 0, i32 undef, i32 undef>
>  %C = shufflevector <4 x float> %B, <4 x float> %A, <4 x i32> <i32 0, i32 1, i32 4, i32 undef>
>  %D = shufflevector <4 x float> %C, <4 x float> %A, <4 x i32> <i32 0, i32 1, i32 2, i32 4>
>  store <4 x float> %D, <4 x float> *%out_ptr
>  ret void
> }
> 
> One would think that InstCombine would optimize this to:
> 
> define void @test1(<4 x float> *%in_ptr, <4 x float> *%out_ptr) {
>  %A = load <4 x float>* %in_ptr, align 16
>  %D = shufflevector <4 x float> %A, <4 x float> undef, <4 x i32> zeroinitializer
>  store <4 x float> %D, <4 x float> *%out_ptr
>  ret void
> }
> 
> Sadly, InstCombine does not perform the optimization because of the index 4 in the shuffle masks for %C and %D. Specifically (using the case of %C without any loss of generality), InstCombine begins to create a new shuffle that merges %B and %C, of the form:
> 
> 	%C = shufflevector <4 x float> %A, <4 x float> %A, <4 x i32> <i32 0, i32 0, i32 4, i32 undef>
> 
> It correctly recognizes that the first two elements of the shuffle mask are the same, which suggests a splat, but then it looks at the 4 in the mask and concludes that element 4 of the concatenated vector comes from a different vector than element 0. It therefore decides that this cannot be a splat and drops the optimization, ignoring the fact that the two operands are actually the same vector, so index 4 selects the same value as index 0.
> 
> This is fixed by the attached patch (with test case).
> 
> <0001-InstCombine-Teach-InstCombine-how-to-handle-an-obfus.patch>
> 
> Please review,
> 
> Michael
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
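
For readers following the thread, the reasoning in the quoted message can be sketched as a small standalone C++17 model. This is not the code from the attached patch and it does not use any LLVM API: the names splatLane and kUndef, and the representation of a mask as a vector of ints with -1 for undef, are assumptions made only for illustration. The idea it shows is that when both shuffle operands are the same vector, an index pointing into the second operand can be folded back onto the first before checking for a splat.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical model: a mask entry is a lane index into the concatenation of
// the two shuffle operands, or kUndef for an undef lane.
constexpr int kUndef = -1;

// Returns the source lane that is splatted if every defined mask entry selects
// the same lane, and std::nullopt otherwise (including an all-undef mask).
// When sameOperands is true, indices that point into the second operand
// (idx >= numElts) are folded back onto the first operand before comparing.
std::optional<int> splatLane(const std::vector<int>& mask, int numElts,
                             bool sameOperands) {
    std::optional<int> lane;
    for (int idx : mask) {
        if (idx == kUndef) {
            continue;  // undef lanes are compatible with any splat
        }
        int resolved = (sameOperands && idx >= numElts) ? idx - numElts : idx;
        if (!lane) {
            lane = resolved;          // first defined lane fixes the candidate
        } else if (*lane != resolved) {
            return std::nullopt;      // two different source lanes: not a splat
        }
    }
    return lane;
}
```

With the merged mask <0, 0, 4, undef> from the message and numElts = 4, calling splatLane with sameOperands = false reports no splat, which mirrors the missed optimization. With sameOperands = true, index 4 resolves to lane 0 and the mask is recognized as a splat of lane 0.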




