[llvm-commits] [PATCH] [InstCombine] Convoluted Shuffle Splat -> Splat

Michael Gottesman mgottesman at apple.com
Tue Oct 16 12:11:40 PDT 2012


While working on various projects I noticed that in certain situations the frontend outputs poor code of the following form:

define void @test1(<4 x float> *%in_ptr, <4 x float> *%out_ptr) {
  %A = load <4 x float>* %in_ptr, align 16
  %B = shufflevector <4 x float> %A, <4 x float> undef, <4 x i32> <i32 0, i32 0, i32 undef, i32 undef>
  %C = shufflevector <4 x float> %B, <4 x float> %A, <4 x i32> <i32 0, i32 1, i32 4, i32 undef>
  %D = shufflevector <4 x float> %C, <4 x float> %A, <4 x i32> <i32 0, i32 1, i32 2, i32 4>
  store <4 x float> %D, <4 x float> *%out_ptr
  ret void
}

One would think that InstCombine would optimize this to:

define void @test1(<4 x float> *%in_ptr, <4 x float> *%out_ptr) {
  %A = load <4 x float>* %in_ptr, align 16
  %D = shufflevector <4 x float> %A, <4 x float> undef, <4 x i32> zeroinitializer
  store <4 x float> %D, <4 x float> *%out_ptr
  ret void
}

Sadly, InstCombine does not perform this optimization because of the 4 in the shuffle masks for %C and %D. Specifically (taking %C as the example, without loss of generality), InstCombine begins to create a new shuffle that merges %B and %C, of the form:

	%C = shufflevector <4 x float> %A, <4 x float> %A, <4 x i32> <i32 0, i32 0, i32 4, i32 undef>

It correctly recognizes that the first two elements of the mask are the same, which suggests a splat. It then looks at the 4 in the mask and concludes that element 4 of the concatenated vector comes from a different vector than element 0, so this cannot be a splat. That reasoning ignores the fact that the two operands are actually the same vector, so index 4 selects the same value as index 0 and the shuffle is a splat after all. As a result, the optimization is dropped.
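
The difference between the two lines of reasoning can be sketched as follows. This is an illustration of the idea only, written in Python under the assumption that `None` models an undef mask element; it is not the code in the attached patch, and both function names are hypothetical.

```python
UNDEF = None  # models an undef mask element


def is_splat_naive(mask):
    """Treat the mask as a splat only if every defined index is identical."""
    defined = [m for m in mask if m is not UNDEF]
    return len(set(defined)) <= 1


def is_splat_same_operands(mask, width):
    """Splat check for a shuffle whose two operands are the same vector.

    Index i and index i + width then select the same lane, so indices
    are reduced modulo the vector width before being compared.
    """
    defined = [m % width for m in mask if m is not UNDEF]
    return len(set(defined)) <= 1
```

With the merged mask `[0, 0, 4, UNDEF]` and a width of 4, the naive check rejects the splat because 0 and 4 differ, while the operand-aware check accepts it because both indices reduce to lane 0.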

This is fixed by the attached patch (with test case).

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-InstCombine-Teach-InstCombine-how-to-handle-an-obfus.patch
Type: application/octet-stream
Size: 2868 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20121016/0ec79311/attachment.obj>
-------------- next part --------------


Please review,

Michael

