<html><head></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><div>On Jan 2, 2012, at 10:00 AM,  Duncan Sands <<a href="mailto:baldrick@free.fr">baldrick@free.fr</a>> wrote:</div><div><br></div><div><blockquote type="cite"><span class="Apple-style-span" style="border-collapse: separate; font-family: Helvetica; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-border-horizontal-spacing: 0px; -webkit-border-vertical-spacing: 0px; -webkit-text-decorations-in-effect: none; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; font-size: medium; ">Hi Matt,<br><br><blockquote type="cite">It seems that one of the optimization passes (it seems to be SROA) sometimes transforms computations on vectors of ints to computations on wide integer types; for example, I'm seeing code like the following after optimizations (*):<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">  %0 = bitcast <16 x i8> %float2uint to i128<br></blockquote><blockquote type="cite">  %1 = shl i128 %0, 8<br></blockquote><blockquote type="cite">  %ins = or i128 %1, 255<br></blockquote><blockquote type="cite">  %2 = bitcast i128 %ins to <16 x i8><br></blockquote><br>this would probably be better expressed as a vector shuffle.  What's the<br>testcase?<br></span></blockquote><div><br></div><div>The bitcode below, when run through "opt -scalarrepl-ssa", shows the behavior.  The original computation was setting a small array of i8s to 0xff, then storing a vector value to elements 2-10 of the array, then loading elements 1-9 of the array and storing them into the %RET pointer.  
After optimization, the array (and the load/store to/from it) has been eliminated entirely, and the code directly computes the combination of 0xff in the low element of the vector with a shifted version of the original value to store in %RET.</div><div><br></div><div>Thanks,</div><div>-matt</div><div><br></div><div><div>target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"</div><div>target triple = "x86_64-apple-darwin11.2.0"</div><div><br></div><div><br></div><div>define void @f_fu(float* nocapture %RET, float* nocapture %aFOO, float %b) nounwind {</div><div>for_exit:</div><div>  %x = alloca i64, align 16</div><div>  %tmpcast = bitcast i64* %x to [8 x i8]*</div><div>  store i64 -1, i64* %x, align 16</div><div>  %ptr_cast_for_load = bitcast float* %aFOO to <4 x i32>*</div><div>  %masked_load202 = load <4 x i32>* %ptr_cast_for_load, align 4</div><div>  %gather_bitcast = bitcast <4 x i32> %masked_load202 to <4 x float></div><div>  %float2uint = fptoui <4 x float> %gather_bitcast to <4 x i8></div><div>  %ptr190 = getelementptr [8 x i8]* %tmpcast, i64 0, i64 2</div><div>  %ptrcast = bitcast i8* %ptr190 to <4 x i8>*</div><div>  store <4 x i8> %float2uint, <4 x i8>* %ptrcast, align 2</div><div>  %ptr194 = getelementptr [8 x i8]* %tmpcast, i64 0, i64 1</div><div>  %ptr_cast_for_load203 = bitcast i8* %ptr194 to <4 x i8>*</div><div>  %masked_load195204 = load <4 x i8>* %ptr_cast_for_load203, align 1</div><div>  %uint2float = uitofp <4 x i8> %masked_load195204 to <4 x float></div><div>  %value2int = bitcast <4 x float> %uint2float to <4 x i32></div><div>  %ptrcast200 = bitcast float* %RET to <4 x i32>*</div><div>  store <4 x i32> %value2int, <4 x i32>* %ptrcast200, align 4</div><div>  ret void</div><div>}</div></div><div><br></div><br><blockquote type="cite">Ciao, Duncan.<br><br><blockquote type="cite"><br></blockquote><blockquote type="cite">The back end I'm trying to 
get this code to go through (a hacked-up version of the LLVM C backend(**)) doesn't support wide integer types, but is fine with the original vectors of integers; I'm wondering if there's a straightforward way to avoid having these computations on wide integer types generated in the first place or if there's pre-existing code that would transform this back to use the original vector types.<br></blockquote><br><br><blockquote type="cite"><br></blockquote><blockquote type="cite">Thanks,<br></blockquote><blockquote type="cite">-matt<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">(*) It seems that this is happening with vectors of i8 and i16, but not i32 and i64; in some cases, this is leading to better code for i8/i16 vectors, in that an unnecessary store/load round-trip is optimized out for the i8/i16 case.  I can provide a test case/submit a bug if this would be useful.<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">(**) Additional CBE patches to come from this effort, pending turning aforementioned hacks into something a little cleaner/nicer.<br></blockquote><br class="Apple-interchange-newline"></blockquote></div><br></body></html>