[LLVMdev] Transforming wide integer computations back to vector computations

Mon Jan 2 10:21:32 PST 2012

On Jan 2, 2012, at 10:00 AM,  Duncan Sands <baldrick at free.fr> wrote:

> Hi Matt,
> 
>> It seems that one of the optimization passes (it seems to be SROA) sometimes transforms computations on vectors of ints to computations on wide integer types; for example, I'm seeing code like the following after optimizations(*):
>> 
>>   %0 = bitcast<16 x i8>  %float2uint to i128
>>   %1 = shl i128 %0, 8
>>   %ins = or i128 %1, 255
>>   %2 = bitcast i128 %ins to<16 x i8>
> 
> this would probably be better expressed as a vector shuffle.  What's the
> testcase?

The bitcode below, then run through "opt -scalarrepl-ssa", shows the behavior.  The original computation was setting a small array of i8s to 0xff, then storing a vector value to elements 2-10 of the array, then loading elements 1-9 of the array and storing them into the %RET pointer.  After optimization it had eliminated the array (and the load/store to/from it) entirely, and directly computes the combination of 0xff in the low element of the vector and then a shifted version of the original value to store in %RET.

Thanks,
-matt

target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
target triple = "x86_64-apple-darwin11.2.0"

define void @f_fu(float* nocapture %RET, float* nocapture %aFOO, float %b) nounwind {
for_exit:
  %x = alloca i64, align 16
  %tmpcast = bitcast i64* %x to [8 x i8]*
  store i64 -1, i64* %x, align 16
  %ptr_cast_for_load = bitcast float* %aFOO to <4 x i32>*
  %masked_load202 = load <4 x i32>* %ptr_cast_for_load, align 4
  %gather_bitcast = bitcast <4 x i32> %masked_load202 to <4 x float>
  %float2uint = fptoui <4 x float> %gather_bitcast to <4 x i8>
  %ptr190 = getelementptr [8 x i8]* %tmpcast, i64 0, i64 2
  %ptrcast = bitcast i8* %ptr190 to <4 x i8>*
  store <4 x i8> %float2uint, <4 x i8>* %ptrcast, align 2
  %ptr194 = getelementptr [8 x i8]* %tmpcast, i64 0, i64 1
  %ptr_cast_for_load203 = bitcast i8* %ptr194 to <4 x i8>*
  %masked_load195204 = load <4 x i8>* %ptr_cast_for_load203, align 1
  %uint2float = uitofp <4 x i8> %masked_load195204 to <4 x float>
  %value2int = bitcast <4 x float> %uint2float to <4 x i32>
  %ptrcast200 = bitcast float* %RET to <4 x i32>*
  store <4 x i32> %value2int, <4 x i32>* %ptrcast200, align 4
  ret void
}

> Ciao, Duncan.
> 
>> 
>> The back end I'm trying to get this code to go through (a hacked up version of the LLVM C backend(**)) doesn't support wide integer types, but is fine with the original vectors of integers; I'm wondering if there's a straightforward way to avoid having these computations on wide integer types generated in the first place or if there's pre-existing code that would transform this back to use the original vector types.
> 
> 
>> 
>> Thanks,
>> -matt
>> 
>> (*) It seems that this is happening with vectors of i8 and i16, but not i32 and i64; in some cases, this is leading to better code for i8/i16 vectors, in that an unnecessary store/load round-trip being optimized out for the i8/i16 case.  I can provide a test case/submit a bug if this would be useful.
>> 
>> (**) Additional CBE patches to come from this effort, pending turning aforementioned hacks into something a little cleaner/nicer.
> 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20120102/4ae5a3b3/attachment.html>