[LLVMdev] How to vectorize a vector type cast?
Eli Friedman
eli.friedman at gmail.com
Thu Mar 1 12:28:13 PST 2012
On Tue, Feb 28, 2012 at 2:11 PM, Gurd, Preston <preston.gurd at intel.com> wrote:
> Since Clang does not seem to allow type casts, such as uchar4 to float4,
> between vector types, it seems it is necessary to write them as element by
> element conversions, such as
>
> typedef float float4 __attribute__((ext_vector_type(4)));
> typedef unsigned char uchar4 __attribute__((ext_vector_type(4)));
>
> float4 to_float4(uchar4 in)
> {
>   float4 out = {in.x, in.y, in.z, in.w};
>   return out;
> }
I think that's right... we can represent them in IR, but I don't think
clang has a generic way to write them outside OpenCL mode. Granted,
you can use platform-specific intrinsics (_mm_cvttps_epi32 etc.).
> Running this code through "clang -c -emit-llvm" and then through "opt -O2
> -S", produces the following IR:
>
> define <4 x float> @to_float4(i32 %in.coerce) nounwind uwtable readnone {
> entry:
>   %0 = bitcast i32 %in.coerce to <4 x i8>
>   %1 = extractelement <4 x i8> %0, i32 0
>   %conv = uitofp i8 %1 to float
>   %vecinit = insertelement <4 x float> undef, float %conv, i32 0
>   %2 = extractelement <4 x i8> %0, i32 1
>   %conv2 = uitofp i8 %2 to float
>   %vecinit3 = insertelement <4 x float> %vecinit, float %conv2, i32 1
>   %3 = extractelement <4 x i8> %0, i32 2
>   %conv4 = uitofp i8 %3 to float
>   %vecinit5 = insertelement <4 x float> %vecinit3, float %conv4, i32 2
>   %4 = extractelement <4 x i8> %0, i32 3
>   %conv6 = uitofp i8 %4 to float
>   %vecinit7 = insertelement <4 x float> %vecinit5, float %conv6, i32 3
>   ret <4 x float> %vecinit7
> }
>
> Which does the cast as a sequence of scalar operations, whereas it could be
> done as
>
>   %1 = uitofp <4 x i8> %0 to <4 x float>
>   ret <4 x float> %1
>
> It seemed to me that the recently committed basic block vectorizer might be
> able to do this kind of optimization, but the current version does not do
> so.
Yes, that seems reasonable.
-Eli
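
For reference, a minimal sketch of the whole-vector form written in C, under assumptions that are not part of this thread. It relies on __builtin_convertvector, a builtin that Clang gained after this message was written (recent GCC releases accept it as well). It also swaps vector_size in for ext_vector_type so that the same source builds with either compiler, which means lanes are read as in[0] rather than in.x. The names to_float4_scalar and to_float4_vector are made up for illustration.

```c
/* Sketch only: vector_size stands in for the thread's ext_vector_type
   so that the same source builds with both Clang and GCC. */
typedef float float4 __attribute__((vector_size(16)));
typedef unsigned char uchar4 __attribute__((vector_size(4)));

/* Element-by-element form, equivalent to to_float4 in the quoted message. */
float4 to_float4_scalar(uchar4 in)
{
    float4 out = {in[0], in[1], in[2], in[3]};
    return out;
}

/* Whole-vector form: a single conversion applied to all four lanes.
   Assumes a compiler that provides __builtin_convertvector. */
float4 to_float4_vector(uchar4 in)
{
    return __builtin_convertvector(in, float4);
}
```

Both functions should return the same four floats for any input. The second hands the compiler one conversion to lower, which in LLVM IR would be expected to be the vector uitofp suggested in the quoted message, instead of depending on a vectorizer to rediscover it from the scalar sequence.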