[Libclc-dev] [PATCH] Fix vload3/vstore3 to emit only one IR load

Jeroen Ketema via Libclc-dev libclc-dev at lists.llvm.org
Fri Sep 25 14:13:50 PDT 2015


Hi Matt,

The IR below seems fishy to me: if we have

vload3(get_global_id(0), A)

then the work item with the highest id is likely to access an element out of bounds of the array being passed in, because the <4 x i32> load reads one element past the three that vload3 returns.
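Here is a minimal C sketch of the bounds arithmetic behind this concern. It assumes a buffer holding exactly n 3-vectors (3 * n ints) and one vload3 per work item with offset = global id. The helper names are hypothetical, and the width of 4 models the <4 x i32> load in the IR quoted below:

```c
#include <stddef.h>

/* Hypothetical helper: last element index read by a load of `width`
   ints that starts at x + 3 * offset (vload3 addressing). */
static size_t last_index_read(size_t offset, size_t width) {
    return 3 * offset + width - 1;
}

/* Counts the work items (gid = 0 .. n_items - 1) whose load would read
   at or past index `len`, i.e. outside an array of `len` ints. */
static size_t count_out_of_bounds(size_t n_items, size_t len, size_t width) {
    size_t count = 0;
    for (size_t gid = 0; gid < n_items; ++gid) {
        if (last_index_read(gid, width) >= len)
            ++count;
    }
    return count;
}
```

With n = 8 and len = 24, a 3-wide load keeps every work item in bounds. The 4-wide load makes exactly one item overrun the buffer: gid 7 reads index 24.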

Also, does the store generate a store of 4 elements, or of precisely 3 elements?
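On the store side, the following sketch shows why the width matters, assuming the same layout. vstore3 must write exactly x[3*offset] through x[3*offset + 2]. A hypothetical store widened to four lanes would also write x[3*offset + 3]. That element is either the first element of the next work item's vector, which causes a race between work items, or, for the last work item, one past the end of the buffer. Both functions are illustrative models, not libclc code:

```c
#include <stddef.h>

/* Model of vstore3: writes exactly three consecutive ints. */
static void vstore3_model(const int v[3], size_t offset, int *x) {
    for (size_t i = 0; i < 3; ++i)
        x[3 * offset + i] = v[i];
}

/* Hypothetical widened store: also writes a fourth lane whose value is
   unspecified in the IR; 0 is used here purely for illustration. */
static void vstore3_widened_model(const int v[3], size_t offset, int *x) {
    vstore3_model(v, offset, x);
    x[3 * offset + 3] = 0; /* clobbers lane 0 of the neighboring vector */
}
```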

Jeroen

> On 25 Sep 2015, at 21:43, Matt Arsenault via Libclc-dev <libclc-dev at lists.llvm.org> wrote:
> 
> I found out from here how to finally do this correctly:
> http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20150921/301818.html
> 
> You can combine ext_vector_type and the aligned to get a load of the
> right vector type with the correct alignment. With this, the IR
> for vload3 looks like:
> 
>    define <3 x i32> @81(i32 %offset, i32* nocapture readonly %x) #0 {
>    entry:
>      %mul = mul i32 %offset, 3
>      %arrayidx = getelementptr inbounds i32, i32* %x, i32 %mul
>      %castToVec4 = bitcast i32* %arrayidx to <4 x i32>*
>      %loadVec4 = load <4 x i32>, <4 x i32>* %castToVec4, align 4
>      %extractVec = shufflevector <4 x i32> %loadVec4, <4 x i32> undef, <3 x i32> <i32 0, i32 1, i32 2>
>      ret <3 x i32> %extractVec
>    }
> 
> The load of <4 x i32> instead of <3 x i32> is somewhat surprising to me,
> but this is much better than the previous mess from doing a load of
> the first 2 components, a separate load of the 3rd and a sequence
> to recombine them.
> 
> Old:
> 
>  define <3 x i32> @81(i32 %offset, i32* nocapture readonly %x) #0 {
>  entry:
>    %mul = mul i32 %offset, 3
>    %arrayidx = getelementptr inbounds i32, i32* %x, i32 %mul
>    %0 = bitcast i32* %arrayidx to <2 x i32>*
>    %1 = load <2 x i32>, <2 x i32>* %0, align 4, !tbaa !1
>    %2 = extractelement <2 x i32> %1, i32 0
>    %3 = insertelement <3 x i32> undef, i32 %2, i32 0
>    %4 = extractelement <2 x i32> %1, i32 1
>    %5 = insertelement <3 x i32> %3, i32 %4, i32 1
>    %add = add i32 %mul, 2
>    %arrayidx3 = getelementptr inbounds i32, i32* %x, i32 %add
>    %6 = load i32, i32* %arrayidx3, align 4, !tbaa !6
>    %7 = insertelement <3 x i32> %5, i32 %6, i32 2
>    ret <3 x i32> %7
>  }
> <0001-Fix-vload3-vstore3-to-emit-only-one-IR-load.patch>
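For comparison, here is a C model of what the two IR listings above compute. It is an illustration, not the libclc implementation; int3_model and both function names are hypothetical. Both functions return the same three lanes. They differ only in how memory is read: the old IR uses one pair load plus one scalar load, and the new IR uses a single four-element load.

```c
#include <stddef.h>

typedef struct { int s[3]; } int3_model; /* stand-in for <3 x i32> */

/* Old IR: <2 x i32> load, separate i32 load, then insertelement x3. */
static int3_model vload3_old(size_t offset, const int *x) {
    const int *p = x + 3 * offset;
    int lo0 = p[0], lo1 = p[1]; /* the <2 x i32> load */
    int hi = p[2];              /* the separate scalar load */
    int3_model r = {{lo0, lo1, hi}};
    return r;
}

/* New IR: one <4 x i32> load, then shufflevector keeps lanes 0..2. */
static int3_model vload3_new(size_t offset, const int *x) {
    const int *p = x + 3 * offset;
    int v4[4] = {p[0], p[1], p[2], p[3]}; /* also reads p[3] */
    int3_model r = {{v4[0], v4[1], v4[2]}};
    return r;
}
```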


