[Libclc-dev] [PATCH] Fix vload3/vstore3 to emit only one IR load
Jeroen Ketema via Libclc-dev
libclc-dev at lists.llvm.org
Fri Sep 25 14:13:50 PDT 2015
Hi Matt,
The IR below seems fishy to me: if we have
vload3(get_global_id(0), A)
then, since the load is of <4 x i32>, the work item with the highest id is likely to read one element past the end of the array being passed in.
Also, does the store generate a store of 4 elements, or will that be precisely 3 elements?
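To make the bounds concern concrete, here is a minimal sketch in plain C of the index arithmetic only. It assumes the array is tightly packed with exactly 3 elements per work item; the helper names last_index_touched and in_bounds are hypothetical and are not part of libclc.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: index of the last element touched by a work item
   that accesses `width` consecutive elements starting at 3 * gid,
   which is the address computed for vload3(gid, A). */
static size_t last_index_touched(size_t gid, size_t width) {
    assert(width > 0); /* guard against unsigned wrap-around below */
    return 3 * gid + width - 1;
}

/* Returns nonzero when that access stays inside an array of n_elems elements. */
static int in_bounds(size_t gid, size_t width, size_t n_elems) {
    return last_index_touched(gid, width) < n_elems;
}
```

With 8 work items and a 24-element array, in_bounds(7, 3, 24) holds, but in_bounds(7, 4, 24) does not: a 4-wide access by the last work item touches index 24, one past the end. Whether that extra element is actually read or written depends on how the backend lowers the <4 x i32> access.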
Jeroen
> On 25 Sep 2015, at 21:43, Matt Arsenault via Libclc-dev <libclc-dev at lists.llvm.org> wrote:
>
> I found out from here how to finally do this correctly:
> http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20150921/301818.html
>
> You can combine ext_vector_type and the aligned attribute to get a
> load of the right vector type with the correct alignment. With this,
> the IR for vload3 looks like:
>
> define <3 x i32> @81(i32 %offset, i32* nocapture readonly %x) #0 {
> entry:
>   %mul = mul i32 %offset, 3
>   %arrayidx = getelementptr inbounds i32, i32* %x, i32 %mul
>   %castToVec4 = bitcast i32* %arrayidx to <4 x i32>*
>   %loadVec4 = load <4 x i32>, <4 x i32>* %castToVec4, align 4
>   %extractVec = shufflevector <4 x i32> %loadVec4, <4 x i32> undef, <3 x i32> <i32 0, i32 1, i32 2>
>   ret <3 x i32> %extractVec
> }
>
> The load of <4 x i32> instead of <3 x i32> is somewhat surprising to me,
> but this is much better than the previous mess from doing a load of
> the first 2 components, a separate load of the 3rd and a sequence
> to recombine them.
>
> Old:
>
> define <3 x i32> @81(i32 %offset, i32* nocapture readonly %x) #0 {
> entry:
>   %mul = mul i32 %offset, 3
>   %arrayidx = getelementptr inbounds i32, i32* %x, i32 %mul
>   %0 = bitcast i32* %arrayidx to <2 x i32>*
>   %1 = load <2 x i32>, <2 x i32>* %0, align 4, !tbaa !1
>   %2 = extractelement <2 x i32> %1, i32 0
>   %3 = insertelement <3 x i32> undef, i32 %2, i32 0
>   %4 = extractelement <2 x i32> %1, i32 1
>   %5 = insertelement <3 x i32> %3, i32 %4, i32 1
>   %add = add i32 %mul, 2
>   %arrayidx3 = getelementptr inbounds i32, i32* %x, i32 %add
>   %6 = load i32, i32* %arrayidx3, align 4, !tbaa !6
>   %7 = insertelement <3 x i32> %5, i32 %6, i32 2
>   ret <3 x i32> %7
> }
> <0001-Fix-vload3-vstore3-to-emit-only-one-IR-load.patch>