[Libclc-dev] [PATCH] Fix vload3/vstore3 to emit only one IR load
Matt Arsenault via Libclc-dev
libclc-dev at lists.llvm.org
Fri Sep 25 13:43:41 PDT 2015
I finally found out how to do this correctly, from this thread:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20150921/301818.html
You can combine ext_vector_type with the aligned attribute to get a
load of the right vector type with the correct alignment. With this,
the IR for vload3 looks like:
define <3 x i32> @81(i32 %offset, i32* nocapture readonly %x) #0 {
entry:
%mul = mul i32 %offset, 3
%arrayidx = getelementptr inbounds i32, i32* %x, i32 %mul
%castToVec4 = bitcast i32* %arrayidx to <4 x i32>*
%loadVec4 = load <4 x i32>, <4 x i32>* %castToVec4, align 4
%extractVec = shufflevector <4 x i32> %loadVec4, <4 x i32> undef, <3 x i32> <i32 0, i32 1, i32 2>
ret <3 x i32> %extractVec
}
The load of <4 x i32> instead of <3 x i32> is somewhat surprising to me,
but this is much better than the previous mess: a load of the first 2
components, a separate load of the 3rd, and a sequence of
extractelement/insertelement instructions to recombine them.
Old:
define <3 x i32> @81(i32 %offset, i32* nocapture readonly %x) #0 {
entry:
%mul = mul i32 %offset, 3
%arrayidx = getelementptr inbounds i32, i32* %x, i32 %mul
%0 = bitcast i32* %arrayidx to <2 x i32>*
%1 = load <2 x i32>, <2 x i32>* %0, align 4, !tbaa !1
%2 = extractelement <2 x i32> %1, i32 0
%3 = insertelement <3 x i32> undef, i32 %2, i32 0
%4 = extractelement <2 x i32> %1, i32 1
%5 = insertelement <3 x i32> %3, i32 %4, i32 1
%add = add i32 %mul, 2
%arrayidx3 = getelementptr inbounds i32, i32* %x, i32 %add
%6 = load i32, i32* %arrayidx3, align 4, !tbaa !6
%7 = insertelement <3 x i32> %5, i32 %6, i32 2
ret <3 x i32> %7
}
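
For readers without the attachment, here is a minimal sketch of the
typedef combination described above. It is not the patch itself. The
names int3_elem_aligned and vload3_sketch are made up for illustration,
address spaces and the libclc overload macros are left out, and the
#else branch is only a plain-C fallback so the file still builds with
compilers that lack ext_vector_type.

```c
#include <string.h>

#if defined(__clang__)
/* Clang-only path: a 3-element vector type ... */
typedef int int3 __attribute__((ext_vector_type(3)));
/* ... and an alias of it whose alignment is lowered to that of one
 * element (hypothetical name, chosen for this sketch). */
typedef int3 int3_elem_aligned __attribute__((aligned(sizeof(int))));

static int3 vload3_sketch(unsigned offset, const int *x) {
    /* A single vector load through the under-aligned alias. */
    return *(const int3_elem_aligned *)(x + 3 * offset);
}
#else
/* Fallback for other compilers: same result, but not the technique
 * discussed in this mail. */
typedef struct { int x, y, z; } int3;

static int3 vload3_sketch(unsigned offset, const int *x) {
    int3 v;
    memcpy(&v, x + 3 * offset, sizeof v);
    return v;
}
#endif
```

With an 8-element buffer, vload3_sketch(1, buf) yields buf[3], buf[4]
and buf[5]. Since the IR above loads four lanes, the sketch assumes at
least one readable element after the last vector in the buffer.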
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Fix-vload3-vstore3-to-emit-only-one-IR-load.patch
Type: text/x-diff
Size: 5674 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/libclc-dev/attachments/20150925/365352ac/attachment.patch>