[Libclc-dev] [PATCH] vload/vstore: Use casts instead of scalarizing everything in CLC version

Tom Stellard tom at stellard.net
Fri Aug 15 09:43:14 PDT 2014


On Fri, Aug 15, 2014 at 09:36:45AM -0700, Matt Arsenault wrote:
> 
> On Aug 15, 2014, at 7:52 AM, Tom Stellard <tom at stellard.net> wrote:
> 
> > On Fri, Aug 15, 2014 at 07:45:30AM -0700, Tom Stellard wrote:
> >> On Fri, Aug 15, 2014 at 07:11:57AM -0700, Tom Stellard wrote:
> >>> On Fri, Aug 15, 2014 at 07:52:21AM -0500, Aaron Watry wrote:
> >>>> Does anyone else have feedback on this?
> >>>> 
> >>> 
> >>> Hi Aaron,
> >>> 
> >>> I've been testing this the last few days and trying to fix some vload and
> >>> vstore bugs on SI.  At this point I think the remaining bugs are in
> >>> the LLVM backend, so you can go ahead and commit this patch.
> >>> 
> >> 
> >> Maybe I spoke too soon, the code generated for vstore3 looks wrong:
> >> 
> >> ; Function Attrs: alwaysinline nounwind
> >> define void @_Z7vstore3Dv3_ijPU3AS3i(<3 x i32> %vec, i32 %offset, i32 addrspace(3)* nocapture %mem) #0 {
> >> entry:
> >>  %mul = mul i32 %offset, 3
> >>  %arrayidx = getelementptr inbounds i32 addrspace(3)* %mem, i32 %mul
> >>  %extractVec2 = shufflevector <3 x i32> %vec, <3 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 undef>
> >>  %storetmp3 = bitcast i32 addrspace(3)* %arrayidx to <4 x i32> addrspace(3)*
> >>  store <4 x i32> %extractVec2, <4 x i32> addrspace(3)* %storetmp3, align 4, !tbaa !1
> >>  ret void
> >> }
> >> 
> >> It's storing a vec4 value with the last element undef.  This would be legal
> >> if mem were declared as <3 x i32>*, since in OpenCL a vec3 occupies the same
> >> amount of memory as a vec4.  However, in this case, since mem is declared
> >> as i32*, I think we should only be storing three values.
> >> 
> >> I'm not sure yet if this is a bug in libclc or LLVM, but I'm looking into it.
> >> 
> > 
> > I got it to work with this implementation of vstore3:
> > 
> > 
> > typedef PRIM_TYPE##2 less_aligned_##ADDR_SPACE##PRIM_TYPE##2 __attribute__ ((aligned (sizeof(PRIM_TYPE))));\
> > _CLC_OVERLOAD _CLC_DEF void vstore3(PRIM_TYPE##3 vec, size_t offset, ADDR_SPACE PRIM_TYPE *mem) { \
> >  *((ADDR_SPACE less_aligned_##ADDR_SPACE##PRIM_TYPE##2*) (&mem[3*offset])) = (PRIM_TYPE##2)(vec.s0, vec.s1); \
> >  mem[3 * offset + 2] = vec.s2;\
> > } \
> > \
> > 
> > Which generates the following LLVM IR:
> > 
> > ; Function Attrs: alwaysinline nounwind
> > define void @_Z7vstore3Dv3_ijPU3AS1i(<3 x i32> %vec, i32 %offset, i32 addrspace(1)* nocapture %mem) #0 {
> > entry:
> >  %vecinit1 = shufflevector <3 x i32> %vec, <3 x i32> undef, <2 x i32> <i32 0, i32 1>
> >  %mul = mul i32 %offset, 3
> >  %0 = sext i32 %mul to i64
> >  %arrayidx = getelementptr inbounds i32 addrspace(1)* %mem, i64 %0
> >  %1 = bitcast i32 addrspace(1)* %arrayidx to <2 x i32> addrspace(1)*
> >  store <2 x i32> %vecinit1, <2 x i32> addrspace(1)* %1, align 4, !tbaa !2
> >  %2 = extractelement <3 x i32> %vec, i32 2
> >  %add = add i32 %mul, 2
> >  %3 = sext i32 %add to i64
> >  %arrayidx3 = getelementptr inbounds i32 addrspace(1)* %mem, i64 %3
> >  store i32 %2, i32 addrspace(1)* %arrayidx3, align 4, !tbaa !7
> >  ret void
> > }
> > 
> > Does this look correct?
> > 
> > -Tom
> 
> It looks correct, but it really should be a single store with the align 4. Does the attribute aligned thing not emit the right thing from clang?

I don't think it's possible to implement a single-store version of vstore3 in
OpenCL C, because if you cast the pointer to a vec3 pointer type, clang will
try to store a vec4 value through it, since sizeof(vec3) == sizeof(vec4) in
memory.
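
To make the size argument concrete, here is a rough host-side sketch in plain
C, not OpenCL C and not the libclc code. The type model_int3 and both function
names are made up for illustration: model_int3 stands in for an int3 that
occupies four lanes, and the two functions contrast a store through a vec3
pointer (four lanes written) with the split store (three lanes written).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an OpenCL int3: three used lanes plus one
   padding lane, so the object is as large as an int4. */
typedef struct { int s[4]; } model_int3;

/* Models a store through a vec3 pointer: because the vec3 object is four
   lanes wide, all four lanes are written, including the padding lane. */
void store_as_vec3_pointer(model_int3 vec, size_t offset, int *mem) {
    memcpy(&mem[3 * offset], vec.s, 4 * sizeof(int));
}

/* Models the split store: a two-lane store followed by one scalar store,
   so exactly three elements of mem are written. */
void vstore3_split(model_int3 vec, size_t offset, int *mem) {
    memcpy(&mem[3 * offset], vec.s, 2 * sizeof(int));
    mem[3 * offset + 2] = vec.s[2];
}
```

With offset 0, the split version leaves mem[3] untouched, while the vec3
pointer version overwrites it with whatever is in the padding lane.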

-Tom



