[Libclc-dev] [PATCH] vload/vstore: Use casts instead of scalarizing everything in CLC version

Tue Aug 19 07:23:23 PDT 2014

On Mon, Aug 18, 2014 at 07:00:51PM -0500, Aaron Watry wrote:
> One final question which I've got, which I believe I know the answer
> to, but I'd rather ask than have to do an extra revision:
> 
> Do we want to do the same thing (vec2+scalar load) for vload3?
> 
> From looking at the generated bitcode for char3 vload3(), I'm seeing:
> ; Function Attrs: alwaysinline nounwind readonly
> define <3 x i8> @_Z6vload3jPKc(i32 %offset, i8* nocapture readonly %x) #0 {
>   %1 = mul i32 %offset, 3
>   %2 = getelementptr inbounds i8* %x, i32 %1
>   %3 = bitcast i8* %2 to <4 x i8>*
>   %4 = load <4 x i8>* %3, align 4
>   %5 = shufflevector <4 x i8> %4, <4 x i8> undef, <3 x i32> <i32 0, i32 1, i32 2>
>   ret <3 x i8> %5
> }
> 
> In theory, vec3 takes 4 elements' worth of space.
> 
> If you use a vload3 which is at the end of your allocated memory
> space, would it be possible to get segfaults or VM faults since we'd
> be reading 1 element past what we've allocated?
>

I hadn't considered this before, but I think you are right that it could
be unsafe to load a vec4.  So, I think it would be a good idea to make
the same change for vload.

-Tom
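
To make the over-read concrete, here is a minimal sketch in plain C (not
OpenCL C, and not the libclc implementation). The names int3_sketch,
load_end_bytes and vload3_sketch are hypothetical, chosen only for
illustration. It computes how far a 3-wide and a 4-wide load reach for the
48-byte example quoted below, and shows a load that touches exactly three
elements, which is the effect the vec2+scalar approach is meant to have.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for OpenCL's int3; plain C has no vector types. */
typedef struct { int32_t x, y, z; } int3_sketch;

/* Byte offset one past the last byte read when loading `width` 32-bit
 * elements starting at element index offset * 3 (vload3 addressing). */
static size_t load_end_bytes(size_t offset, size_t width)
{
    return (offset * 3 + width) * sizeof(int32_t);
}

/* Reads exactly three elements, so it never touches p[offset * 3 + 3].
 * This mirrors the intent of the vec2 + scalar load, written element by
 * element because this is plain C. */
static int3_sketch vload3_sketch(size_t offset, const int32_t *p)
{
    const int32_t *base = p + offset * 3;
    int3_sketch v = { base[0], base[1], base[2] };
    return v;
}
```

With a 48-byte buffer (12 ints), load_end_bytes(3, 3) is 48, so a 3-wide
load at offset 3 stays inside the allocation, while load_end_bytes(3, 4)
is 52, so a 4-wide load reads 4 bytes past the end. Whether that extra
read actually faults depends on the target and the allocator.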

> E.g. Allocated 48 bytes (4 x int3, or 3 x int4), followed by:
> vload3(3, global int* input)
> 
> In theory, that should read integers input[9], input[10], and
> input[11], but it will also read and then discard input[12]...
> 
> Or am I missing something? Is the allocation required to be padded to
> prevent this from happening?  The status of 3-element vectors being
> occasionally treated as 4-element vectors for memory operations is
> "fun".
> 
> --Aaron
> 
> 
> On Fri, Aug 15, 2014 at 12:10 PM, Tom Stellard <tom at stellard.net> wrote:
> > On Fri, Aug 15, 2014 at 09:57:25AM -0700, Matt Arsenault wrote:
> >> On 08/15/2014 09:55 AM, Matt Arsenault wrote:
> >> >On 08/15/2014 09:43 AM, Tom Stellard wrote:
> >> >>I don't think it's possible to implement a single store version
> >> >>of vec3 using
> >> >>OpenCL C, because if you cast a pointer as a vec3 type, clang will
> >> >>try to store a vec4 value to it, because sizeof(vec3) == sizeof(vec4) in
> >> >>memory.
> >> >>
> >> >>-Tom
> >> >I guess it would be appropriate to write this one directly in IR then
> >> Actually that kind of seems like a clang bug. It should be emitting
> >> the load on the actual type with whatever alignment, not the type
> >> the rounded size happens to be
> >
> > What clang is doing is legal as far as I can tell; it is just inefficient,
> > so I think we'll still need an IR version.
> >
> > The problem with an IR version, though, is that it has to be target
> > specific, since the address spaces are different for different targets.
> >
> > My recommendation is to commit the version I suggested which works,
> > and then we can figure out how to optimize it with IR in a follow
> > up commit.
> >
> > -Tom