No subject

Sun Apr 27 11:26:59 PDT 2014

define <3 x i8> @_Z6vload3jPKc(i32 %offset, i8* nocapture readonly %x) #0 {
  %1 = mul i32 %offset, 3
  %2 = getelementptr inbounds i8* %x, i32 %1
  %3 = bitcast i8* %2 to <4 x i8>*
  %4 = load <4 x i8>* %3, align 4
  %5 = shufflevector <4 x i8> %4, <4 x i8> undef, <3 x i32> <i32 0, i32 1, i32 2>
  ret <3 x i8> %5
}
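
For readers less used to IR, the function above can be paraphrased in plain C roughly as follows. This is only a sketch: the `char3` struct and the name `vload3_widened` are made up for illustration, and `memcpy` stands in for the `<4 x i8>` load.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for OpenCL's char3; not a real libclc type. */
typedef struct { signed char s0, s1, s2; } char3;

/* Mirrors the IR: one 4-byte read starting at x + 3*offset, with the
 * fourth byte dropped. The caller must guarantee that all 4 bytes are
 * readable. memcpy has no alignment requirement, unlike the `align 4`
 * load in the IR. */
char3 vload3_widened(unsigned offset, const signed char *x) {
    assert(x != NULL);
    signed char tmp[4];
    memcpy(tmp, x + 3u * offset, sizeof tmp); /* bytes 3*offset .. 3*offset+3 */
    char3 r = { tmp[0], tmp[1], tmp[2] };      /* shufflevector <0, 1, 2> */
    return r;
}
```

The fourth byte is read and then thrown away, which is what the question below is about.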

In theory, a vec3 takes 4 elements' worth of space

If you use a vload3 which is at the end of your allocated memory
space, would it be possible to get segfaults or VM faults since we'd
be reading 1 element past what we've allocated?

E.g. Allocated 48 bytes (4 x int3, or 3 x int4), followed by:
vload3(3, global int* input)

In theory, that should read integers input[9], input[10], and
input[11], but it will also read and then discard input[12]...
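
The arithmetic behind this concern can be checked with a small C sketch. The helper names are hypothetical, and it assumes the widened load always touches four consecutive elements starting at index 3*offset, as in the IR above. The element-wise variant is just one possible way to avoid the extra read, not necessarily what libclc does.

```c
#include <assert.h>
#include <stddef.h>

/* Index of the last element that vload3(offset, p) is supposed to read. */
size_t vload3_last_needed(size_t offset) { return 3 * offset + 2; }

/* Index of the last element touched if the load is widened to 4 elements. */
size_t vload3_last_touched(size_t offset) { return 3 * offset + 3; }

/* Element-wise alternative: reads exactly three ints, never the fourth. */
void vload3_scalar(size_t offset, const int *p, int out[3]) {
    assert(p != NULL && out != NULL);
    for (size_t i = 0; i < 3; ++i)
        out[i] = p[3 * offset + i];
}
```

With a 48-byte buffer of 12 ints, the valid indices are 0 through 11. For offset 3 the widened load touches index 12, one element past the end, while the element-wise version stays in bounds.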

Or am I missing something? Is the allocation required to be padded to
prevent this from happening?  The status of 3-element vectors being
occasionally treated as 4-element vectors for memory operations is


On Fri, Aug 15, 2014 at 12:10 PM, Tom Stellard <tom at> wrote:
> On Fri, Aug 15, 2014 at 09:57:25AM -0700, Matt Arsenault wrote:
>> On 08/15/2014 09:55 AM, Matt Arsenault wrote:
>> >On 08/15/2014 09:43 AM, Tom Stellard wrote:
>> >>I don't think it's possible to implement a single store version
>> >>of vec3 using
>> >>OpenCL C, because if you cast a pointer as a vec3 type, clang will
>> >>try to store a vec4 value to it, because sizeof(vec3) == sizeof(vec4) in
>> >>memory.
>> >>
>> >>-Tom
>> >I guess it would be appropriate to write this one directly in IR then
>> Actually that kind of seems like a clang bug. It should be emitting
>> the load on the actual type with whatever alignment, not the type
>> the rounded size happens to be
> What clang is doing is legal as far as I can tell, it just is inefficient,
> so I think we'll still need an IR version.
> The problem with an IR version, though, is that it has to be target
> specific, since the address spaces are different for different targets.
> My recommendation is to commit the version I suggested which works,
> and then we can figure out how to optimize it with IR in a follow
> up commit.
> -Tom