No subject
Sun Apr 27 11:26:59 PDT 2014
define <3 x i8> @_Z6vload3jPKc(i32 %offset, i8* nocapture readonly %x) #0 {
  %1 = mul i32 %offset, 3
  %2 = getelementptr inbounds i8* %x, i32 %1
  %3 = bitcast i8* %2 to <4 x i8>*
  %4 = load <4 x i8>* %3, align 4
  %5 = shufflevector <4 x i8> %4, <4 x i8> undef, <3 x i32> <i32 0, i32 1, i32 2>
  ret <3 x i8> %5
}
In theory, a vec3 takes 4 elements' worth of space in memory.
If you use a vload3 which is at the end of your allocated memory
space, would it be possible to get segfaults or VM faults since we'd
be reading 1 element past what we've allocated?
E.g. allocate 48 bytes (12 ints: 4 x packed int3, or 3 x int4), followed by:
vload3(3, global int* input)
In theory, that should read integers input[9], input[10], and
input[11], but a 4-wide load will also read and then discard
input[12], which is one element past the end of the allocation...
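
To make the arithmetic concrete, here is a small plain-C sketch of the
index math only. The helper names are hypothetical (they are not libclc
or OpenCL functions), and it assumes the packed layout where vload3 starts
at element offset * 3:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: index of the last element touched when a load that
 * is `width` elements wide starts at packed vec3 slot `offset`. */
size_t last_index_read(size_t offset, size_t width)
{
    assert(width > 0);
    return offset * 3 + width - 1;
}

/* Returns 1 if that load stays inside a buffer of `count` elements,
 * 0 if it runs past the end. */
int load_in_bounds(size_t offset, size_t width, size_t count)
{
    return last_index_read(offset, width) < count;
}
```

With count = 12 (48 bytes of int), a 3-wide load at offset 3 ends at
index 11 and is in bounds, while a 4-wide load at the same offset ends at
index 12 and is not.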
Or am I missing something? Is the allocation required to be padded to
prevent this from happening? The status of 3-element vectors being
occasionally treated as 4-element vectors for memory operations is
"fun".
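
For comparison, one way to avoid the over-read is to load the three
elements one at a time. This is only a plain-C sketch of that idea, not
necessarily what libclc does or will do; the struct and function names are
made up for illustration, since plain C has no int3 type:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an OpenCL int3 value. */
typedef struct { int x, y, z; } int3_sketch;

/* Reads exactly three ints starting at p[offset * 3]; it never touches a
 * fourth element, so it cannot run past a tightly sized allocation. */
int3_sketch vload3_scalar(size_t offset, const int *p)
{
    assert(p != NULL);
    const int *base = p + offset * 3;
    int3_sketch v = { base[0], base[1], base[2] };
    return v;
}
```

The cost is three scalar loads instead of one wide load, which is the
trade-off against the single 4-wide load in the IR above.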
--Aaron
On Fri, Aug 15, 2014 at 12:10 PM, Tom Stellard <tom at stellard.net> wrote:
> On Fri, Aug 15, 2014 at 09:57:25AM -0700, Matt Arsenault wrote:
>> On 08/15/2014 09:55 AM, Matt Arsenault wrote:
>> >On 08/15/2014 09:43 AM, Tom Stellard wrote:
>> >>I don't think it's possible to implement a single store version
>> >>of vec3 using
>> >>OpenCL C, because if you cast a pointer as a vec3 type, clang will
>> >>try to store a vec4 value to it, because sizeof(vec3) == sizeof(vec4) in
>> >>memory.
>> >>
>> >>-Tom
>> >I guess it would be appropriate to write this one directly in IR then
>> Actually that kind of seems like a clang bug. It should be emitting
>> the load on the actual type with whatever alignment, not the type
>> the rounded size happens to be
>
> What clang is doing is legal as far as I can tell; it is just inefficient,
> so I think we'll still need an IR version.
>
> The problem with an IR version, though, is that it has to be target
> specific, since the address spaces are different for different targets.
>
> My recommendation is to commit the version I suggested which works,
> and then we can figure out how to optimize it with IR in a follow
> up commit.
>
> -Tom
>
> _______________________________________________
> Libclc-dev mailing list
> Libclc-dev at pcc.me.uk
> http://www.pcc.me.uk/cgi-bin/mailman/listinfo/libclc-dev