[Libclc-dev] [PATCH 6/9] Add optimized generic addrspace(0) vload implementation

Aaron Watry awatry at gmail.com
Wed Jul 23 13:18:48 PDT 2014

I think the answer is inertia. The previous int load/store code was all
written in IR, so I kept going that way, never mind that I believe I was
the one who wrote the original version. I believe I had issues at the
time doing what I've done in the attached patch (pointer casting, as you
suggested).
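
For illustration, here is a minimal sketch of the pointer-casting idea in
plain C. It is not the attached patch. It assumes the GCC/Clang vector
extensions as a stand-in for OpenCL's int2, and the names int2_u and
sketch_vload2_int are hypothetical. The one subtlety is alignment: vload2
only guarantees element alignment (the "align 4" in the quoted IR), so the
cast goes through a vector type whose alignment is lowered to that of int.

```c
#include <stddef.h>

/* Stand-in for OpenCL's int2, using the GCC/Clang vector extension. */
typedef int int2 __attribute__((vector_size(2 * sizeof(int))));

/* Hypothetical helper type: same layout as int2, but only element-aligned
 * and permitted to alias plain int storage. */
typedef int int2_u
    __attribute__((vector_size(2 * sizeof(int)), may_alias, aligned(sizeof(int))));

/* Load two consecutive ints starting at p[2 * offset] by casting the
 * element pointer to the less-aligned vector pointer type. */
static int2 sketch_vload2_int(size_t offset, const int *p) {
    return *(const int2_u *)(p + 2 * offset);
}
```

Because the load type carries only int alignment, the compiler is free to
emit a single vector load where the target allows it, without assuming the
pointer is 8-byte aligned.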

I've abandoned patches 2-9 of this series in my local checkout and
would like to replace them with the patch I've attached.

I've run a full piglit CL test run with 0 changes in pass/fail rate,
and I'm getting sensible bitcode from the compiled CL test kernels.

Does this look better to you?


On Wed, Jul 23, 2014 at 2:12 PM, Matt Arsenault
<Matthew.Arsenault at amd.com> wrote:
> On 07/22/2014 06:46 PM, Aaron Watry wrote:
>> +define <2 x i32> @__clc_vload2_i32__addr0(i32 addrspace(0)* nocapture %addr) nounwind readonly alwaysinline {
>> +  %1 = bitcast i32 addrspace(0)* %addr to <2 x i32> addrspace(0)*
>> +  %2 = load <2 x i32> addrspace(0)* %1, align 4, !tbaa !3
>> +  ret <2 x i32> %2
>> +}
> Why include the addrspace(0)s? I'm also wondering why it's necessary to
> write these in IR? Does casting the pointer type in C not do the right
> thing?
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-vload-vstore-Improve-clc-implementation-to-make-asse.patch
Type: text/x-diff
Size: 13170 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/libclc-dev/attachments/20140723/b295f6f7/attachment-0002.patch>
