[Libclc-dev] [PATCH 6/9] Add optimized generic addrspace(0) vload implementation
Aaron Watry
awatry at gmail.com
Wed Jul 23 13:18:48 PDT 2014
I think the answer is inertia. The previous int load/store code was
all written that way, so I kept going that way, even though I believe
I was the one who wrote the original version. I believe that, at the
time, I had issues doing what I've done in the attached patch
(pointer casting, as you suggested).
I've abandoned patches 2-9 of this series in my local checkout and
would like to replace them with the patch I've attached.
I've run a full piglit CL test run with 0 changes in pass/fail rate,
and I'm getting sensible bitcode from the compiled CL test kernels.
Does this look better to you?
--Aaron
On Wed, Jul 23, 2014 at 2:12 PM, Matt Arsenault
<Matthew.Arsenault at amd.com> wrote:
> On 07/22/2014 06:46 PM, Aaron Watry wrote:
>>
>> +define <2 x i32> @__clc_vload2_i32__addr0(i32 addrspace(0)* nocapture %addr) nounwind readonly alwaysinline {
>> + %1 = bitcast i32 addrspace(0)* %addr to <2 x i32> addrspace(0)*
>> + %2 = load <2 x i32> addrspace(0)* %1, align 4, !tbaa !3
>> + ret <2 x i32> %2
>> +}
>
> Why include the addrspace(0)s? I'm also wondering why it's necessary to
> write these in IR? Does casting the pointer type in C not do the right
> thing?
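For readers without the attachment, here is a rough sketch of what "casting the pointer type in C" could look like for a two-element int load. It is not the contents of the attached patch. It is plain C using the GCC/Clang vector extensions as a stand-in for OpenCL C, and the names int2, less_aligned_int2 and vload2_int are hypothetical, chosen only for illustration. The one assumption that matters is that the vector pointer type is declared with element alignment (4 bytes here, matching the "align 4" in the quoted IR), so the compiler does not assume the full vector alignment.

```c
#include <stddef.h>

/* Two-lane int vector via the GCC/Clang vector extension.
 * Hypothetical stand-in for OpenCL C's built-in int2. */
typedef int int2 __attribute__((vector_size(2 * sizeof(int))));

/* The same vector, but only element-aligned and allowed to alias int,
 * so a load through it may start at any int boundary.
 * Hypothetical name, for illustration only. */
typedef int less_aligned_int2
    __attribute__((vector_size(2 * sizeof(int)),
                   aligned(sizeof(int)),
                   may_alias));

/* Hypothetical helper shaped like vload2(offset, p): reads the two ints
 * at p[2*offset] and p[2*offset + 1] with a single vector load obtained
 * by casting the element pointer to a vector pointer. */
static inline int2 vload2_int(size_t offset, const int *p)
{
    return *(const less_aligned_int2 *)(p + 2 * offset);
}
```

With a definition along these lines the compiler is free to emit one vector load with 4-byte alignment, which is the same shape as the hand-written IR quoted above, without the IR having to be maintained by hand.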
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-vload-vstore-Improve-clc-implementation-to-make-asse.patch
Type: text/x-diff
Size: 13170 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/libclc-dev/attachments/20140723/b295f6f7/attachment-0002.patch>