[llvm-dev] [AMDGPU] non-hsa intrinsic with hsa target

Sat Mar 5 09:46:23 PST 2016

I think the "define linkonce_odr i32 @get_global_id(i32 %dim) #5" you
dumped is llvm IR after inlining and opt.
The commit you mentioned(ba9858) doesn't change get_local_size() at all.

I never worked on OpenCL+HSA. I just wonder if libclc supports HSA.

HSA RT uses 'hsa_kernel_dispatch_packet_t' to get know workgroup size and
grid size. so far, I didn't see hsa-specific implementation appears in
libclc.

thanks,
--lx

On Sun, Mar 6, 2016 at 1:28 AM, 李弘宇 <zhenlinospirit at gmail.com> wrote:

> Hi Mr. Liu,
>
> Thanks for your quick reply.
>
> I compiled the code with the libclc_trunk and linked the bitcode file
> under $LIBCLC_DIR/built_libs/tahiti-amdgcn--.bc. After looking into the
> libclc, it is currently using the new workitem intrinsics
> (commit ba9858caa1e927a6fcc601e3466faa693835db5e). In the linked bitcode
> ($LIBCLC_DIR/built_libs/tahiti-amdgcn--.bc), it has the following code
> segment,
>
> define linkonce_odr i32 @get_global_id(i32 %dim) #5 {
> entry:
>   switch i32 %dim, label %get_local_id.exit [
>     i32 0, label %get_group_id.exit.thread
>     i32 1, label %get_group_id.exit.thread22
>     i32 2, label %get_group_id.exit.thread24
>   ]
>
> get_group_id.exit.thread:                         ; preds = %entry
>   %x.i = tail call i32 @llvm.amdgcn.workgroup.id.x() #13
>   %x.i12 = tail call i32 @llvm.r600.read.local.size.x() #3
>   %mul26 = mul i32 %x.i12, %x.i
>   %x.i4 = tail call i32 @llvm.amdgcn.workitem.id.x() #13, !range !1
>   br label %get_local_id.exit
> ...
> }
>
> So it shows that some intrinstics are still using llvm.r600.xxx. I have no
> idea if I ever missed something so that it doesn't work.
>
> Thanks.
>
> Best regards,
>
> 李弘宇 (Li, Hong-Yu)
> Department of Computer Science & Information Engineering
> National Taiwan University
>
> On Sun, Mar 6, 2016 at 12:59 AM, Liu Xin <navy.xliu at gmail.com> wrote:
>
>> Li, Hong-Yu,
>>
>> it's because get_group_id() uses get_local_size
>> _CLC_DEF size_t get_global_id(uint dim) {
>>   return get_group_id(dim)*get_local_size(dim) + get_local_id(dim);
>> }
>>
>> in libclc/amdgcn,  'get_local_size' invokes r600-xxx intrinsics.  I
>>  doubt that libclc ever supports hsa-runtime before.
>>
>>
>> thanks,
>> --lx
>>
>>
>> On Sun, Mar 6, 2016 at 12:11 AM, 李弘宇 via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>> Dear Developers,
>>>
>>> I compiled a OpenCL kernel before (on Nov. last year) like
>>>
>>> __kernel void g(__global float* array)
>>> {
>>>   array[get_global_id(0)] = 1;
>>> }
>>>
>>> with libclc, which would originally use the instrinsics like
>>> llvm.r600.read.local.size.x().
>>>
>>> I executed the generated object file with one version of the hsa-runtime
>>> [1] provided by Mr. Stellard, when there was more than one workgroup, the
>>> output of the program wasn't correct at that time. I guessed this might be
>>> because get_group_id() always returned 1 (not quite sure what was going on
>>> at that time).
>>>
>>> When I compile such cases using current llvm trunk, it uses a set of
>>> instrinsics starting with llvm.amdgcn, while it still
>>> uses llvm.r600.read.local.size.x(). The output LLVM IR code is like:
>>>
>>> define void @g(float addrspace(1)* nocapture %array) #0 {
>>>   %x.i.i = tail call i32 @llvm.amdgcn.workgroup.id.x() #2
>>>   %x.i12.i = tail call i32 @llvm.r600.read.local.size.x() #1
>>>   %mul26.i = mul i32 %x.i12.i, %x.i.i
>>>   %x.i4.i = tail call i32 @llvm.amdgcn.workitem.id.x() #2, !range !7
>>>   %add.i = add i32 %x.i4.i, %mul26.i
>>>   %0 = sext i32 %add.i to i64
>>>   %arrayidx = getelementptr inbounds float, float addrspace(1)* %array,
>>> i64 %0
>>>   store float 1.000000e+00, float addrspace(1)* %arrayidx, align 4,
>>> !tbaa !8
>>>   ret void
>>> }
>>>
>>> which cannot be handled by llc with the message "the non-hsa instrinsic
>>> with hsa target shown".
>>>
>>> After looking into the log (r259297), my question is that is there
>>> other intrinsic that support this case when the target is amdgcn--amdhsa?
>>> In the log of r259297, it states that AMDGPUPromoteAlloca pass (a
>>> backend pass) will generate this intrinsic, but even when I just emit-llvm
>>> without going through llc, this intrinsic is still emitted.
>>>
>>> [1] https://github.com/tstellarAMD/hsa-runtime
>>>
>>>
>>> Regards,
>>>
>>> 李弘宇 (Li, Hong-Yu)
>>> Department of Computer Science & Information Engineering
>>> National Taiwan University
>>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160306/8ba213ef/attachment.html>