[llvm-dev] [AMDGPU] non-hsa intrinsic with hsa target

Liu Xin via llvm-dev llvm-dev at lists.llvm.org
Sat Mar 5 08:59:48 PST 2016


Li, Hong-Yu,

it's because get_group_id() uses get_local_size
_CLC_DEF size_t get_global_id(uint dim) {
  return get_group_id(dim)*get_local_size(dim) + get_local_id(dim);
}

in libclc/amdgcn,  'get_local_size' invokes r600-xxx intrinsics.  I  doubt
that libclc ever supports hsa-runtime before.


thanks,
--lx


On Sun, Mar 6, 2016 at 12:11 AM, 李弘宇 via llvm-dev <llvm-dev at lists.llvm.org>
wrote:

> Dear Developers,
>
> I compiled a OpenCL kernel before (on Nov. last year) like
>
> __kernel void g(__global float* array)
> {
>   array[get_global_id(0)] = 1;
> }
>
> with libclc, which would originally use the instrinsics like
> llvm.r600.read.local.size.x().
>
> I executed the generated object file with one version of the hsa-runtime
> [1] provided by Mr. Stellard, when there was more than one workgroup, the
> output of the program wasn't correct at that time. I guessed this might be
> because get_group_id() always returned 1 (not quite sure what was going on
> at that time).
>
> When I compile such cases using current llvm trunk, it uses a set of
> instrinsics starting with llvm.amdgcn, while it still
> uses llvm.r600.read.local.size.x(). The output LLVM IR code is like:
>
> define void @g(float addrspace(1)* nocapture %array) #0 {
>   %x.i.i = tail call i32 @llvm.amdgcn.workgroup.id.x() #2
>   %x.i12.i = tail call i32 @llvm.r600.read.local.size.x() #1
>   %mul26.i = mul i32 %x.i12.i, %x.i.i
>   %x.i4.i = tail call i32 @llvm.amdgcn.workitem.id.x() #2, !range !7
>   %add.i = add i32 %x.i4.i, %mul26.i
>   %0 = sext i32 %add.i to i64
>   %arrayidx = getelementptr inbounds float, float addrspace(1)* %array,
> i64 %0
>   store float 1.000000e+00, float addrspace(1)* %arrayidx, align 4, !tbaa
> !8
>   ret void
> }
>
> which cannot be handled by llc with the message "the non-hsa instrinsic
> with hsa target shown".
>
> After looking into the log (r259297), my question is that is there other
> intrinsic that support this case when the target is amdgcn--amdhsa?  In the
> log of r259297, it states that AMDGPUPromoteAlloca pass (a backend pass)
> will generate this intrinsic, but even when I just emit-llvm without going
> through llc, this intrinsic is still emitted.
>
> [1] https://github.com/tstellarAMD/hsa-runtime
>
>
> Regards,
>
> 李弘宇 (Li, Hong-Yu)
> Department of Computer Science & Information Engineering
> National Taiwan University
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160306/efd4de1b/attachment.html>


More information about the llvm-dev mailing list