[Libclc-dev] Any plan for OpenCL 1.2?
EdB via Libclc-dev
libclc-dev at lists.llvm.org
Mon Jul 20 12:17:10 PDT 2020
On Monday, 20 July 2020 at 19:52:54 CEST, Jan Vesely via Libclc-dev wrote:
> On Mon, 2020-07-20 at 09:24 -0500, Aaron Watry via Libclc-dev wrote:
> > On Sat, Jul 18, 2020, 11:53 PM DING, Yang via Libclc-dev <
> >
> > libclc-dev at lists.llvm.org> wrote:
> > > Hi,
> > >
> > > It seems libclc currently implements the library requirements of the
> > > OpenCL C programming language, as specified by the OpenCL 1.1
> > > Specification.
> > >
> > > I am wondering if there is any active development or plan to upgrade
> > > it to OpenCL 1.2? If not, what are the biggest challenges?
> >
> > I haven't checked in a while, but I think the biggest blocker at this
> > point is that we still don't have a printf implementation in libclc.
> > Most/all of the rest of the required functions are already implemented
> > to expose 1.2.
> >
> > I had started on a pure-C printf implementation a while back that would
> > in theory be portable to devices printing to a local/global buffer, but
> > stalled out on it when I got to printing vector arguments and hex-float
> > formats. I also stalled on the fact that global atomics in CL aren't
> > guaranteed to be synchronized across all work groups executing a kernel
> > (just within a given work group for a given global buffer).
>
> I don't think we need to worry about that. Since both the AMD and
> NVPTX atomics are atomic across all work groups, we can just use that
> behaviour. The actual atomic op would be target specific, and if anyone
> wants to add an additional target they add their own implementation
> (SPIR-V can just use an atomic with the right scope).
> AMD targets can be switched to use GDS as an optimization later.
>
> At least CL 1.2 printf only prints to stdout, so we only need to
> consider global memory.
>
> > If someone wants to take a peek or keep going with it, I've uploaded
> > my WIP code for the printf implementation here:
> > https://github.com/awatry/printf
>
> I'm not sure parsing the format string on the device is the best
> approach, as it will introduce quite a lot of divergence. It might be
> easier/faster to just copy the format string and input data to the
> buffer and let the host parse/print everything.
>
> Was the plan to:
> 1.) parse the input once to get the number of bytes,
> 2.) atomically move the write pointer,
> 3.) parse the input a second time and print characters to the buffer?
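A host-side model of those three steps might look like the sketch below. All names here are hypothetical, and the atomic stands in for a global OpenCL atomic on the device; it follows Jan's suggestion of copying the format string (here, a format id) and the raw argument bytes into the buffer for the host to parse later.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Host-side model of steps 1-3 above: size the record, atomically
 * reserve space in a global buffer, then copy the format-string id and
 * the raw argument bytes so the host can parse/print them later. */
#define PRINTF_BUF_SIZE 1024
static char printf_buf[PRINTF_BUF_SIZE];
static atomic_uint printf_off;

/* Step 2: atomically move the write pointer; NULL when out of space. */
static char *printf_reserve(uint32_t bytes)
{
    uint32_t off = atomic_fetch_add(&printf_off, bytes);
    if (off + bytes > PRINTF_BUF_SIZE)
        return NULL;            /* record is dropped on overflow */
    return printf_buf + off;
}

/* Steps 1 and 3: compute the record size, then write the record. */
static int printf_emit(uint32_t fmt_id, const void *args, uint32_t arg_bytes)
{
    char *slot = printf_reserve((uint32_t)sizeof fmt_id + arg_bytes);
    if (!slot)
        return -1;
    memcpy(slot, &fmt_id, sizeof fmt_id);
    memcpy(slot + sizeof fmt_id, args, arg_bytes);
    return 0;
}
```

Because only the offset bump is atomic, records from divergent work items interleave safely as long as each record is written entirely inside its reserved slot.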
I'm currently doing an implementation for the Mesa AMD target, using what is
already implemented in LLVM.
It means having a `__global char * __printf_alloc(uint bytes) {}` that
returns an address into a global buffer.
The address is calculated from the global buffer's base address plus an
offset of what has already been stored.
Mine is not using atomics yet, since I'm working on the buffer runtime
management in Clover for the moment and will finish the libclc side later.
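A minimal host-side model of that allocator, under the assumptions above (non-atomic, base address plus bytes already stored; the buffer and function names here are made up):

```c
#include <stddef.h>
#include <stdint.h>

/* Model of the __printf_alloc scheme described above: the returned
 * address is the global buffer's base plus the number of bytes already
 * stored.  Non-atomic, matching the current state of the work; a
 * multi-work-item version would bump `used` with a global atomic add. */
#define BUF_SIZE 256
static char buf[BUF_SIZE];      /* stands in for the __global buffer */
static uint32_t used;           /* bytes already stored */

static char *model_printf_alloc(uint32_t bytes)
{
    if (used + bytes > BUF_SIZE)
        return NULL;            /* buffer exhausted */
    char *p = buf + used;
    used += bytes;
    return p;
}
```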
Serge
>
> or did you have anything more specialized in mind?
>
> thanks,
> Jan
>
> > It's probably horrible, and may have to be re-written from scratch to
> > actually work on a GPU, but it may be a start :)
> >
> > Thanks,
> > Aaron
> >
> > > Thanks,
> > > Yang
> > > _______________________________________________
> > > Libclc-dev mailing list
> > > Libclc-dev at lists.llvm.org
> > > https://lists.llvm.org/cgi-bin/mailman/listinfo/libclc-dev