[Libclc-dev] Any plan for OpenCL 1.2?

EdB via Libclc-dev libclc-dev at lists.llvm.org
Thu Jul 23 02:58:24 PDT 2020


Hello

I've created https://reviews.llvm.org/D84392 to add printf to the AMD target.

Serge

On Tuesday, 21 July 2020 at 03:31:17 CEST, Aaron Watry via Libclc-dev wrote:
> On Mon, Jul 20, 2020 at 8:26 PM Aaron Watry <awatry at gmail.com> wrote:
> > > On Mon, Jul 20, 2020 at 12:52 PM Jan Vesely <jan.vesely at rutgers.edu> wrote:
> > > On Mon, 2020-07-20 at 09:24 -0500, Aaron Watry via Libclc-dev wrote:
> > > > On Sat, Jul 18, 2020, 11:53 PM DING, Yang via Libclc-dev <libclc-dev at lists.llvm.org> wrote:
> > > > > Hi,
> > > > > 
> > > > > It seems libclc currently implements the library requirements of the
> > > > > OpenCL C programming language, as specified by the OpenCL 1.1
> > > > > Specification.
> > > > > 
> > > > > I am wondering if there is any active development or plan to upgrade
> > > > > it to OpenCL 1.2? If not, what are the biggest challenges?
> > > > 
> > > > I haven't checked in a while, but I think the biggest blocker at
> > > > this point is that we still don't have a printf implementation in
> > > > libclc. Most or all of the other functions required to expose 1.2
> > > > are already implemented.
> > > > 
> > > > I had started on a pure-C printf implementation a while back that
> > > > would in theory be portable to devices printing to a local/global
> > > > buffer, but I stalled out on it when I got to printing vector
> > > > arguments and hex-float formats. Another blocker is that global
> > > > atomics in CL aren't guaranteed to be synchronized across all work
> > > > groups executing a kernel, only within a given work group for a
> > > > given global buffer.
> > > 
> > > I don't think we need to worry about that. Since both the AMD and
> > > NVPTX atomics are atomic across all work groups, we can just rely on
> > > that behaviour. The actual atomic op would be target-specific, and
> > > anyone who wants to add another target would add their own
> > > implementation (SPIR-V can just use an atomic with the right scope).
> > > AMD targets can be switched to use GDS as an optimization later.
> > 
> > Yeah, if we go the route of what I had started (not saying we should),
> > then making it a target-specific implementation with no generic one is
> > probably the easiest route.
> > 
> > > At least CL 1.2 printf only prints to stdout, so we only need to
> > > consider global memory.
> > > 
> > > > If someone wants to take a peek or keep going with it, I've
> > > > uploaded my WIP code for the printf implementation here:
> > > > https://github.com/awatry/printf
> > > 
> > > I'm not sure parsing the format string on the device is the best
> > > approach, as it will introduce quite a lot of divergence. It might
> > > be easier/faster to just copy the format string and input data to
> > > the buffer and let the host parse/print everything.
> > 
> > Yeah, I don't remember if some of my notes from when I was working on
> > this were along that line, but I know the thought crossed my mind a
> > few times, and I hadn't given up on the idea, not least because of
> > the performance, the branchiness, and the sheer amount of code and
> > stack/register pressure that the implementation I was working on
> > would introduce. If it weren't for the special vector output formats,
> > we could pretty much forward the format string and arguments back to
> > the host and just use the standard system printf. It might still be
> > easiest to do special handling only for that format (and there might
> > have been one or two other differences from standard C printf; it's
> > been a while since I started this).
> > 
> > > Was the plan to:
> > > 1.) parse the input once to get the number of bytes,
> > > 2.) atomically move the write pointer,
> > > 3.) parse the input a second time and print characters to the buffer,
> > > 
> > > or did you have anything more specialized in mind?
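The three-step scheme in that list can be sketched in plain C, with a C11 atomic standing in for the target-specific global atomic the thread discusses; the function names are invented, and the formatting pass is reduced to a preformatted string for brevity:

```c
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Sketch of the two-pass scheme: pass 1 computes how many bytes this
 * printf call will emit, an atomic fetch-add reserves that many bytes in
 * the shared output buffer, and pass 2 writes the characters into the
 * reserved slot. On a real target, write_ptr would live in global memory
 * and the fetch-add would be the target's global atomic. */

static _Atomic size_t write_ptr; /* next free byte in the output buffer */

/* Reserve nbytes contiguous bytes; returns the slot's start offset, or
 * (size_t)-1 if the buffer would overflow. */
static size_t reserve(size_t nbytes, size_t buf_size)
{
    size_t off = atomic_fetch_add(&write_ptr, nbytes);
    return (off + nbytes <= buf_size) ? off : (size_t)-1;
}

/* One printf call, reduced to a preformatted string: measure, reserve,
 * copy. Each call's output lands in one contiguous slot, so output from
 * different work-items cannot interleave within a single call. */
static int emit(char *buf, size_t buf_size, const char *formatted)
{
    size_t n = strlen(formatted);      /* pass 1: byte count      */
    size_t off = reserve(n, buf_size); /* atomic pointer bump     */
    if (off == (size_t)-1)
        return -1;                     /* out of buffer space     */
    memcpy(buf + off, formatted, n);   /* pass 2: write the chars */
    return 0;
}
```

The appeal of this shape is that the racy part shrinks to a single fetch-add per printf call, rather than one atomic per character.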
> > 
> > The one that I was working on actually walked the format string
> > character by character until it hit a '%' (or anything else special),
> > and when it came time to output anything, the idea was that we'd use
> > an atomic increment to allocate a character in the output buffer and
> > write it. Racy, to be sure, and you'd end up with output interleaved
> > from all threads attempting to write simultaneously. A previous
> > conversation I had indicated that the CL spec doesn't guarantee that
> > atomic operations/buffers are synchronized across work groups, so
> > that got me started down the mental path of partitioning the output
> > buffer into N segments (where N is the number of work groups
> > launched), so you could at least synchronize the output within each
> > work group.
> 
> Ahh, yeah, and now the rust is slowly getting polished off.  I think I
> had planned on creating the printf output in a private buffer/array
> and then at the end of the printf operation (or whenever the private
> buffer was full), flushing the built string to the global buffer
> instead of writing 1 character at a time directly to the global
> buffer.
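The private-buffer idea described here can also be sketched in plain C: each work-item accumulates its printf output in a small private scratch buffer and flushes it to the shared global buffer in one atomic reservation, instead of bumping the pointer once per character. All names and sizes below are invented for illustration:

```c
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Each work-item stages output privately and flushes in bulk. One atomic
 * fetch-add per flush replaces one atomic per character, and each flush's
 * bytes stay contiguous in the global buffer. */

enum { STAGE_CAP = 64 };

struct stage {
    char   priv[STAGE_CAP]; /* private (per-work-item) scratch buffer */
    size_t used;            /* bytes currently staged */
};

static _Atomic size_t global_ptr; /* write cursor into the global buffer */

/* Flush everything staged so far into the global buffer at once. */
static void flush(struct stage *s, char *global_buf, size_t global_cap)
{
    if (s->used == 0)
        return;
    size_t off = atomic_fetch_add(&global_ptr, s->used);
    if (off + s->used <= global_cap)      /* drop output on overflow */
        memcpy(global_buf + off, s->priv, s->used);
    s->used = 0;
}

/* Append one character, flushing first if the private buffer is full. */
static void put_char(struct stage *s, char c,
                     char *global_buf, size_t global_cap)
{
    if (s->used == STAGE_CAP)
        flush(s, global_buf, global_cap);
    s->priv[s->used++] = c;
}
```

On a GPU the trade-off is register/scratch pressure from the private buffer versus far fewer global atomics, which is presumably why the buffer would be kept small.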
> 
> Sorry for the rambling. I started this almost 3 years ago now, and
> haven't touched it since Oct 2017, so the memory has faded a bit in
> the interim.
> 
> --Aaron
> 
> > I will fully admit that the implementation has its issues, but from
> > my reading of the spec I think it would at least have been compliant.
> > 
> > That being said, I got a good start on a set of unit tests while
> > working on it, so it wasn't a complete waste. If Serge is working on
> > an implementation that copies the format specs and arguments from the
> > device to Mesa in order to print them on the host, I'm more than
> > willing to go with that, and I can probably port my tests over to
> > piglit at some point just as a sanity check, if the CTS isn't
> > thorough enough.
> > 
> > --Aaron
> > 
> > > thanks,
> > > Jan
> > > 
> > > > It's probably horrible, and may have to be rewritten from scratch
> > > > to actually work on a GPU, but it may be a start :)
> > > > 
> > > > Thanks,
> > > > Aaron
> > > > 
> > > > > Thanks,
> > > > > Yang
> > > > > _______________________________________________
> > > > > Libclc-dev mailing list
> > > > > Libclc-dev at lists.llvm.org
> > > > > https://lists.llvm.org/cgi-bin/mailman/listinfo/libclc-dev
> > > > 
