[Libclc-dev] printf (Re: [PATCH 1/2] Make image builtins r600/llvm-3.9 only)

Jeroen Ketema via Libclc-dev libclc-dev at lists.llvm.org
Tue Oct 10 15:13:26 PDT 2017

> On 11 Oct 2017, at 00:00, Jan Vesely <jan.vesely at rutgers.edu> wrote:
> On Tue, 2017-10-10 at 22:58 +0200, Jeroen Ketema via Libclc-dev wrote:
>>> On 10 Oct 2017, at 20:57, Aaron Watry via Libclc-dev <libclc-dev at lists.llvm.org> wrote:
>>> On Tue, Oct 10, 2017 at 1:16 PM, Jan Vesely <jan.vesely at rutgers.edu> wrote:
>>>> On Tue, 2017-10-10 at 10:58 -0500, Aaron Watry via Libclc-dev wrote:
>>>>> On Tue, Oct 10, 2017 at 9:25 AM, Jan Vesely <jan.vesely at rutgers.edu> wrote:
>>>>>> On Mon, 2017-10-09 at 21:28 -0500, Aaron Watry via Libclc-dev wrote:
>>>>>>> On Sun, Oct 8, 2017 at 1:23 PM, Jan Vesely via Libclc-dev
>>>>>>> <libclc-dev at lists.llvm.org> wrote:
>>>>>>>> On Sun, 2017-10-01 at 14:15 -0400, Jan Vesely wrote:
>>>>>>>>> The implementation uses r600 specific intrinsics
>>>>>>>>> LLVM-4 generates calls to functions using _ro_t and _rw_t image types
>>>>>>>>> Portions of the code can be moved back as more targets/llvm versions add image support
>>>>>>>>> Signed-off-by: Jan Vesely <jan.vesely at rutgers.edu>
>>>>>>>>> ---
>>>>>>>>> AMDGPUOpenCLImageTypeLoweringPass relies on old style of kernel metadata
>>>>>>>>> passing, so I'm pretty sure this does not work in 3.9 either, but it at
>>>>>>>>> least does not generate external calls.
>>>>>>>>> I'd like to keep the code around as a reference when someone starts
>>>>>>>>> resurrecting image builtins.
>>>>>>>> ping. this is just a code cleanup. No change in functionality.
>>>>>>> Looks ok to me.  I agree it would be nice to resurrect image support
>>>>>>> in newer llvm, so lets keep this around for now.
>>>>>> thanks. Does it apply to 2/2 as well?
>>>>> Yes, it does.  I just manually tested the check_external_calls.sh
>>>>> script against both r600 and pitcairn's built libs/aliases on LLVM
>>>>> 6.0, and the image calls are now gone. I don't have a LLVM 3.9 install
>>>>> around at the moment to test my BARTS with, unfortunately. Maybe I'll
>>>>> build that release at some point and install it in its own prefix.
>>>> oh, I should have mentioned that it's in the images branch here:
>>>> https://travis-ci.org/jvesely/libclc/branches
>>>> I was considering to ask the github llvm-mirror person to enable travis
>>>> an forward email to the list, or maybe setup our own mirror with CI
>>>> enabled. or maybe add libclc to official llvm build farm.
>>>>>>> If nothing else, it might give me some ideas on how to make an
>>>>>>> implicit global buffer available for the printf implementation.
>>>>>> I don't think images are a good model for that. It'd be nicer to use
>>>>>> standard implicit parameter passing on clover side and
>>>>>> __buitltin_implicitarg_ptr on libclc side, without any llvm changes.
>>>>>> (It'd be even nicer if NDRange arguments were converted first, so they
>>>>>> are all together in one place)
>>>>> Yeah, I have been considering the implicit parameter passing that
>>>>> clover does, and had been leaning towards attempting to do that with a
>>>>>> =1MB buffer for printf output. The other thing that will probably be
>>>>> needed is an atomic counter to keep track of our current location in
>>>>> that buffer when printing.
>>>>> I'm a fair bit of the way done with having a clean-slate pure-C
>>>>> implementation of printf done that matches the CLC requirements that
>>>>> I've just been testing for identical output against my system printf.
>>>>> I'll probably post that as an RFC in its pure C form before converting
>>>>> it to CLC code just to save myself a bit of refactoring effort later
>>>>> if needed.  Once that's done, I'll take a stab at converting it to
>>>>> CLC, probably have to bug-fix clang to enable varargs function calls
>>>>> specifically for printf (CL 2.0 spec, section 6.9 part e specifies
>>>>> that variadic functions/macros are only allowed for
>>>>> printf/enqueue_kernel), which I don't believe it allows yet, and then
>>>>> I'll get the implicit parameter passing set up for clover (hopefully).
>>>> I don't think you need to implement full printf in libclc. It should be
>>>> enough to dump format string and arguments
>> There’s not even a reason to dump the format string in the buffer, as per
>> the OpenCL specification it’s known at compile time what the format
>> string is going to be.
> we could get away with a pointer to the format string. but it might be
> more difficult for clover the figure out the pointers if we want to use
> device pointers.
>>>> into the buffer and let
>>>> clover to the ugly part. printf code doesn't sound like something that
>>>> would run efficiently on GPUs anyway.
>>> I considered forwarding the format spec and arguments to clover, but I
>>> figured that we would need a custom printf implementation given that
>>> CLC's printf supports the vector length specifier (and hl length) that
>>> standard printf doesn't anyway, and that CLC doesn't support the full
>>> set of printf specifiers that normal C allows.
> agreed. it'd just be more efficient to implement custom printf in
> clover than libclc (+you can use all the C++ help)
>>> The same comment from my previous mail about enabling varargs calls in
>>> CLC for printf still would apply, at which point the real question is
>> I don’t get the argument about varargs. Contrary to the situation on a host
>> system both the compiler and printf library function are under your control
>> here. Hence, you can use the compiler to get rid of any vararg function
>> calls before they ever reach the hardware.
> varargs should be handled by clang. based on
> test/SemaOpenCL/builtin.cl, I'd expect that setting cl-1.2 should be
> enough to get it working to get rid of clang complaints (we currently
> build libclc using cl-1.0)

Oh, right. That’s what you mean. I haven’t tried compiling with -std=CL1.0,
but if you compile with -std=CL1.2, clang already generates sensible
bitcode for vararg functions.


>>> performance, and how we'd deal with sending variable length arguments
>>> back to clover in a buffer (probably just a giant byte array?).  I'm
>>> not really convinced that printf is a performance critical function.
>>> There may also be complexities involved in forwarding the printf
>>> argument list back to clover in the cases where there are more
>>> arguments supplied to the printf call than the format calls for.
>>> In this case, I figured that we at least needed a global buffer (and
>>> output size variable) that was passed implicitly by clover to libclc
>>> to dump the print output in, at which point I think we can just get
>>> away with an atomically incremented/forwarded position variable for
>>> where in the buffer the cursor currently is so that multiple threads
>>> can output to the buffer in parallel.
>> Be careful here: OpenCL 1.2 does not guarantee that atomic
>> operations work across work-groups. Hence, you’re going
>> to need a buffer per work-group.
> Do you have a reference for this? CLC2 introduced scoped atomic
> operations, but cl-1.0 says that "These transactions are atomic for the
> device executing these atomic functions." which, I think can be
> interpreted as device scope.
> GCN atomics are definitely global (performed in L2 cache). afaik the
> same is true for EG/CM, but I'd need to double check that color buffer
> works as expected. I don't know about nvidia.
>> Jeroen
>>> It honestly seems easier to do this in raw CLC than to pass the
>>> contents back to clover to run through a custom printf function
>>> anyway.  But what I have is currently in C, so I guess we could
>>> consider it...
> you can even implement both a keep the one that performs better :)
> Jan
>>> --Aaron
>>>> Jan
>>>>> --Aaron
>>>>>> Jan
>>>>>>> --Aaron
>>>>>>>> Jan
>>>>>>>>> Jan
>> _______________________________________________
>> Libclc-dev mailing list
>> Libclc-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/libclc-dev

More information about the Libclc-dev mailing list