[Libclc-dev] clover-todo (was: [PATCH] atomics: redefine atom_inc/atom_dec using atom_add/atom_sub)

Aaron Watry via Libclc-dev libclc-dev at lists.llvm.org
Fri Jun 30 11:48:43 PDT 2017


On Fri, Jun 30, 2017 at 11:32 AM, Jan Vesely <jan.vesely at rutgers.edu> wrote:
> On Mon, 2017-06-26 at 10:29 -0500, Aaron Watry wrote:
>> On Mon, Jun 26, 2017 at 6:18 AM, Jan Vesely <jan.vesely at rutgers.edu> wrote:
>> > On Fri, 2017-06-23 at 20:33 -0500, Aaron Watry via Libclc-dev wrote:
>> > > On Thu, Jun 22, 2017 at 10:25 AM, Jan Vesely <jan.vesely at rutgers.edu> wrote:
>> > > > On Wed, 2017-06-21 at 11:06 -0500, Aaron Watry via Libclc-dev wrote:
>> > > > > On Wed, Jun 21, 2017 at 9:16 AM, Jan Vesely <jan.vesely at rutgers.edu> wrote:
>> > > > > > On Tue, 2017-06-20 at 21:11 -0500, Aaron Watry via Libclc-dev wrote:
>> > > > > > > This is exactly what the atomic_[inc|dec] functions do, and it fixes
>> > > > > > > kernel compilation failures using the OpenCL CTS.
>> > > > > >
>> > > > > > what are the compilation failures? I thought piglit had these ops
>> > > > > > covered.
>> > > > >
>> > > > > Piglit has atomic_* covered, but not the CL 1.0 atom_* functions.
>> > > >
>> > > > ah right. guess we should add those as well (another time...).
>> > >
>> > > Yeah.  In theory, we can add those to piglit. Since the CTS was
>> > > open-sourced, I'm mostly using its failures as a worklist for now just
>> > > to prevent duplication of work.
>> > >
>> > > >
>> > > > >
>> > > > > The issue I'm running into with the CTS is that atomic_* works fine,
>> > > > > but I'm getting an ambiguous function reference for
>> > > > > atom_inc/atom_dec/atom_or/atom_and.
>> > > > >
>> > > > > Example:
>> > > > >
>> > > > > #pragma OPENCL EXTENSION cl_khr_global_int32_base_atomics : enable
>> > > > > __kernel void test_atomic_fn(volatile __global uint *destMemory,
>> > > > > __global uint *oldValues)
>> > > > > {
>> > > > >     int  tid = get_global_id(0);
>> > > > >     oldValues[tid] = atom_dec( &destMemory[0] );
>> > > > > }
>> > > > >
>> > > > > Log:
>> > > > > Build not successful for device "AMD PITCAIRN (DRM 2.49.0 /
>> > > > > 4.11.0-041100rc8-generic, LLVM 5.0.0)", status: CL_BUILD_ERROR
>> > > > > Build log for device "AMD PITCAIRN (DRM 2.49.0 /
>> > > > > 4.11.0-041100rc8-generic, LLVM 5.0.0)" is:
>> > > > > ------------
>> > > > > input.cl:6:22: error: call to 'atom_dec' is ambiguous
>> > > > > /usr/local/include/clc/cl_khr_global_int32_base_atomics/atom_dec.h:1:29:
>> > > > > note: candidate function
>> > > > > /usr/local/include/clc/cl_khr_global_int32_base_atomics/atom_dec.h:2:38:
>> > > > > note: candidate function
>> > > > > ------------
>> > > >
>> > > > this looks like clang treats both:
>> > > > "volatile __global uint* -> __global unsigned int*"
>> > > > and
>> > > > "volatile __global uint* -> __globale int *"
>> > > > as equal cost.
>> > > >
>> > > > Can you confirm that removing 'volatile' from destMemory hides the
>> > > > issue?
>> > >
>> > > Yes, I can confirm that if I rebuild libclc using the latest upstream
>> > > revision (and without any of my changes)
>> > > and then remove the volatile keyword from the atom_* test kernels
>> > > being compiled that the
>> > > tests compile and run to completion successfully.
>> > >
>> > > >
>> > > > >
>> > > > > >
>> > > > > > >
>> > > > > > > Signed-off-by: Aaron Watry <awatry at gmail.com>
>> > > > > > > ---
>> > > > > > >  generic/include/clc/cl_khr_global_int32_base_atomics/atom_dec.h | 3 +--
>> > > > > > >  generic/include/clc/cl_khr_global_int32_base_atomics/atom_inc.h | 3 +--
>> > > > > > >  generic/include/clc/cl_khr_local_int32_base_atomics/atom_dec.h  | 3 +--
>> > > > > > >  generic/include/clc/cl_khr_local_int32_base_atomics/atom_inc.h  | 3 +--
>> > > > > > >  generic/lib/SOURCES                                             | 4 ----
>> > > > > > >  generic/lib/cl_khr_global_int32_base_atomics/atom_dec.cl        | 9 ---------
>> > > > > > >  generic/lib/cl_khr_global_int32_base_atomics/atom_inc.cl        | 9 ---------
>> > > > > > >  generic/lib/cl_khr_local_int32_base_atomics/atom_dec.cl         | 9 ---------
>> > > > > > >  generic/lib/cl_khr_local_int32_base_atomics/atom_inc.cl         | 9 ---------
>> > > > > > >  9 files changed, 4 insertions(+), 48 deletions(-)
>> > > > > > >  delete mode 100644 generic/lib/cl_khr_global_int32_base_atomics/atom_dec.cl
>> > > > > > >  delete mode 100644 generic/lib/cl_khr_global_int32_base_atomics/atom_inc.cl
>> > > > > > >  delete mode 100644 generic/lib/cl_khr_local_int32_base_atomics/atom_dec.cl
>> > > > > > >  delete mode 100644 generic/lib/cl_khr_local_int32_base_atomics/atom_inc.cl
>> > > > > > >
>> > > > > > > diff --git a/generic/include/clc/cl_khr_global_int32_base_atomics/atom_dec.h b/generic/include/clc/cl_khr_global_int32_base_atomics/atom_dec.h
>> > > > > > > index bbc872c..a520fe4 100644
>> > > > > > > --- a/generic/include/clc/cl_khr_global_int32_base_atomics/atom_dec.h
>> > > > > > > +++ b/generic/include/clc/cl_khr_global_int32_base_atomics/atom_dec.h
>> > > > > > > @@ -1,2 +1 @@
>> > > > > > > -_CLC_OVERLOAD _CLC_DECL int atom_dec(global int *p);
>> > > > > > > -_CLC_OVERLOAD _CLC_DECL unsigned int atom_dec(global unsigned int *p);
>> > > > > > > +#define atom_dec(p) atom_sub(p, 1)
>> > > > > >
>> > > > > > Does this not cause redefinition warning (since local version uses the
>> > > > > > same define?
>> > > > >
>> > > > > Yes, it looks like it does.  When building libclc and running the CTS
>> > > > > tests, everything looks ok, but if I compile the atom_dec kernel
>> > > > > manually via clang, I get:
>> > > > > --------------------------------
>> > > > > atom_dec_uint.cl:5:32: warning: passing 'volatile __global uint *'
>> > > > > (aka 'volatile __global unsigned int *') to parameter of type
>> > > > > '__global int *' discards qualifiers
>> > > > >       [-Wincompatible-pointer-types-discards-qualifiers]
>> > > > >     oldValues[tid] = atom_dec( &destMemory[0] );
>> > > > >                                ^~~~~~~~~~~~~~
>> > > > > /usr/local/include/clc/cl_khr_local_int32_base_atomics/atom_dec.h:9:30:
>> > > > > note: expanded from macro 'atom_dec'
>> > > > > #define atom_dec(p) atom_sub(p, 1)
>> > > > >                              ^
>> > > > > /usr/local/include/clc/cl_khr_global_int32_base_atomics/atom_sub.h:10:50:
>> > > > > note: passing argument to parameter 'p' here
>> > > > > _CLC_OVERLOAD _CLC_DECL int atom_sub(global int *p, int val);
>> > > > >                                                  ^
>> > > > > 1 warning generated.
>> > > > > --------------------------------
>> > > > >
>> > > > > So yeah, I guess this is the wrong approach.
>> > > > >
>> > > > > I did just discover while poking around at the kernel that if I remove
>> > > > > the "volatile" keyword from destMemory, then the kernel builds fine.
>> > > > > The difference in the atomic_inc and the CL 1.0 atom_inc function
>> > > > > declarations is basically that volatile keyword. If I re-add the
>> > > > > volatile keyword to the kernel, and then also add it to the function
>> > > > > declaration in atom_dec.h, the kernel also builds.  Looking at POCL
>> > > > > and Beignet they both just do a single blanket define in a top-level
>> > > > > header file along the lines of:
>> > > > > #define atom_dec atomic_dec
>> > > > >
>> > > > > That implies that both of those runtimes declare the global/local
>> > > > > pointers as volatile even for the CL1.0 variants, which doesn't
>> > > > > necessarily match the spec (although given how the CL 1.1/1.2 spec are
>> > > > > written, they basically just say "the CL 1.0 atom_* functions are
>> > > > > still supported), but it is what the CTS is testing.  I don't
>> > > > > currently have access to the definitions that nv/amd use in their
>> > > > > closed-source binaries (and I haven't managed to find what ROCm uses,
>> > > > > but I'm thinking that the implication is that when Khronos
>> > > > > renamed/aliased atom_* to atomic_* in CL 1.1, they were implicitly
>> > > > > retroactively adding the 'volatile' keyword to the function argument.
>> > > > >
>> > > > > I guess we could just remove the existing defines under
>> > > > > cl_khr_[global|local]_int32_base_atomics and put a single set of
>> > > > > defines in either clc.h, or in a new header under something like
>> > > > > generic/include/clc/atomic/atom_functions.h that just defines the
>> > > > > renames of the existing functions. Alternatively, we could just go
>> > > > > back and add the volatile keyword to the pointer arguments in the
>> > > > > atom_* headers/implementations.
>> > > > >
>> > > > > Thoughts?
>> > > >
>> > > > I checked that we also use similar define for atomic_inc/dec.
>> > > > 9040bf38 addressed similar issue when calling atom_add.
>> > > > I think adding volatile to atom_* functions would just hide the issue
>> > > > (since test suites use volatile pointers).
>> > > > see [0]:
>> > > > "A conversion from a pointer of type ``T*`` to a pointer of type ``U*``, where
>> > > >   ``T`` and ``U`` are incompatible, is allowed, but is ranked below all other
>> > > >   types of conversions. Please note: ``U`` lacking qualifiers that are present
>> > > >   on ``T`` is sufficient for ``T`` and ``U`` to be incompatible."
>> > > >
>> > > > I think a proper fix needs to be done on clang side.
>> > >
>> > > Yes, the commit message for that llvm revision you linked sounds
>> > > exactly like what is going on here.
>> > >
>> > > Ugh, looks like I need to read up on llvm development guidelines again
>> > > (a codebase that I try to limit my exposure to for my own sanity),
>> > > unless you want to look into that part. ;-)
>> >
>> > Clang is not really my thing, and this issue is a rather low priority
>> > for me. completing atomics support or adding register spilling for EG
>> > are higher on my list.
>> >
>> > You can try contacting Yaxun, he appeared to run into the same issue
>> > and contributes to clang regularly.
>>
>> I'll take a look, and might reach out for help, or I'll try to dig
>> into it and see if I can come up with a solution myself.
>>
>> For now, like you, this is just one of the minor issues that I've been
>> running into.
>>
>> A bigger issue is the regression caused by clang r303370, which I
>> believe is what causes ALL of the CTS local atomic tests to fail,
>> along with a few other test suites that expect to be able to
>> set a local buffer as a kernel argument.
>
> Yaxun is working on this (see the other libclc-dev thread).

Yeah, I saw that.  If I have some time this weekend (or on the 4th), I'll try to
see if there's something that we can do in clover, if it's not resolved by then.

>
>>
>> There's also issues with some of the math/* tests due to accuracy
>> issues. I'm not sure if that's down to optimization flags causing bad
>> results, or if there's issues in the underlying algorithms.
>>
>> For reference, the math tests with ULP-tolerance errors are:  exp10,
>> floor, fmax, fmin, fmod, hypot, remainder, tan.
>> The following pass wimpy mode with float, but fail on doubles: floor, log10
>> Others: pown fails with an "unsupported call to function", lgamma_r
>> segfaults during kernel compilation, and frexp errors out for doubles
>> with an instruction-selection error.
>
> I'm surprised that remainder works at all since it's not implemented in
> libclc, are you using local patches?

Yeah, I've got a local patch for that one that was submitted to libclc
back in January, but it never got committed due to accuracy issues.

http://lists.llvm.org/pipermail/libclc-dev/2017-January/002341.html

>
>>
>> I'm hoping that there's an easy single root-cause for most of the
>> accuracy issues, especially since floor just calls out to the
>> llvm.floor.[f32|f64|vNf32|vNf64] intrinsic, which you'd assume would
>> be ok. fmax, fmin, and pow also all seem to have issues with accuracy
>> when the inputs include nan/-nan, which might be an easy fix.
>
> I think this might be more tricky, since there is little information
> about ULP precision of hw instructions. some operations might need to
> be reimplemented in sw (for certain generations of hw). I'm not sure if
> libclc or the llvm backend lowering pass is a better place for this.
>
> fmin/fmax are a bit special since they should not change to value if
> one side is NaN. I vaguely remember a discussion that the instructions
> might still flush denormals in that case.

Possible.  I haven't looked into the failures yet.

>
>>
>> So yeah, there's plenty of work to do, even in just the math*
>> functions, outside of dealing with pointer qualifiers breaking things.
>
> It sounds like you work on GCN hw. do you still have/run clover on the
> cedar board?

The CEDAR is gone (old work machine that was retired).

The cards that I currently have available and installed are:
Radeon 7850 (PITCAIRN, GCN 1.0)
Radeon 6850 (BARTS, Northern Islands)
Radeon 6530D (SUMO, Llano 3-core APU)
Intel HD 4600 (Haswell IGP w/ Beignet, i7-4810MQ)

I do have a Radeon 5400 PCI (non-express) CEDAR card installed in an
old Alpha Personal WorkStation, but that machine doesn't have a
workable OS at the moment.

I've also got a Kepler-level GeForce 760 sitting on the desk, but I
haven't found a home for it yet.

--Aaron

>
> Jan
>
>>
>> --Aaron
>>
>> >
>> > regards,
>> > Jan
>> >
>> > >
>> > > --Aaron
>> > >
>> > > >
>> > > > Jan
>> > > >
>> > > > [0] https://reviews.llvm.org/D24113
>> > > >
>> > > > >
>> > > > > --Aaron
>> > > > >
>> > > > > >
>> > > > > > Jan


More information about the Libclc-dev mailing list