[Libclc-dev] clover-todo (was: [PATCH] atomics: redefine atom_inc/atom_dec using atom_add/atom_sub)
Jan Vesely via Libclc-dev
libclc-dev at lists.llvm.org
Fri Jun 30 09:32:51 PDT 2017
On Mon, 2017-06-26 at 10:29 -0500, Aaron Watry wrote:
> On Mon, Jun 26, 2017 at 6:18 AM, Jan Vesely <jan.vesely at rutgers.edu> wrote:
> > On Fri, 2017-06-23 at 20:33 -0500, Aaron Watry via Libclc-dev wrote:
> > > On Thu, Jun 22, 2017 at 10:25 AM, Jan Vesely <jan.vesely at rutgers.edu> wrote:
> > > > On Wed, 2017-06-21 at 11:06 -0500, Aaron Watry via Libclc-dev wrote:
> > > > > On Wed, Jun 21, 2017 at 9:16 AM, Jan Vesely <jan.vesely at rutgers.edu> wrote:
> > > > > > On Tue, 2017-06-20 at 21:11 -0500, Aaron Watry via Libclc-dev wrote:
> > > > > > > This is exactly what the atomic_[inc|dec] functions do, and it fixes
> > > > > > > kernel compilation failures using the OpenCL CTS.
> > > > > >
> > > > > > what are the compilation failures? I thought piglit had these ops
> > > > > > covered.
> > > > >
> > > > > Piglit has atomic_* covered, but not the CL 1.0 atom_* functions.
> > > >
> > > > ah right. guess we should add those as well (another time...).
> > >
> > > Yeah. In theory, we can add those to piglit. Since the CTS was
> > > open-sourced, I'm mostly using its failures as a worklist for now just
> > > to prevent duplication of work.
> > >
> > > >
> > > > >
> > > > > The issue I'm running into with the CTS is that atomic_* works fine,
> > > > > but I'm getting an ambiguous function reference for
> > > > > atom_inc/atom_dec/atom_or/atom_and.
> > > > >
> > > > > Example:
> > > > >
> > > > > #pragma OPENCL EXTENSION cl_khr_global_int32_base_atomics : enable
> > > > > __kernel void test_atomic_fn(volatile __global uint *destMemory,
> > > > > __global uint *oldValues)
> > > > > {
> > > > > int tid = get_global_id(0);
> > > > > oldValues[tid] = atom_dec( &destMemory[0] );
> > > > > }
> > > > >
> > > > > Log:
> > > > > Build not successful for device "AMD PITCAIRN (DRM 2.49.0 /
> > > > > 4.11.0-041100rc8-generic, LLVM 5.0.0)", status: CL_BUILD_ERROR
> > > > > Build log for device "AMD PITCAIRN (DRM 2.49.0 /
> > > > > 4.11.0-041100rc8-generic, LLVM 5.0.0)" is:
> > > > > ------------
> > > > > input.cl:6:22: error: call to 'atom_dec' is ambiguous
> > > > > /usr/local/include/clc/cl_khr_global_int32_base_atomics/atom_dec.h:1:29:
> > > > > note: candidate function
> > > > > /usr/local/include/clc/cl_khr_global_int32_base_atomics/atom_dec.h:2:38:
> > > > > note: candidate function
> > > > > ------------
> > > >
> > > > this looks like clang treats both:
> > > > "volatile __global uint* -> __global unsigned int*"
> > > > and
> > > > "volatile __global uint* -> __globale int *"
> > > > as equal cost.
> > > >
> > > > Can you confirm that removing 'volatile' from destMemory hides the
> > > > issue?
> > >
> > > Yes, I can confirm that if I rebuild libclc using the latest upstream
> > > revision (and without any of my changes)
> > > and then remove the volatile keyword from the atom_* test kernels
> > > being compiled that the
> > > tests compile and run to completion successfully.
> > >
> > > >
> > > > >
> > > > > >
> > > > > > >
> > > > > > > Signed-off-by: Aaron Watry <awatry at gmail.com>
> > > > > > > ---
> > > > > > > generic/include/clc/cl_khr_global_int32_base_atomics/atom_dec.h | 3 +--
> > > > > > > generic/include/clc/cl_khr_global_int32_base_atomics/atom_inc.h | 3 +--
> > > > > > > generic/include/clc/cl_khr_local_int32_base_atomics/atom_dec.h | 3 +--
> > > > > > > generic/include/clc/cl_khr_local_int32_base_atomics/atom_inc.h | 3 +--
> > > > > > > generic/lib/SOURCES | 4 ----
> > > > > > > generic/lib/cl_khr_global_int32_base_atomics/atom_dec.cl | 9 ---------
> > > > > > > generic/lib/cl_khr_global_int32_base_atomics/atom_inc.cl | 9 ---------
> > > > > > > generic/lib/cl_khr_local_int32_base_atomics/atom_dec.cl | 9 ---------
> > > > > > > generic/lib/cl_khr_local_int32_base_atomics/atom_inc.cl | 9 ---------
> > > > > > > 9 files changed, 4 insertions(+), 48 deletions(-)
> > > > > > > delete mode 100644 generic/lib/cl_khr_global_int32_base_atomics/atom_dec.cl
> > > > > > > delete mode 100644 generic/lib/cl_khr_global_int32_base_atomics/atom_inc.cl
> > > > > > > delete mode 100644 generic/lib/cl_khr_local_int32_base_atomics/atom_dec.cl
> > > > > > > delete mode 100644 generic/lib/cl_khr_local_int32_base_atomics/atom_inc.cl
> > > > > > >
> > > > > > > diff --git a/generic/include/clc/cl_khr_global_int32_base_atomics/atom_dec.h b/generic/include/clc/cl_khr_global_int32_base_atomics/atom_dec.h
> > > > > > > index bbc872c..a520fe4 100644
> > > > > > > --- a/generic/include/clc/cl_khr_global_int32_base_atomics/atom_dec.h
> > > > > > > +++ b/generic/include/clc/cl_khr_global_int32_base_atomics/atom_dec.h
> > > > > > > @@ -1,2 +1 @@
> > > > > > > -_CLC_OVERLOAD _CLC_DECL int atom_dec(global int *p);
> > > > > > > -_CLC_OVERLOAD _CLC_DECL unsigned int atom_dec(global unsigned int *p);
> > > > > > > +#define atom_dec(p) atom_sub(p, 1)
> > > > > >
> > > > > > Does this not cause redefinition warning (since local version uses the
> > > > > > same define?
> > > > >
> > > > > Yes, it looks like it does. When building libclc and running the CTS
> > > > > tests, everything looks ok, but if I compile the atom_dec kernel
> > > > > manually via clang, I get:
> > > > > --------------------------------
> > > > > atom_dec_uint.cl:5:32: warning: passing 'volatile __global uint *'
> > > > > (aka 'volatile __global unsigned int *') to parameter of type
> > > > > '__global int *' discards qualifiers
> > > > > [-Wincompatible-pointer-types-discards-qualifiers]
> > > > > oldValues[tid] = atom_dec( &destMemory[0] );
> > > > > ^~~~~~~~~~~~~~
> > > > > /usr/local/include/clc/cl_khr_local_int32_base_atomics/atom_dec.h:9:30:
> > > > > note: expanded from macro 'atom_dec'
> > > > > #define atom_dec(p) atom_sub(p, 1)
> > > > > ^
> > > > > /usr/local/include/clc/cl_khr_global_int32_base_atomics/atom_sub.h:10:50:
> > > > > note: passing argument to parameter 'p' here
> > > > > _CLC_OVERLOAD _CLC_DECL int atom_sub(global int *p, int val);
> > > > > ^
> > > > > 1 warning generated.
> > > > > --------------------------------
> > > > >
> > > > > So yeah, I guess this is the wrong approach.
> > > > >
> > > > > I did just discover while poking around at the kernel that if I remove
> > > > > the "volatile" keyword from destMemory, then the kernel builds fine.
> > > > > The difference in the atomic_inc and the CL 1.0 atom_inc function
> > > > > declarations is basically that volatile keyword. If I re-add the
> > > > > volatile keyword to the kernel, and then also add it to the function
> > > > > declaration in atom_dec.h, the kernel also builds. Looking at POCL
> > > > > and Beignet they both just do a single blanket define in a top-level
> > > > > header file along the lines of:
> > > > > #define atom_dec atomic_dec
> > > > >
> > > > > That implies that both of those runtimes declare the global/local
> > > > > pointers as volatile even for the CL1.0 variants, which doesn't
> > > > > necessarily match the spec (although given how the CL 1.1/1.2 spec are
> > > > > written, they basically just say "the CL 1.0 atom_* functions are
> > > > > still supported), but it is what the CTS is testing. I don't
> > > > > currently have access to the definitions that nv/amd use in their
> > > > > closed-source binaries (and I haven't managed to find what ROCm uses,
> > > > > but I'm thinking that the implication is that when Khronos
> > > > > renamed/aliased atom_* to atomic_* in CL 1.1, they were implicitly
> > > > > retroactively adding the 'volatile' keyword to the function argument.
> > > > >
> > > > > I guess we could just remove the existing defines under
> > > > > cl_khr_[global|local]_int32_base_atomics and put a single set of
> > > > > defines in either clc.h, or in a new header under something like
> > > > > generic/include/clc/atomic/atom_functions.h that just defines the
> > > > > renames of the existing functions. Alternatively, we could just go
> > > > > back and add the volatile keyword to the pointer arguments in the
> > > > > atom_* headers/implementations.
> > > > >
> > > > > Thoughts?
> > > >
> > > > I checked that we also use similar define for atomic_inc/dec.
> > > > 9040bf38 addressed similar issue when calling atom_add.
> > > > I think adding volatile to atom_* functions would just hide the issue
> > > > (since test suites use volatile pointers).
> > > > see [0]:
> > > > "A conversion from a pointer of type ``T*`` to a pointer of type ``U*``, where
> > > > ``T`` and ``U`` are incompatible, is allowed, but is ranked below all other
> > > > types of conversions. Please note: ``U`` lacking qualifiers that are present
> > > > on ``T`` is sufficient for ``T`` and ``U`` to be incompatible."
> > > >
> > > > I think a proper fix needs to be done on clang side.
> > >
> > > Yes, the commit message for that llvm revision you linked sounds
> > > exactly like what is going on here.
> > >
> > > Ugh, looks like I need to read up on llvm development guidelines again
> > > (a codebase that I try to limit my exposure to for my own sanity),
> > > unless you want to look into that part. ;-)
> >
> > Clang is not really my thing, and this issue is a rather low priority
> > for me. completing atomics support or adding register spilling for EG
> > are higher on my list.
> >
> > You can try contacting Yaxun, he appeared to run into the same issue
> > and contributes to clang regularly.
>
> I'll take a look, and might reach out for help, or I'll try to dig
> into it and see if I can come up with a solution myself.
>
> For now, like you, this is just one of the minor issues that I've been
> running into.
>
> A bigger issue is the regression caused by clang r303370, which I
> believe is what causes ALL of the CTS local atomic tests to fail,
> along with a few other test suites that expect to be able to
> set a local buffer as a kernel argument.
Yaxun is working on this (see the other libclc-dev thread).
>
> There's also issues with some of the math/* tests due to accuracy
> issues. I'm not sure if that's down to optimization flags causing bad
> results, or if there's issues in the underlying algorithms.
>
> For reference, the math tests with ULP-tolerance errors are: exp10,
> floor, fmax, fmin, fmod, hypot, remainder, tan.
> The following pass wimpy mode with float, but fail on doubles: floor, log10
> Others: pown fails with an "unsupported call to function", lgamma_r
> segfaults during kernel compilation, and frexp errors out for doubles
> with an instruction-selection error.
I'm surprised that remainder works at all since it's not implemented in
libclc, are you using local patches?
>
> I'm hoping that there's an easy single root-cause for most of the
> accuracy issues, especially since floor just calls out to the
> llvm.floor.[f32|f64|vNf32|vNf64] intrinsic, which you'd assume would
> be ok. fmax, fmin, and pow also all seem to have issues with accuracy
> when the inputs include nan/-nan, which might be an easy fix.
I think this might be more tricky, since there is little information
about ULP precision of hw instructions. some operations might need to
be reimplemented in sw (for certain generations of hw). I'm not sure if
libclc or the llvm backend lowering pass is a better place for this.
fmin/fmax are a bit special since they should not change to value if
one side is NaN. I vaguely remember a discussion that the instructions
might still flush denormals in that case.
>
> So yeah, there's plenty of work to do, even in just the math*
> functions, outside of dealing with pointer qualifiers breaking things.
It sounds like you work on GCN hw. do you still have/run clover on the
cedar board?
Jan
>
> --Aaron
>
> >
> > regards,
> > Jan
> >
> > >
> > > --Aaron
> > >
> > > >
> > > > Jan
> > > >
> > > > [0] https://reviews.llvm.org/D24113
> > > >
> > > > >
> > > > > --Aaron
> > > > >
> > > > > >
> > > > > > Jan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: This is a digitally signed message part
URL: <http://lists.llvm.org/pipermail/libclc-dev/attachments/20170630/ef2c8855/attachment.sig>
More information about the Libclc-dev
mailing list