[Libclc-dev] Add initial support for half precision builtins (take 3)

Thu May 17 12:47:08 PDT 2018

On Thu, 2018-05-17 at 11:42 -0500, Aaron Watry via Libclc-dev wrote:
> On Wed, May 16, 2018 at 3:53 PM, Jan Vesely via Libclc-dev
> <libclc-dev at lists.llvm.org> wrote:
> > On Wed, 2018-05-16 at 21:20 +0200, Jeroen Ketema via Libclc-dev wrote:
> > > > On 16 May 2018, at 00:59, Jan Vesely <jan.vesely at rutgers.edu> wrote:
> > > > 
> > > > Hi,
> > > > 
> > > > thank you both, for fast feedback. I've attached v2 that should address
> > > > most of the comments. I left larger whitespace cleanup for another day.
> > > > 
> > > > 
> > > > On Tue, 2018-05-15 at 22:46 +0200, Jeroen Ketema via Libclc-dev wrote:
> > > > > Hi Jan,
> > > > > 
> > > > > > On 15 May 2018, at 05:09, Jan Vesely via Libclc-dev <libclc-dev at lists.llvm.org> wrote:
> > > > > > 
> > > > > > Hi,
> > > > > > 
> > > > > > I've tried to post this patch but it's too big for the ML (and I don't
> > > > > > know who to bug to get it released).
> > > > > > 
> > > > > > It adds basic fp16 builtins. The rest will probably need a fp32 wrapper
> > > > > > (especially the sw implementations of math functions), but that needs
> > > > > > fixed conversion routines.
> > > > > 
> > > > > Is that going to fly? Wouldn’t you run into double rounding issues?
> > > > 
> > > > It probably will, but it's the fastest solution. The alternative is to
> > > > adapt all sw routines to half precision (including tables). At least
> > > > GCN llvm backend does this for some ops anyway.
> > > > 
> > > > Fixing conversion routines is higher priority. We can revisit this
> > > > after.
> > > 
> > > Yeah, if the aim is just to get something working, then just wrapping
> > > the fp32 code is fine.
> > > 
> > > One further remark: I think generic/lib/math/modf.inc needs to #undef ZERO.
> > 
> > Thanks. I've fixed this locally. Guess since it was the same definition
> > the compiler didn't complain.
> > 
> > > 
> > > One question about generic/lib/math/clc_sqrt_impl.inc:
> > > 
> > > +#define __CLC_NAN (half)NAN
> > > 
> > > I assume the reason for this is that there’s __builtin_nanh(“”), or something
> > > like that? (The ‘h’ in there is coming from my imagination.)
> > 
> > yes. there's no __builtin_nanh(), this relies on the conversion
> > preserving NaN.
> > 
> > > 
> > > Otherwise, I don’t have any further comments.
> > 
> > thanks,
> > Jan
> 
> One more thing.
> 
> I was running through the "quick" CTS test profile yesterday, just to
> see where we're at, and I noticed the following failures
> from conformance_Test_half:
> 
> Testing vstorea_half
> 268435452) Failure at 0x1.fffff8p-96: *0x0000 vs 0xdead  vector_size =
> 3 address_space = global
> Testing vstore_half_rte
> Testing vstorea_half_rte
> 268435452) Failure at 0x1.fffff8p-96: *0x0000 vs 0xdead  vector_size =
> 3 address_space = global
> Testing vstore_half_rtz
> Testing vstorea_half_rtz
> 268435452) Failure at 0x1.fffff8p-96: *0x0000 vs 0xdead  vector_size =
> 3 address_space = global
> Testing vstore_half_rtp
> Testing vstorea_half_rtp
> 268435452) Failure at 0x1.fffff8p-96: *0x0001 vs 0xdead  vector_size =
> 3 address_space = global
> Testing vstore_half_rtn
> Testing vstorea_half_rtn
> 268435452) Failure at 0x1.fffff8p-96: *0x0000 vs 0xdead  vector_size =
> 3 address_space = global
> 
> Something about the 3-component vstore_half is off, but I'm not sure
> if this was already broken before, or if the "#if __CLC_FPSIZE > 16"
> change in vstore_half.inc broke it.

Hi,

this was broken before. vstore/vload_half exist irrespective of
cl_khr_fp16. The "#if" part is there to prevent vstore_half(half,
size_t, half*), there's vstore(half, size_t, half*) for that.
vload_half has similar guard but it only exists for float.

I've actually started looking into this as a simpler problem to fix
than the conversion generator.
I found the issue to be in CTS numVecs function(cl_utils.c:327) which
aligns up for all cases but (aligned && vecsize == 3).
Thus the last few values are simply not run and return the poison value
(0xdead). You can check whether "count" is divisible by 4.

A simple patch to numVecs function:

-        return count/4;
+        return (count + 3)/4;

should fix the issue, I've only checked vstorea_half.
I'm pretty sure it technically results in OOB access, but thanks to GPU
buffer size alignment to 64 bytes it shouldn't corrupt memory.

Does the patch look OK to you now? v3 adds "ZERO" undef. Let me know if
you'd prefer I sent it out as well.

thanks,
Jan

> 
> --Aaron
> 
> > 
> > > 
> > > Jeroen
> > > 
> > > > 
> > > > regards,
> > > > Jan
> > > > 
> > > > > 
> > > > > In addition to Aaron’s comments:
> > > > > 
> > > > > +#define HALF_RADIX       2
> > > > > +#define HALF_MAX         0x1.ffcp15h
> > > > > +#define HALF_MIN         0x1.0p-14h
> > > > > +#define HALF_EPSILON     0x1.0p-10h
> > > > > 
> > > > > There seems to be both a space and a tab between the macro name and
> > > > > its definition. I also see a few places where indentation is done
> > > > > with a tab, e.g., generic/lib/relational/isunordered.cl.
> > > > > 
> > > > > From nan.inc:
> > > > > 
> > > > > #if __CLC_FPSIZE == 64
> > > > > #define __CLC_NATN __CLC_XCONCAT(ulong, __CLC_VECSIZE)
> > > > > -#else
> > > > > +#elif __CLC_FPSIZE == 32
> > > > > #define __CLC_NATN __CLC_XCONCAT(uint, __CLC_VECSIZE)
> > > > > +#else
> > > > > +#define __CLC_NATN __CLC_XCONCAT(ushort, __CLC_VECSIZE)
> > > > > #endif
> > > > > 
> > > > > Wouldn’t it make more sense to also use “#elif __CLC_FPSIZE == 16”, and maybe #error otherwise? Similar in a few other places.
> > > > > 
> > > > > In generic/lib/math/acos.inc:
> > > > > 
> > > > > +#else
> > > > > +#define __CLC_CONST(x) xh
> > > > > 
> > > > > I think there should be a ## between the x and h.
> > > > > 
> > > > > In generic/include/math/clc_ldexp.h, the guarded text in indented, while indentation is not used anywhere else.
> > > > > 
> > > > > I see some cases where the "+#if __CLC_FPSIZE > 16” guard does not have a TODO attached to it, is that on purpose?
> > > > > 
> > > > > In generic/lib/math/modf.inc casts 0.0f, while generic/lib/math/fract.inc introduces a ZERO macro. Maybe this should be made more uniform (doesn’t have to be as part of this patch).
> > > > > 
> > > > > Near the end of the patch the int/vector of short difference is explained in a comment "returns an int, but the vector versions return short”. Maybe add a similar comment to generic/include/clc/relational/floatn.inc?
> > > > > 
> > > > > Jeroen
> > > > > 
> > > > > > 
> > > > > > This should be enough to make clpeak happy (BZ: 96897)
> > > > > > 
> > > > > > Jan
> > > > > > --
> > > > > > Jan Vesely <jan.vesely at rutgers.edu><0001-Add-initial-support-for-half-precision-builtins.patch.gz>_______________________________________________
> > > > > > Libclc-dev mailing list
> > > > > > Libclc-dev at lists.llvm.org
> > > > > > http://lists.llvm.org/cgi-bin/mailman/listinfo/libclc-dev
> > > > > 
> > > > > _______________________________________________
> > > > > Libclc-dev mailing list
> > > > > Libclc-dev at lists.llvm.org <mailto:Libclc-dev at lists.llvm.org>
> > > > > http://lists.llvm.org/cgi-bin/mailman/listinfo/libclc-dev <http://lists.llvm.org/cgi-bin/mailman/listinfo/libclc-dev>
> > > > 
> > > > --
> > > > Jan Vesely <jan.vesely at rutgers.edu <mailto:jan.vesely at rutgers.edu>><v2-0001-Add-initial-support-for-half-precision-builtins.patch.gz>
> > > 
> > > _______________________________________________
> > > Libclc-dev mailing list
> > > Libclc-dev at lists.llvm.org
> > > http://lists.llvm.org/cgi-bin/mailman/listinfo/libclc-dev
> > 
> > _______________________________________________
> > Libclc-dev mailing list
> > Libclc-dev at lists.llvm.org
> > http://lists.llvm.org/cgi-bin/mailman/listinfo/libclc-dev
> > 
> 
> _______________________________________________
> Libclc-dev mailing list
> Libclc-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/libclc-dev
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 488 bytes
Desc: This is a digitally signed message part
URL: <http://lists.llvm.org/pipermail/libclc-dev/attachments/20180517/aafad88a/attachment.sig>