[Libclc-dev] [PATCH v2 1/1] rootn: Flush denormals if not supported.

Thu May 10 11:43:12 PDT 2018

On Thu, May 10, 2018 at 1:16 PM, Jan Vesely <jan.vesely at rutgers.edu> wrote:
> On Wed, 2018-05-02 at 22:16 -0500, Aaron Watry via Libclc-dev wrote:
>> On Wed, 2018-05-02 at 21:51 -0400, Jan Vesely wrote:
>> > On Wed, 2018-05-02 at 07:03 -0500, Aaron Watry via Libclc-dev wrote:
>> > > Am I being dense or just lucky (device supports denormals?)..  This
>> > > already passed on my RX580 before I applied your patch.
>> >
>> > IIRC, the problem is not with denormal support (unless you enabled it
>> > explicitly), but that 'indx' variable was computed incorrectly. My
>> > guess would be that one of the earlier operations (mad?) improved wrt
>> > ULP precision (rootn still failed on my carrizo).
>> > Anyway, flushing denormals just hides the issue. It'll probably still
>> > fail if run with denormals enabled, but fixing denormal support is a
>> > story for another day.
>> >
>> > > I'm currently rebuilding a newer llvm on my r600 box that hopefully
>> > > won't segfault when running rootn to test there.
>> >
>> > thanks. It works OK on my turks when math_bruteforce is run in single
>> > thread mode.
>>
>> Oh yeah, the compute memory pool on r600 isn't thread-safe...
>>
>> Let's just say that the email I sent this morning was while the first
>> cup of coffee was still unconsumed, and I had a small child in my lap
>> trying to commandeer my mouse. Not a great time for deep thoughts. :)
> Hi,
>
> any luck running on your r600?

Yes, in single-threaded mode.  But in my case (HD 6850, BARTS) the
rootn test already passed with a max ULP of 1.0 before your patch, and
a max ULP of 7.0 after.

The tolerance for rootn is <= 16, so both cases passed, but the
maximum error seems to have gone up after flushing subnormals.
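
To make the two numbers concrete, here is a rough host-side sketch in C
of how an error in ULPs can be measured for a float result against a
double reference. This is not the CTS code: ulp_error, exponent_of and
pow2 are names made up for illustration, it assumes finite inputs, and
it ignores the special handling some harnesses apply right at powers
of two.

```c
#include <stdint.h>
#include <string.h>

/* Unbiased binary exponent of a finite double (illustrative helper). */
static int exponent_of(double v) {
    uint64_t bits;
    memcpy(&bits, &v, sizeof bits);
    return (int)((bits >> 52) & 0x7ff) - 1023;
}

/* 2^e as a double, valid for -1022 <= e <= 1023 (illustrative helper). */
static double pow2(int e) {
    uint64_t bits = (uint64_t)(e + 1023) << 52;
    double v;
    memcpy(&v, &bits, sizeof v);
    return v;
}

/* Signed error of `result` in float ULPs, measured at the magnitude of
 * `reference`. One float ULP at exponent e is 2^(e - 23); the spacing
 * never drops below 2^-149, the smallest float subnormal. */
static double ulp_error(float result, double reference) {
    int ulp_exp = -149;
    if (reference != 0.0) {
        ulp_exp = exponent_of(reference) - 23;
        if (ulp_exp < -149)
            ulp_exp = -149;
    }
    return ((double)result - reference) / pow2(ulp_exp);
}

/* Pass/fail check against a symmetric ULP tolerance. */
static int within_tolerance(double err, double tol) {
    return err <= tol && err >= -tol;
}
```

With this measure, a result one representable float above a reference
of 1.0 scores 1.0, and errors of 1.0 and 7.0 both fall inside the
<= 16 bound.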

I've been staring at the patch off and on and trying to figure out if
it's doing something wrong. Maybe it's just a difference in the
precision of the hardware we're using.
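
For anyone following along, the line the patch adds can be modelled on
the host roughly as below. This is only a sketch in plain C: as_float
is reimplemented with memcpy, the subnormals_supported parameter is a
hypothetical stand-in for __clc_fp32_subnormals_supported(), and it
assumes -149 <= m <= -126 (so the shift stays in range) and a host
that does not itself flush subnormals.

```c
#include <stdint.h>
#include <string.h>

/* Host-side stand-in for OpenCL's as_float(): reinterpret the bits. */
static float as_float(uint32_t bits) {
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Model of the patched line. For -149 <= m <= -127 the bit pattern
 * 1 << (m + 149) has a zero exponent field, so it encodes the subnormal
 * 2^m; for m == -126 it is the smallest normal float. Without subnormal
 * support the result is forced to 0.0f instead of being scaled.
 * `subnormals_supported` is a hypothetical parameter replacing the
 * __clc_fp32_subnormals_supported() query. */
static float scale_to_subnormal(float expylogx, int m,
                                int subnormals_supported) {
    return subnormals_supported
               ? expylogx * as_float((uint32_t)1 << (m + 149))
               : 0.0f;
}
```

The v2 patch only touches this scaled value, which the surrounding
code selects when m < -125; the other branch (texpylogx) is unchanged.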

If we really need to, I've also got a cayman-based APU chip and a PCI
CEDAR if we want/need to get a few more sample points.

--Aaron

>
> Jan
>
>>
>> --Aaron
>>
>> >
>> > Jan
>> >
>> > >
>> > > --Aaron
>> > >
>> > > On Mon, Apr 30, 2018 at 1:05 PM, Jan Vesely via Libclc-dev
>> > > <libclc-dev at lists.llvm.org> wrote:
>> > > > On Tue, 2018-04-24 at 12:31 -0400, Jan Vesely wrote:
>> > > > > It's OK to either flush to 0 or return denormal result if the
>> > > > > device
>> > > > > does not support denormals. See sec 7.2 and 7.5.3 of OCL specs
>> > > > >
>> > > > > v2: Use 0.0f explicitly instead of relying on GPU to flush it.
>> > > > >
>> > > > > Fixes CTS on carrizo and turks
>> > > > > Signed-off-by: Jan Vesely <jan.vesely at rutgers.edu>
>> > > > > ---
>> > > > > This removes the need for the second patch
>> > > > >  generic/lib/math/clc_rootn.cl | 11 +----------
>> > > > >  1 file changed, 1 insertion(+), 10 deletions(-)
>> > > > >
>> > > > > diff --git a/generic/lib/math/clc_rootn.cl b/generic/lib/math/clc_rootn.cl
>> > > > > index d7ee185..0a2c98d 100644
>> > > > > --- a/generic/lib/math/clc_rootn.cl
>> > > > > +++ b/generic/lib/math/clc_rootn.cl
>> > > > > @@ -170,16 +170,7 @@ _CLC_DEF _CLC_OVERLOAD float __clc_rootn(float x, int ny)
>> > > > >      tv = USE_TABLE(exp_tbl_ep, j);
>> > > > >
>> > > > >      float expylogx = mad(tv.s0, poly, mad(tv.s1, poly, tv.s1)) + tv.s0;
>> > > > > -    float sexpylogx;
>> > > > > -    if (!__clc_fp32_subnormals_supported()) {
>> > > > > -             int explg = ((as_uint(expylogx) & EXPBITS_SP32 >> 23) - 127);
>> > > > > -             m = (23-(m + 149)) == 0 ? 1: m;
>> > > > > -             uint mantissa =  ((as_uint(expylogx) & MANTBITS_SP32)|IMPBIT_SP32) >> (23-(m + 149));
>> > > > > -             sexpylogx = as_float(mantissa);
>> > > > > -    } else {
>> > > > > -             sexpylogx = expylogx * as_float(0x1 << (m + 149));
>> > > > > -    }
>> > > > > -
>> > > > > +    float sexpylogx = __clc_fp32_subnormals_supported() ? expylogx * as_float(0x1 << (m + 149)) : 0.0f;
>> > > > >
>> > > > >      float texpylogx = as_float(as_int(expylogx) + m2);
>> > > > >      expylogx = m < -125 ? sexpylogx : texpylogx;
>> > > >
>> > > > ping.
>> > > > _______________________________________________
>> > > > Libclc-dev mailing list
>> > > > Libclc-dev at lists.llvm.org
>> > > > http://lists.llvm.org/cgi-bin/mailman/listinfo/libclc-dev
>> > > >
>> > >
>> >
>> >
>>