[Libclc-dev] [PATCH v2 1/1] rootn: Flush denormals if not supported.

Mon May 21 19:46:07 PDT 2018

On Mon, May 21, 2018 at 5:49 PM, Jan Vesely <jan.vesely at rutgers.edu> wrote:
> On Thu, 2018-05-10 at 15:43 -0500, Aaron Watry via Libclc-dev wrote:
>> On Thu, May 10, 2018 at 1:52 PM, Jan Vesely <jan.vesely at rutgers.edu> wrote:
>> > On Thu, 2018-05-10 at 13:43 -0500, Aaron Watry via Libclc-dev wrote:
>> > > On Thu, May 10, 2018 at 1:16 PM, Jan Vesely <jan.vesely at rutgers.edu> wrote:
>> > > > On Wed, 2018-05-02 at 22:16 -0500, Aaron Watry via Libclc-dev wrote:
>> > > > > On Wed, 2018-05-02 at 21:51 -0400, Jan Vesely wrote:
>> > > > > > On Wed, 2018-05-02 at 07:03 -0500, Aaron Watry via Libclc-dev wrote:
>> > > > > > > Am I being dense or just lucky (device supports denormals?)..  This
>> > > > > > > already passed on my RX580 before I applied your patch.
>> > > > > >
>> > > > > > IIRC, the problem is not with denormal support (unless you enabled it
>> > > > > > explicitly), but that 'indx' variable was computed incorrectly. My
>> > > > > > guess would be that one of the earlier operations (mad?) improved wrt
>> > > > > > ULP precision (rootn still failed on my carrizo).
>> > > > > > Anyway, flushing denormals just hides the issue. It'll probably still
>> > > > > > fail if run with denormals enabled, but fixing denormal support is a
>> > > > > > story for another day.
>> > > > > >
>> > > > > > > I'm currently rebuilding a newer llvm on my r600 box that
>> > > > > > > hopefully won't segfault when running rootn to test there.
>> > > > > >
>> > > > > > thanks. It works OK on my turks when math_bruteforce is run in single
>> > > > > > thread mode.
>> > > > >
>> > > > > Oh yeah, the compute memory pool on r600 isn't thread-safe...
>> > > > >
>> > > > > Let's just say that the email I sent this morning was while the first
>> > > > > cup of coffee was still unconsumed, and I had a small child in my lap
>> > > > > trying to commandeer my mouse. Not a great time for deep thoughts. :)
>> > > >
>> > > > Hi,
>> > > >
>> > > > any luck running on your r600?
>> > >
>> > > Yes, in single-threaded mode.  But in my case (HD 6850, BARTS) the
>> > > rootn test already passed with a max ULP of 1.0 before your patch, and
>> > > a max ULP of 7.0 after.
>> > >
>> > > The tolerance for rootn is <= 16, so both cases passed, but the
>> > > maximum error seems to have gone up after flushing subnormals.
>> > >
>> > > I've been staring at the patch off and on and trying to figure out if
>> > > it's doing something wrong. Maybe it's just a difference in the
>> > > precision of the hardware we're using.
>> > >
>> > > If we really need to, I've also got a cayman-based APU chip and a PCI
>> > > CEDAR if we want/need to get a few more sample points.
>> >
>> > hm, that's interesting. My problem with EG was that it returned NaN. My
>> > guess would be there is a difference in LLVM and how it handles
>> > division/reciprocals.
>> > Did the other pow (pow{,r,n}) routines also exhibit this behaviour?
>>
>> Not sure.  I don't believe so (I believe I usually reproduced a CTS
>> failure for those before confirming the fix), but I'd have to go back
>> in time with libclc to check.
>>
>> For reference, the testing I did with rootn was done with a current
>> mesa checkout as of earlier today (d07466fe18522cde1) with
>> LLVM r331343 and libclc r331435 as a base.
>>
>> Would you like me to go back and re-check the pow/powr/pown results on
>> my 6850 from before the denormal flushing changes? I'm re-running all
>> 3 in their current state right now.
>
> Hi,
>
> do divide and half_divide tests pass on your EG hw? I think that broken
> division may explain why a special fix for rootn was necessary.

On my 6850 (Northern Islands), both tests fail with ULP errors:

80:     half_divide
ERROR: half_divide: -nan ulp error at {-inf (0xff800000), -0x1.fffffep+127 (0xff7fffff)}: *inf vs. -nan (0xffc00000) at index: 197
95:          divide
ERROR: divide: -nan ulp error at {-inf, -0x1.fffffep+127}: *inf vs. -nan (0xffc00000) at index: 197

I can pull my CEDAR (5400-series, actual evergreen card) from its
current home and test that as well, if you need/want me to.
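
For what it's worth, the failing input pair fits one possible explanation.
This is a guess, not something I have confirmed against the generated r600
code: if the division is lowered to x * (1 / y) and the subnormal reciprocal
of -0x1.fffffep+127 is flushed to zero, the result is -inf * -0, which is
NaN rather than the expected +inf. A minimal host-side C sketch of that
model follows. The helper names flush_denormal and divide_via_rcp_ftz are
made up for illustration and are not libclc or LLVM functions:

```c
#include <assert.h>
#include <float.h>
#include <math.h>

/* Hypothetical helper: flush a subnormal float to a zero of the same sign,
 * modelling a device that runs with fp32 denormals disabled. */
static float flush_denormal(float v)
{
    return (fpclassify(v) == FP_SUBNORMAL) ? copysignf(0.0f, v) : v;
}

/* Assumed lowering (not confirmed for r600): x / y computed as x * (1 / y),
 * with the reciprocal flushed to zero when it is subnormal. */
static float divide_via_rcp_ftz(float x, float y)
{
    float rcp = flush_denormal(1.0f / y);
    return x * rcp;
}

/* Self-check for the input pair reported by the CTS at index 197. */
static void check_reported_case(void)
{
    float x = -INFINITY;
    float y = -FLT_MAX;                        /* -0x1.fffffep+127 */

    assert(isinf(x / y) && x / y > 0.0f);      /* IEEE division gives +inf */
    assert(isnan(divide_via_rcp_ftz(x, y)));   /* -inf * -0.0f is NaN */
}
```

If this model is right, any divisor whose reciprocal is subnormal
(magnitude above 2^126) would hit the same problem with an infinite
numerator, which would fit with broken division being behind the special
rootn fix.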

--Aaron

>
> thanks,
> Jan
>
>>
>> --Aaron
>>
>> >
>> > Jan
>> >
>> > >
>> > > --Aaron
>> > >
>> > > >
>> > > > Jan
>> > > >
>> > > > >
>> > > > > --Aaron
>> > > > >
>> > > > > >
>> > > > > > Jan
>> > > > > >
>> > > > > > >
>> > > > > > > --Aaron
>> > > > > > >
>> > > > > > > On Mon, Apr 30, 2018 at 1:05 PM, Jan Vesely via Libclc-dev
>> > > > > > > <libclc-dev at lists.llvm.org> wrote:
>> > > > > > > > On Tue, 2018-04-24 at 12:31 -0400, Jan Vesely wrote:
>> > > > > > > > > It's OK to either flush to 0 or return denormal result if the
>> > > > > > > > > device
>> > > > > > > > > does not support denormals. See sec 7.2 and 7.5.3 of OCL specs
>> > > > > > > > >
>> > > > > > > > > v2: Use 0.0f explicitly instead of relying on GPU to flush it.
>> > > > > > > > >
>> > > > > > > > > Fixes CTS on carrizo and turks
>> > > > > > > > > Signed-off-by: Jan Vesely <jan.vesely at rutgers.edu>
>> > > > > > > > > ---
>> > > > > > > > > This removes the need for the second patch
>> > > > > > > > >  generic/lib/math/clc_rootn.cl | 11 +----------
>> > > > > > > > >  1 file changed, 1 insertion(+), 10 deletions(-)
>> > > > > > > > >
>> > > > > > > > > diff --git a/generic/lib/math/clc_rootn.cl
>> > > > > > > > > b/generic/lib/math/clc_rootn.cl
>> > > > > > > > > index d7ee185..0a2c98d 100644
>> > > > > > > > > --- a/generic/lib/math/clc_rootn.cl
>> > > > > > > > > +++ b/generic/lib/math/clc_rootn.cl
>> > > > > > > > > @@ -170,16 +170,7 @@ _CLC_DEF _CLC_OVERLOAD float
>> > > > > > > > > __clc_rootn(float x, int ny)
>> > > > > > > > >      tv = USE_TABLE(exp_tbl_ep, j);
>> > > > > > > > >
>> > > > > > > > >      float expylogx = mad(tv.s0, poly, mad(tv.s1, poly, tv.s1))
>> > > > > > > > > + tv.s0;
>> > > > > > > > > -    float sexpylogx;
>> > > > > > > > > -    if (!__clc_fp32_subnormals_supported()) {
>> > > > > > > > > -             int explg = ((as_uint(expylogx) & EXPBITS_SP32 >>
>> > > > > > > > > 23) - 127);
>> > > > > > > > > -             m = (23-(m + 149)) == 0 ? 1: m;
>> > > > > > > > > -             uint mantissa =  ((as_uint(expylogx) &
>> > > > > > > > > MANTBITS_SP32)|IMPBIT_SP32) >> (23-(m + 149));
>> > > > > > > > > -             sexpylogx = as_float(mantissa);
>> > > > > > > > > -    } else {
>> > > > > > > > > -             sexpylogx = expylogx * as_float(0x1 << (m +
>> > > > > > > > > 149));
>> > > > > > > > > -    }
>> > > > > > > > > -
>> > > > > > > > > +    float sexpylogx = __clc_fp32_subnormals_supported() ?
>> > > > > > > > > expylogx * as_float(0x1 << (m + 149)) : 0.0f;
>> > > > > > > > >
>> > > > > > > > >      float texpylogx = as_float(as_int(expylogx) + m2);
>> > > > > > > > >      expylogx = m < -125 ? sexpylogx : texpylogx;
>> > > > > > > >
>> > > > > > > > ping.
>> > > > > > > > _______________________________________________
>> > > > > > > > Libclc-dev mailing list
>> > > > > > > > Libclc-dev at lists.llvm.org
>> > > > > > > > http://lists.llvm.org/cgi-bin/mailman/listinfo/libclc-dev
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > > >
>> > > > >
>> > >
>>