[Libclc-dev] [PATCH v2 1/1] rootn: Flush denormals if not supported.
Jan Vesely via Libclc-dev
libclc-dev at lists.llvm.org
Tue May 22 08:10:43 PDT 2018
On Mon, 2018-05-21 at 21:46 -0500, Aaron Watry via Libclc-dev wrote:
> On Mon, May 21, 2018 at 5:49 PM, Jan Vesely <jan.vesely at rutgers.edu> wrote:
> > On Thu, 2018-05-10 at 15:43 -0500, Aaron Watry via Libclc-dev wrote:
> > > On Thu, May 10, 2018 at 1:52 PM, Jan Vesely <jan.vesely at rutgers.edu> wrote:
> > > > On Thu, 2018-05-10 at 13:43 -0500, Aaron Watry via Libclc-dev wrote:
> > > > > On Thu, May 10, 2018 at 1:16 PM, Jan Vesely <jan.vesely at rutgers.edu> wrote:
> > > > > > On Wed, 2018-05-02 at 22:16 -0500, Aaron Watry via Libclc-dev wrote:
> > > > > > > On Wed, 2018-05-02 at 21:51 -0400, Jan Vesely wrote:
> > > > > > > > On Wed, 2018-05-02 at 07:03 -0500, Aaron Watry via Libclc-dev wrote:
> > > > > > > > > Am I being dense or just lucky (does the device support denormals?). This
> > > > > > > > > already passed on my RX580 before I applied your patch.
> > > > > > > >
> > > > > > > > IIRC, the problem is not with denormal support (unless you enabled it
> > > > > > > > explicitly), but that 'indx' variable was computed incorrectly. My
> > > > > > > > guess would be that one of the earlier operations (mad?) improved wrt
> > > > > > > > ULP precision (rootn still failed on my carrizo).
> > > > > > > > Anyway, flushing denormals just hides the issue. It'll probably still
> > > > > > > > fail if run with denormals enabled, but fixing denormal support is a
> > > > > > > > story for another day.
> > > > > > > >
> > > > > > > > > I'm currently rebuilding a newer llvm on my r600 box that hopefully
> > > > > > > > > won't segfault when running rootn, so I can test there.
> > > > > > > >
> > > > > > > > thanks. It works OK on my turks when math_bruteforce is run in single
> > > > > > > > thread mode.
> > > > > > >
> > > > > > > Oh yeah, the compute memory pool on r600 isn't thread-safe...
> > > > > > >
> > > > > > > Let's just say that the email I sent this morning was while the first
> > > > > > > cup of coffee was still unconsumed, and I had a small child in my lap
> > > > > > > trying to commandeer my mouse. Not a great time for deep thoughts. :)
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > any luck running on your r600?
> > > > >
> > > > > Yes, in single-threaded mode. But in my case (HD 6850, BARTS) the
> > > > > rootn test already passed with a max ULP of 1.0 before your patch, and
> > > > > a max ULP of 7.0 after.
> > > > >
> > > > > The tolerance for rootn is <= 16, so both cases passed, but the
> > > > > maximum error seems to have gone up after flushing subnormals.
> > > > >
> > > > > I've been staring at the patch off and on and trying to figure out if
> > > > > it's doing something wrong. Maybe it's just a difference in the
> > > > > precision of the hardware we're using.
> > > > >
> > > > > If we really need to, I've also got a cayman-based APU chip and a PCI
> > > > > CEDAR if we want/need to get a few more sample points.
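For reference when reading the ULP figures above: an error of N ULP means the result lies N representable floats away from the correct value. The host-side C sketch below counts representable binary32 values between two floats by mapping their bit patterns onto an ordered integer line. It is an illustration only: the CTS measures error against a higher-precision reference and can report fractional ULPs, so the 1.0 and 7.0 figures quoted above were not produced by this code, and `ordered_bits`/`ulp_distance` are hypothetical names.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Map a float onto an ordered integer line so that neighbouring representable
 * floats differ by exactly 1. +0.0f and -0.0f both map to 0. */
static int64_t ordered_bits(float f) {
    int32_t i;
    memcpy(&i, &f, sizeof i);                    /* reinterpret the binary32 bits */
    return i < 0 ? -(int64_t)(i & 0x7fffffff)   /* negative: mirror the magnitude */
                 : (int64_t)i;
}

/* Number of representable floats between a and b (illustrative helper, not
 * the CTS metric). NaN inputs are rejected. */
static int64_t ulp_distance(float a, float b) {
    assert(!isnan(a) && !isnan(b));
    int64_t d = ordered_bits(a) - ordered_bits(b);
    return d < 0 ? -d : d;
}
```

For example, `ulp_distance(1.0f, 0x1.00000ep+0f)` is 7, an error of the same size as the post-patch maximum reported on BARTS, and well inside the rootn tolerance of 16.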
> > > >
> > > > hm, that's interesting. My problem with EG was that it returned NaN. My
> > > > guess would be that the difference is in LLVM and how it handles
> > > > division/reciprocals.
> > > > Did the other pow (pow{,r,n}) routines also exhibit this behaviour?
> > >
> > > Not sure. I don't believe so (I believe I usually reproduced a CTS
> > > failure for those before confirming the fix), but I'd have to go back
> > > in time with libclc to check.
> > >
> > > For reference, the testing I did with rootn was done with a current
> > > mesa checkout as of earlier today (d07466fe18522cde1) with
> > > LLVM r331343 and libclc r331435 as a base.
> > >
> > > Would you like me to go back and re-check the pow/powr/pown results on
> > > my 6850 from before the denormal flushing changes? I'm re-running all
> > > 3 in their current state right now.
> >
> > Hi,
> >
> > do divide and half_divide tests pass on your EG hw? I think that broken
> > division may explain why a special fix for rootn was necessary.
>
> For my 6850 (northern islands), the tests both fail with ULP errors:
>
> 80: half_divide
> ERROR: half_divide: -nan ulp error at {-inf (0xff800000), -0x1.fffffep+127 (0xff7fffff)}: *inf vs. -nan (0xffc00000) at index: 197
> 95: divide
> ERROR: divide: -nan ulp error at {-inf, -0x1.fffffep+127}: *inf vs. -nan (0xffc00000) at index: 197
>
> I can pull my CEDAR (5400-series, actual evergreen card) from its
> current home and test that as well, if you need/want me to.
thanks, but there's no need. I knew the problem with denormals in these
routines (powX, rootn) was in the division part of the algorithm. I
thought it might explain why it worked for you and not on my turks,
but it looks like division is broken with extreme values just the same.
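One plausible mechanism, consistent with the earlier guess that LLVM handles division/reciprocals differently: if x / y is computed as x * (1 / y) and the reciprocal is flushed to zero, the {-inf, -FLT_MAX} case from the CTS log turns into -inf * -0.0, which is NaN instead of +inf, because 1 / -FLT_MAX (about -2^-128) is subnormal. The host-side C sketch below models that. The reciprocal lowering and the helper names `flush_subnormal`/`div_via_reciprocal` are assumptions for illustration, not confirmed behaviour of the EG/NI backend, and the sign of the resulting NaN carries no meaning.

```c
#include <assert.h>
#include <float.h>
#include <math.h>

/* Hypothetical helper: emulate flush-to-zero by replacing a subnormal value
 * with a zero of the same sign. */
static float flush_subnormal(float v) {
    int subnormal = v != 0.0f && v > -FLT_MIN && v < FLT_MIN;
    if (!subnormal)
        return v;
    return v < 0.0f ? -0.0f : 0.0f;
}

/* Illustrative model of x / y lowered to x * (1 / y). Whether the r600
 * backend does this is an assumption, not something shown in this thread. */
static float div_via_reciprocal(float x, float y, int flush) {
    float r = 1.0f / y;          /* 1 / -FLT_MAX is about -2^-128: subnormal */
    if (flush)
        r = flush_subnormal(r);  /* flushed: r becomes -0.0f */
    return x * r;                /* -inf * -2^-128 = +inf, but -inf * -0.0f = NaN */
}
```

On a host with default IEEE settings, `div_via_reciprocal(-INFINITY, -FLT_MAX, 0)` returns +inf (matching the expected `*inf`), while passing `flush = 1` returns NaN, the same shape of failure as the log above.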
thanks,
Jan
>
> --Aaron
>
> >
> > thanks,
> > Jan
> >
> > >
> > > --Aaron
> > >
> > > >
> > > > Jan
> > > >
> > > > >
> > > > > --Aaron
> > > > >
> > > > > >
> > > > > > Jan
> > > > > >
> > > > > > >
> > > > > > > --Aaron
> > > > > > >
> > > > > > > >
> > > > > > > > Jan
> > > > > > > >
> > > > > > > > >
> > > > > > > > > --Aaron
> > > > > > > > >
> > > > > > > > > On Mon, Apr 30, 2018 at 1:05 PM, Jan Vesely via Libclc-dev <libclc-dev at lists.llvm.org> wrote:
> > > > > > > > > > On Tue, 2018-04-24 at 12:31 -0400, Jan Vesely wrote:
> > > > > > > > > > > It's OK to either flush to 0 or return a denormal result if the device
> > > > > > > > > > > does not support denormals. See sec 7.2 and 7.5.3 of the OCL specs.
> > > > > > > > > > >
> > > > > > > > > > > v2: Use 0.0f explicitly instead of relying on GPU to flush it.
> > > > > > > > > > >
> > > > > > > > > > > Fixes CTS on carrizo and turks
> > > > > > > > > > > Signed-off-by: Jan Vesely <jan.vesely at rutgers.edu>
> > > > > > > > > > > ---
> > > > > > > > > > > This removes the need for the second patch
> > > > > > > > > > > generic/lib/math/clc_rootn.cl | 11 +----------
> > > > > > > > > > > 1 file changed, 1 insertion(+), 10 deletions(-)
> > > > > > > > > > >
> > > > > > > > > > > diff --git a/generic/lib/math/clc_rootn.cl b/generic/lib/math/clc_rootn.cl
> > > > > > > > > > > index d7ee185..0a2c98d 100644
> > > > > > > > > > > --- a/generic/lib/math/clc_rootn.cl
> > > > > > > > > > > +++ b/generic/lib/math/clc_rootn.cl
> > > > > > > > > > > @@ -170,16 +170,7 @@ _CLC_DEF _CLC_OVERLOAD float __clc_rootn(float x, int ny)
> > > > > > > > > > > tv = USE_TABLE(exp_tbl_ep, j);
> > > > > > > > > > >
> > > > > > > > > > > float expylogx = mad(tv.s0, poly, mad(tv.s1, poly, tv.s1)) + tv.s0;
> > > > > > > > > > > - float sexpylogx;
> > > > > > > > > > > - if (!__clc_fp32_subnormals_supported()) {
> > > > > > > > > > > - int explg = ((as_uint(expylogx) & EXPBITS_SP32 >> 23) - 127);
> > > > > > > > > > > - m = (23-(m + 149)) == 0 ? 1: m;
> > > > > > > > > > > - uint mantissa = ((as_uint(expylogx) & MANTBITS_SP32)|IMPBIT_SP32) >> (23-(m + 149));
> > > > > > > > > > > - sexpylogx = as_float(mantissa);
> > > > > > > > > > > - } else {
> > > > > > > > > > > - sexpylogx = expylogx * as_float(0x1 << (m + 149));
> > > > > > > > > > > - }
> > > > > > > > > > > -
> > > > > > > > > > > + float sexpylogx = __clc_fp32_subnormals_supported() ? expylogx * as_float(0x1 << (m + 149)) : 0.0f;
> > > > > > > > > > >
> > > > > > > > > > > float texpylogx = as_float(as_int(expylogx) + m2);
> > > > > > > > > > > expylogx = m < -125 ? sexpylogx : texpylogx;
> > > > > > > > > >
> > > > > > > > > > ping.
> > > > > > > > > > _______________________________________________
> > > > > > > > > > Libclc-dev mailing list
> > > > > > > > > > Libclc-dev at lists.llvm.org
> > > > > > > > > > http://lists.llvm.org/cgi-bin/mailman/listinfo/libclc-dev
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > >
> > >
>