[Libclc-dev] [PATCH v2 1/1] rootn: Flush denormals if not supported.
Aaron Watry via Libclc-dev
libclc-dev at lists.llvm.org
Mon May 14 20:29:38 PDT 2018
On Mon, May 14, 2018, 9:59 PM Jan Vesely <jan.vesely at rutgers.edu> wrote:
> On Thu, 2018-05-10 at 21:19 -0500, Aaron Watry via Libclc-dev wrote:
> > On Thu, May 10, 2018 at 7:02 PM, Jan Vesely <jan.vesely at rutgers.edu> wrote:
> > > On Thu, 2018-05-10 at 15:43 -0500, Aaron Watry via Libclc-dev wrote:
> > > > On Thu, May 10, 2018 at 1:52 PM, Jan Vesely <jan.vesely at rutgers.edu> wrote:
> > > > > On Thu, 2018-05-10 at 13:43 -0500, Aaron Watry via Libclc-dev wrote:
> > > > > > On Thu, May 10, 2018 at 1:16 PM, Jan Vesely <jan.vesely at rutgers.edu> wrote:
> > > > > > > On Wed, 2018-05-02 at 22:16 -0500, Aaron Watry via Libclc-dev wrote:
> > > > > > > > On Wed, 2018-05-02 at 21:51 -0400, Jan Vesely wrote:
> > > > > > > > > On Wed, 2018-05-02 at 07:03 -0500, Aaron Watry via Libclc-dev wrote:
> > > > > > > > > > Am I being dense, or just lucky (device supports denormals?)? This
> > > > > > > > > > already passed on my RX580 before I applied your patch.
> > > > > > > > >
> > > > > > > > > IIRC, the problem is not with denormal support (unless you enabled it
> > > > > > > > > explicitly), but that the 'indx' variable was computed incorrectly. My
> > > > > > > > > guess would be that one of the earlier operations (mad?) improved wrt
> > > > > > > > > ULP precision (rootn still failed on my carrizo).
> > > > > > > > > Anyway, flushing denormals just hides the issue. It'll probably still
> > > > > > > > > fail if run with denormals enabled, but fixing denormal support is a
> > > > > > > > > story for another day.
> > > > > > > > >
> > > > > > > > > > I'm currently rebuilding a newer llvm on my r600 box that hopefully
> > > > > > > > > > won't segfault when running rootn to test there.
> > > > > > > > >
> > > > > > > > > thanks. It works OK on my turks when math_bruteforce is run in single
> > > > > > > > > thread mode.
> > > > > > > >
> > > > > > > > Oh yeah, the compute memory pool on r600 isn't thread-safe...
> > > > > > > >
> > > > > > > > Let's just say that the email I sent this morning was while the
> > > > > > > > first cup of coffee was still unconsumed, and I had a small child in
> > > > > > > > my lap trying to commandeer my mouse. Not a great time for deep
> > > > > > > > thoughts. :)
> > > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > any luck running on your r600?
> > > > > >
> > > > > > Yes, in single-threaded mode. But in my case (HD 6850, BARTS) the
> > > > > > rootn test already passed with a max ULP of 1.0 before your patch,
> > > > > > and a max ULP of 7.0 after.
> > > > > >
> > > > > > The tolerance for rootn is <= 16, so both cases passed, but the
> > > > > > maximum error seems to have gone up after flushing subnormals.
> > > > > >
> > > > > > I've been staring at the patch off and on and trying to figure out
> > > > > > if it's doing something wrong. Maybe it's just a difference in the
> > > > > > precision of the hardware we're using.
> > > > > >
> > > > > > If we really need to, I've also got a cayman-based APU chip and a
> > > > > > PCI CEDAR if we want/need to get a few more sample points.
> > > > >
> > > > > hm, that's interesting. My problem with EG was that it returned NaN.
> > > > > My guess would be that there is a difference in LLVM and how it
> > > > > handles division/reciprocals.
> > > > > Did the other pow (pow{,r,n}) routines also exhibit this behaviour?
> > > >
> > > > Not sure. I don't believe so (I believe I usually reproduced a CTS
> > > > failure for those before confirming the fix), but I'd have to go back
> > > > in time with libclc to check.
> > > >
> > > > For reference, the testing I did with rootn was done with a current
> > > > mesa checkout as of earlier today (d07466fe18522cde1) with
> > > > LLVM r331343 and libclc r331435 as a base.
> > > >
> > > > Would you like me to go back and re-check the pow/powr/pown results
> > > > on my 6850 from before the denormal flushing changes? I'm re-running
> > > > all 3 in their current state right now.
> > >
> > > Actually my turks setup uses llvm-git. It's weird that you don't see
> > > the NaN issues on your cedar.
> >
> > I have a cedar, but it's currently in another system (An old DEC
> > Alpha). The R600-based card I usually test with is the BARTS. The last
> > r600 card I have is a 3-core Llano APU (Cayman-derived I believe,
> > SUMO2 chip), unless you want to include the chipset-based IGPs on a
> > few motherboards I've got.
> >
> > Just for reference, I went back to the commit immediately before the
> > denormal fixes for pow/powr/pown on my BARTS, and all 3 fail wimpy
> > mode before the denormal fixes (in single-threaded mode). I haven't
> > bothered with a full non-wimpy run.
> >
> > > I don't think you need to invest much time into this. Given the
> > > manpower I think it's preferable to have one version that works across
> > > many devices/llvm versions.
> > > Improved precision is nice, but probably not something to sweat about.
> > > It'd be more interesting to see if the explicit 0 allows the compiler
> > > to generate faster code.
> >
> > Yeah, the test passes within allowed tolerances, and I don't want to
> > have multiple versions of the code unless there's a good reason to.
> >
> > If this lets more chips pass without errors, I'm fine with this going
> > in as-is.
>
> thanks. May I consider it an acked-by?
>
Yeah. Acked-by, tested-by. Either or both are fine with me.
>
> Jan
>
> >
> > --Aaron
> >
> > >
> > > thanks,
> > > Jan
> > >
> > > >
> > > > --Aaron
> > > >
> > > > >
> > > > > Jan
> > > > >
> > > > > >
> > > > > > --Aaron
> > > > > >
> > > > > > >
> > > > > > > Jan
> > > > > > >
> > > > > > > >
> > > > > > > > --Aaron
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Jan
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > --Aaron
> > > > > > > > > >
> > > > > > > > > > On Mon, Apr 30, 2018 at 1:05 PM, Jan Vesely via Libclc-dev <libclc-dev at lists.llvm.org> wrote:
> > > > > > > > > > > On Tue, 2018-04-24 at 12:31 -0400, Jan Vesely wrote:
> > > > > > > > > > > > It's OK to either flush to 0 or return denormal result if the device
> > > > > > > > > > > > does not support denormals. See sec 7.2 and 7.5.3 of OCL specs
> > > > > > > > > > > >
> > > > > > > > > > > > v2: Use 0.0f explicitly instead of relying on GPU to flush it.
> > > > > > > > > > > >
> > > > > > > > > > > > Fixes CTS on carrizo and turks
> > > > > > > > > > > > Signed-off-by: Jan Vesely <jan.vesely at rutgers.edu>
> > > > > > > > > > > > ---
> > > > > > > > > > > > This removes the need for the second patch
> > > > > > > > > > > > generic/lib/math/clc_rootn.cl | 11 +----------
> > > > > > > > > > > > 1 file changed, 1 insertion(+), 10 deletions(-)
> > > > > > > > > > > >
> > > > > > > > > > > > diff --git a/generic/lib/math/clc_rootn.cl b/generic/lib/math/clc_rootn.cl
> > > > > > > > > > > > index d7ee185..0a2c98d 100644
> > > > > > > > > > > > --- a/generic/lib/math/clc_rootn.cl
> > > > > > > > > > > > +++ b/generic/lib/math/clc_rootn.cl
> > > > > > > > > > > > @@ -170,16 +170,7 @@ _CLC_DEF _CLC_OVERLOAD float __clc_rootn(float x, int ny)
> > > > > > > > > > > > tv = USE_TABLE(exp_tbl_ep, j);
> > > > > > > > > > > >
> > > > > > > > > > > > float expylogx = mad(tv.s0, poly, mad(tv.s1, poly, tv.s1)) + tv.s0;
> > > > > > > > > > > > - float sexpylogx;
> > > > > > > > > > > > - if (!__clc_fp32_subnormals_supported()) {
> > > > > > > > > > > > - int explg = ((as_uint(expylogx) & EXPBITS_SP32 >> 23) - 127);
> > > > > > > > > > > > - m = (23-(m + 149)) == 0 ? 1: m;
> > > > > > > > > > > > - uint mantissa = ((as_uint(expylogx) & MANTBITS_SP32)|IMPBIT_SP32) >> (23-(m + 149));
> > > > > > > > > > > > - sexpylogx = as_float(mantissa);
> > > > > > > > > > > > - } else {
> > > > > > > > > > > > - sexpylogx = expylogx * as_float(0x1 << (m + 149));
> > > > > > > > > > > > - }
> > > > > > > > > > > > -
> > > > > > > > > > > > + float sexpylogx = __clc_fp32_subnormals_supported() ? expylogx * as_float(0x1 << (m + 149)) : 0.0f;
> > > > > > > > > > > >
> > > > > > > > > > > > float texpylogx = as_float(as_int(expylogx) + m2);
> > > > > > > > > > > > expylogx = m < -125 ? sexpylogx : texpylogx;
> > > > > > > > > > >
> > > > > > > > > > > ping.
> > > > > > > > > > > _______________________________________________
> > > > > > > > > > > Libclc-dev mailing list
> > > > > > > > > > > Libclc-dev at lists.llvm.org
> > > > > > > > > > > http://lists.llvm.org/cgi-bin/mailman/listinfo/libclc-dev
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > >
> > > >
> >
>
> --
> Jan Vesely <jan.vesely at rutgers.edu>
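
For anyone skimming the quoted diff, here is a minimal host-side C sketch of what the v2 one-liner selects on the small-result path. It is not the libclc code: the helper names bits_to_float and scale_small_result are hypothetical, the bit-cast stands in for OpenCL's as_float(), and the boolean argument stands in for __clc_fp32_subnormals_supported(). It assumes -149 <= m <= -126, the range in which the bit pattern 1 << (m + 149) reads back as 2^m.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Bit-cast helper standing in for OpenCL's as_float() (hypothetical name). */
static float bits_to_float(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/*
 * Hypothetical stand-in for the patched expression:
 *   supported ? expylogx * as_float(0x1 << (m + 149)) : 0.0f
 * Assumes -149 <= m <= -126, so the shift count is in 0..23 and the
 * resulting bit pattern encodes 2^m (subnormal, or the smallest normal).
 */
static float scale_small_result(float expylogx, int m, bool subnormals_supported)
{
    uint32_t tiny_bits = UINT32_C(1) << (m + 149);
    return subnormals_supported ? expylogx * bits_to_float(tiny_bits) : 0.0f;
}
```

On a host that keeps subnormals, scale_small_result(1.5f, -130, true) yields the subnormal 1.5 * 2^-130, while passing false for the last argument yields 0.0f, which is the explicit flush the v2 note describes.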
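
The thread compares a max ULP of 1.0 and 7.0 against a tolerance of <= 16. As a rough illustration of what such a comparison means, here is a C sketch that counts representable floats between two values by subtracting their bit patterns. This is a swapped-in simplification, not the conformance suite's method: as far as I understand, the CTS measures error against a higher-precision reference and can report fractional ULPs. The helper names are hypothetical, and the distance is only meaningful for finite values of the same sign.

```c
#include <stdint.h>
#include <string.h>

/* Bit-cast helpers standing in for OpenCL's as_uint()/as_float()
   (hypothetical names). */
static uint32_t float_to_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

static float bits_to_float(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Number of representable floats between a and b.
   Only meaningful for finite values of the same sign. */
static uint32_t ulp_distance(float a, float b)
{
    uint32_t ua = float_to_bits(a), ub = float_to_bits(b);
    return ua > ub ? ua - ub : ub - ua;
}

/* Hypothetical check against the rootn tolerance quoted in the thread. */
static int within_rootn_tolerance(float got, float want)
{
    return ulp_distance(got, want) <= 16u;
}
```

For example, bits_to_float(0x3f800000u) is 1.0f and bits_to_float(0x3f800007u) is seven steps above it, so that pair is within tolerance, whereas a pair seventeen steps apart is not.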