[Libclc-dev] [PATCH 1/9] RFC: Refactor clcmacro.h to vectorize without hi/lo

Aaron Watry awatry at gmail.com
Fri Aug 1 13:48:10 PDT 2014


On Thu, Jul 31, 2014 at 8:50 PM, Tom Stellard <tom at stellard.net> wrote:
> On Wed, Jul 30, 2014 at 05:34:27PM -0500, Aaron Watry wrote:
>> LLVM: git 6800393083a4030 / svn r213860 (From 7/24)
>> CLANG: git 5ce5cbd836b3 / svn r213853
>> libclc: git a63df067faf8a / svn r213762 with the following two patches applied:
>>
>> r600: Actually use vstore assembly optimizations
>> r600: improve float vload/vstore path
>>
>> With that setup, and mesa 0ddc28b026688d, I get failures in the
>> float16 nextafter tests.
>>
>
> OK, I will try this.  I think I may not have had those two libclc
> patches applied when I tested.
>

That would be helpful.  At the moment I'm trying to recover from an
upgrade that broke all of my systems that are built against newer
mesa/llvm versions. Until I can resolve that issue, I'm unable to do
any CL dev work on Evergreen, or any CL or GL work on SI.

If we can confirm that the llvm patches for alignment issues make
those 2 libclc patches work on their own, then I'm at least happy to
drop the clcmacro.h refactoring.  I just can't actually test that
myself at the moment.
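
For anyone following along, the difference between the two macro styles
can be sketched roughly as below. This is plain C rather than OpenCL C,
so it only illustrates the shape of the code: vec2, vec4, scalar_fn and
the fn* helpers are hypothetical stand-ins and not anything in libclc.
The hi/lo style builds each width from two calls at half the width,
while the refactored style calls the scalar function once per element.

```c
/* Plain-C stand-ins for OpenCL float2/float4 (hypothetical names). */
typedef struct { float s[2]; } vec2;
typedef struct { float s[4]; } vec4;

/* Hypothetical scalar function, standing in for a builtin like sign(). */
static float scalar_fn(float x) { return 2.0f * x + 1.0f; }

/* Width 2: one scalar call per element (the same in both styles). */
static vec2 fn2(vec2 x) {
  vec2 r = {{ scalar_fn(x.s[0]), scalar_fn(x.s[1]) }};
  return r;
}

/* hi/lo style: width 4 is assembled from two width-2 calls, in the
 * spirit of (RET_TYPE##4)(FUNCTION(x.lo), FUNCTION(x.hi)). */
static vec4 fn4_hilo(vec4 x) {
  vec2 lo = {{ x.s[0], x.s[1] }};
  vec2 hi = {{ x.s[2], x.s[3] }};
  vec2 rlo = fn2(lo);
  vec2 rhi = fn2(hi);
  vec4 r = {{ rlo.s[0], rlo.s[1], rhi.s[0], rhi.s[1] }};
  return r;
}

/* Flat style: every element is listed explicitly, in the spirit of
 * (RET_TYPE##4){FUNCTION(x.s0), ..., FUNCTION(x.s3)}. */
static vec4 fn4_flat(vec4 x) {
  vec4 r = {{ scalar_fn(x.s[0]), scalar_fn(x.s[1]),
              scalar_fn(x.s[2]), scalar_fn(x.s[3]) }};
  return r;
}

/* Returns 1 when both styles agree element by element, 0 otherwise. */
static int styles_agree(vec4 x) {
  vec4 a = fn4_hilo(x);
  vec4 b = fn4_flat(x);
  for (int i = 0; i < 4; ++i)
    if (a.s[i] != b.s[i]) return 0;
  return 1;
}
```

Both forms should give identical results for any input, which is why a
mismatch that shows up only in the hi/lo form points at a bug elsewhere
in the stack rather than in the macros themselves.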

--Aaron

> -Tom
>
>> I'll give a shot at updating llvm/clang when I get home and see if
>> that helps things out.  I had noticed earlier today that Matt had
>> recently committed a bunch of vector load/store and alignment changes
>> (r214055 looks especially applicable), so maybe something in there did
>> the trick.
>>
>> --Aaron
>>
>> On Wed, Jul 30, 2014 at 3:39 PM, Tom Stellard <tom at stellard.net> wrote:
>> > On Sat, Jul 26, 2014 at 03:12:30PM -0500, Aaron Watry wrote:
>> >> On Fri, Jul 25, 2014 at 9:40 PM, Tom Stellard <tom at stellard.net> wrote:
>> >> > On Tue, Jul 22, 2014 at 08:46:42PM -0500, Aaron Watry wrote:
>> >> >> There are odd things happening with 16x vectors in nextafter() and sign().
>> >> >>
>> >> >> When I changed float16 load/store to use the assembly path in later
>> >> >> patches instead of the macro-vectorized version, nextafter and sign
>> >> >> stopped working (next 2 commits/patches).
>> >> >>
>> >> >> As near as I can tell, we're getting correct results for
>> >> >> elements 0-7, but then elements 8+ are wrong. The final result
>> >> >> seems to be composed of the first 8 elements of the computed
>> >> >> result, and then elements 16-23, which are likely uninitialized.
>> >> >>
>> >> >> I'd like to say the issue is in clang, but I have nothing to back
>> >> >> that up at the moment, and given that this patch fixes the issue, I’m
>> >> >> not sure how much time I want to spend investigating right now.
>> >> >>
>> >> >> Explicitly splitting all of the vectorize macros the way that this patch
>> >> >> does gets everything working again, but I have a feeling that we're
>> >> >> papering over a bug somewhere, hence the RFC subject.
>> >> >>
>> >> >> I’m not sure what’s going on here, and it’s only the nextafter/sign
>> >> >> functions that regressed. This patch fixes the test results in piglit.
>> >> >>
>> >> >> No significant change on number of instructions in bitcode
>> >> >> for nextafter float16 (2-3 instructions savings over ~350 lines).
>> >> >>
>> >> >
>> >> > Were you testing this on Evergreen?  The sign() test passes for me on
>> >> > SI without this patch.
>> >> >
>> >>
>> >> I don't have my EG card available at the moment, but I *think* that
>> >> passed.  All I've got available at the moment is my pitcairn (si). For
>> >> me on pitcairn, I get failures in
>> >> piglit/generated_tests/cl/builtin/math/builtin-float-nextafter-1.0.generated.cl
>> >> for only the float16 type (about half of them pass, half don't).
>> >>
>> >
>> > This test passes for me too.  How current is your install of LLVM?  I think there
>> > were some shufflevector fixes/improvements recently that may have fixed this test.
>> >
>> > -Tom
>> >
>> >> What happens if you test with [1] and [2] applied (enable vstore
>> >> optimizations and then add float to those optimizations), but without
>> >> the clcmacro refactoring patch [3]?
>> >>
>> >> [1] http://www.pcc.me.uk/pipermail/libclc-dev/2014-July/000461.html
>> >> [2] http://www.pcc.me.uk/pipermail/libclc-dev/2014-July/000462.html
>> >> [3] http://www.pcc.me.uk/pipermail/libclc-dev/2014-July/000460.html
>> >>
>> >> --Aaron
>> >>
>> >> > -Tom
>> >> >
>> >> >> Signed-off-by: Aaron Watry <awatry at gmail.com>
>> >> >> ---
>> >> >>  generic/lib/clcmacro.h | 84 +++++++++++++++++++++++++++++++++++++++++++-------
>> >> >>  1 file changed, 73 insertions(+), 11 deletions(-)
>> >> >>
>> >> >> diff --git a/generic/lib/clcmacro.h b/generic/lib/clcmacro.h
>> >> >> index 730073a..64b1770 100644
>> >> >> --- a/generic/lib/clcmacro.h
>> >> >> +++ b/generic/lib/clcmacro.h
>> >> >> @@ -1,44 +1,106 @@
>> >> >>  #define _CLC_UNARY_VECTORIZE(DECLSPEC, RET_TYPE, FUNCTION, ARG1_TYPE) \
>> >> >>    DECLSPEC RET_TYPE##2 FUNCTION(ARG1_TYPE##2 x) { \
>> >> >> -    return (RET_TYPE##2)(FUNCTION(x.x), FUNCTION(x.y)); \
>> >> >> +    return (RET_TYPE##2){FUNCTION(x.s0), FUNCTION(x.s1)}; \
>> >> >>    } \
>> >> >>  \
>> >> >>    DECLSPEC RET_TYPE##3 FUNCTION(ARG1_TYPE##3 x) { \
>> >> >> -    return (RET_TYPE##3)(FUNCTION(x.x), FUNCTION(x.y), FUNCTION(x.z)); \
>> >> >> +    return (RET_TYPE##3){FUNCTION(x.s0), FUNCTION(x.s1), FUNCTION(x.s2)}; \
>> >> >>    } \
>> >> >>  \
>> >> >>    DECLSPEC RET_TYPE##4 FUNCTION(ARG1_TYPE##4 x) { \
>> >> >> -    return (RET_TYPE##4)(FUNCTION(x.lo), FUNCTION(x.hi)); \
>> >> >> +    return (RET_TYPE##4){ \
>> >> >> +      FUNCTION(x.s0), \
>> >> >> +      FUNCTION(x.s1), \
>> >> >> +      FUNCTION(x.s2), \
>> >> >> +      FUNCTION(x.s3), \
>> >> >> +    }; \
>> >> >>    } \
>> >> >>  \
>> >> >>    DECLSPEC RET_TYPE##8 FUNCTION(ARG1_TYPE##8 x) { \
>> >> >> -    return (RET_TYPE##8)(FUNCTION(x.lo), FUNCTION(x.hi)); \
>> >> >> +    return (RET_TYPE##8){ \
>> >> >> +      FUNCTION(x.s0), \
>> >> >> +      FUNCTION(x.s1), \
>> >> >> +      FUNCTION(x.s2), \
>> >> >> +      FUNCTION(x.s3), \
>> >> >> +      FUNCTION(x.s4), \
>> >> >> +      FUNCTION(x.s5), \
>> >> >> +      FUNCTION(x.s6), \
>> >> >> +      FUNCTION(x.s7), \
>> >> >> +    }; \
>> >> >>    } \
>> >> >>  \
>> >> >>    DECLSPEC RET_TYPE##16 FUNCTION(ARG1_TYPE##16 x) { \
>> >> >> -    return (RET_TYPE##16)(FUNCTION(x.lo), FUNCTION(x.hi)); \
>> >> >> +    return (RET_TYPE##16){ \
>> >> >> +      FUNCTION(x.s0), \
>> >> >> +      FUNCTION(x.s1), \
>> >> >> +      FUNCTION(x.s2), \
>> >> >> +      FUNCTION(x.s3), \
>> >> >> +      FUNCTION(x.s4), \
>> >> >> +      FUNCTION(x.s5), \
>> >> >> +      FUNCTION(x.s6), \
>> >> >> +      FUNCTION(x.s7), \
>> >> >> +      FUNCTION(x.s8), \
>> >> >> +      FUNCTION(x.s9), \
>> >> >> +      FUNCTION(x.sa), \
>> >> >> +      FUNCTION(x.sb), \
>> >> >> +      FUNCTION(x.sc), \
>> >> >> +      FUNCTION(x.sd), \
>> >> >> +      FUNCTION(x.se), \
>> >> >> +      FUNCTION(x.sf) \
>> >> >> +    }; \
>> >> >>    }
>> >> >>
>> >> >>  #define _CLC_BINARY_VECTORIZE(DECLSPEC, RET_TYPE, FUNCTION, ARG1_TYPE, ARG2_TYPE) \
>> >> >>    DECLSPEC RET_TYPE##2 FUNCTION(ARG1_TYPE##2 x, ARG2_TYPE##2 y) { \
>> >> >> -    return (RET_TYPE##2)(FUNCTION(x.x, y.x), FUNCTION(x.y, y.y)); \
>> >> >> +    return (RET_TYPE##2){FUNCTION(x.s0, y.s0), FUNCTION(x.s1, y.s1)}; \
>> >> >>    } \
>> >> >>  \
>> >> >>    DECLSPEC RET_TYPE##3 FUNCTION(ARG1_TYPE##3 x, ARG2_TYPE##3 y) { \
>> >> >> -    return (RET_TYPE##3)(FUNCTION(x.x, y.x), FUNCTION(x.y, y.y), \
>> >> >> -                         FUNCTION(x.z, y.z)); \
>> >> >> +    return (RET_TYPE##3){FUNCTION(x.s0, y.s0), FUNCTION(x.s1, y.s1), \
>> >> >> +                         FUNCTION(x.s2, y.s2)}; \
>> >> >>    } \
>> >> >>  \
>> >> >>    DECLSPEC RET_TYPE##4 FUNCTION(ARG1_TYPE##4 x, ARG2_TYPE##4 y) { \
>> >> >> -    return (RET_TYPE##4)(FUNCTION(x.lo, y.lo), FUNCTION(x.hi, y.hi)); \
>> >> >> +    return (RET_TYPE##4){ \
>> >> >> +      FUNCTION(x.s0, y.s0), \
>> >> >> +      FUNCTION(x.s1, y.s1), \
>> >> >> +      FUNCTION(x.s2, y.s2), \
>> >> >> +      FUNCTION(x.s3, y.s3), \
>> >> >> +    }; \
>> >> >>    } \
>> >> >>  \
>> >> >>    DECLSPEC RET_TYPE##8 FUNCTION(ARG1_TYPE##8 x, ARG2_TYPE##8 y) { \
>> >> >> -    return (RET_TYPE##8)(FUNCTION(x.lo, y.lo), FUNCTION(x.hi, y.hi)); \
>> >> >> +    return (RET_TYPE##8){ \
>> >> >> +      FUNCTION(x.s0, y.s0), \
>> >> >> +      FUNCTION(x.s1, y.s1), \
>> >> >> +      FUNCTION(x.s2, y.s2), \
>> >> >> +      FUNCTION(x.s3, y.s3), \
>> >> >> +      FUNCTION(x.s4, y.s4), \
>> >> >> +      FUNCTION(x.s5, y.s5), \
>> >> >> +      FUNCTION(x.s6, y.s6), \
>> >> >> +      FUNCTION(x.s7, y.s7), \
>> >> >> +    }; \
>> >> >>    } \
>> >> >>  \
>> >> >>    DECLSPEC RET_TYPE##16 FUNCTION(ARG1_TYPE##16 x, ARG2_TYPE##16 y) { \
>> >> >> -    return (RET_TYPE##16)(FUNCTION(x.lo, y.lo), FUNCTION(x.hi, y.hi)); \
>> >> >> +    return (RET_TYPE##16){ \
>> >> >> +      FUNCTION(x.s0, y.s0), \
>> >> >> +      FUNCTION(x.s1, y.s1), \
>> >> >> +      FUNCTION(x.s2, y.s2), \
>> >> >> +      FUNCTION(x.s3, y.s3), \
>> >> >> +      FUNCTION(x.s4, y.s4), \
>> >> >> +      FUNCTION(x.s5, y.s5), \
>> >> >> +      FUNCTION(x.s6, y.s6), \
>> >> >> +      FUNCTION(x.s7, y.s7), \
>> >> >> +      FUNCTION(x.s8, y.s8), \
>> >> >> +      FUNCTION(x.s9, y.s9), \
>> >> >> +      FUNCTION(x.sa, y.sa), \
>> >> >> +      FUNCTION(x.sb, y.sb), \
>> >> >> +      FUNCTION(x.sc, y.sc), \
>> >> >> +      FUNCTION(x.sd, y.sd), \
>> >> >> +      FUNCTION(x.se, y.se), \
>> >> >> +      FUNCTION(x.sf, y.sf) \
>> >> >> +    }; \
>> >> >>    }
>> >> >>
>> >> >>  #define _CLC_DEFINE_BINARY_BUILTIN(RET_TYPE, FUNCTION, BUILTIN, ARG1_TYPE, ARG2_TYPE) \
>> >> >> --
>> >> >> 1.9.1
>> >> >>
>> >> >>
>> >> >> _______________________________________________
>> >> >> Libclc-dev mailing list
>> >> >> Libclc-dev at pcc.me.uk
>> >> >> http://www.pcc.me.uk/cgi-bin/mailman/listinfo/libclc-dev