[Libclc-dev] [PATCH 1/9] RFC: Refactor clcmacro.h to vectorize without hi/lo

Aaron Watry awatry at gmail.com
Sat Aug 9 08:38:56 PDT 2014


I can confirm that the LLVM patch linked fixes the issue on my pitcairn.

--Aaron

On Thu, Aug 7, 2014 at 11:00 AM, Tom Stellard <tom at stellard.net> wrote:
> On Wed, Aug 06, 2014 at 05:02:29PM -0500, Aaron Watry wrote:
>> Good to know that the root cause is being narrowed down.  Let me know
>> if you need me to test anything.
>>
>
> It should be fixed with this patch:
>
> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140804/230038.html
>
> -Tom
>
>> --Aaron
>>
>> On Wed, Aug 6, 2014 at 1:20 PM, Tom Stellard <tom at stellard.net> wrote:
>> > On Fri, Aug 01, 2014 at 11:27:48PM -0500, Aaron Watry wrote:
>> >> On Fri, Aug 1, 2014 at 4:57 PM, Tom Stellard <tom at stellard.net> wrote:
>> >> > On Fri, Aug 01, 2014 at 03:48:10PM -0500, Aaron Watry wrote:
>> >> >> On Thu, Jul 31, 2014 at 8:50 PM, Tom Stellard <tom at stellard.net> wrote:
>> >> >> > On Wed, Jul 30, 2014 at 05:34:27PM -0500, Aaron Watry wrote:
>> >> >> >> LLVM: git 6800393083a4030 / svn r213860 (From 7/24)
>> >> >> >> CLANG: git 5ce5cbd836b3 / svn r213853
>> >> >> >> libclc: git a63df067faf8a / svn r213762 with the following two patches applied:
>> >> >> >>
>> >> >> >> r600: Actually use vstore assembly optimizations
>> >> >> >> r600: improve float vload/vstore path
>> >> >> >>
>> >> >> >> With that setup, and mesa 0ddc28b026688d, I get failures in the
>> >> >> >> float16 nextafter tests.
>> >> >> >>
>> >> >> >
>> >> >> > OK, I will try this.  I think I may not have had those two libclc
>> >> >> > patches applied when I tested.
>> >> >> >
>> >> >>
>> >> >> That would be helpful.  I'm trying to recover at the moment from
>> >> >> something that got upgraded and broke all of my systems which are
>> >> >> built against upgraded mesa/llvm versions. Until I can resolve that
>> >> >> issue, I'm unable to do any CL dev work on Evergreen and CL or GL on
>> >> >> SI.
>> >> >>
>> >> >
>> >> > Could this be caused by the recent version bump of LLVM from 3.5 to 3.6?
>> >> >
>> >>
>> >> Nope, LLVM seems fine. I've rebuilt it from scratch and nuked my
>> >> install prefix before re-installing my entire graphics stack and
>> >> force-removing all of the distro-provided
>> >> mesa/llvm/xserver-xorg-video-ati/glamor packages.
>> >>
>> >> I've spent most of the evening on it, but eventually tracked the issue
>> >> on my radeonsi to the following mesa commit (which matches what I
>> >> bisected on my evergreen system earlier):
>> >>
>> >> 3b176c441b7ddc5f7d2f891da3f76cf3c1814ce1 is the first bad commit
>> >> commit 3b176c441b7ddc5f7d2f891da3f76cf3c1814ce1
>> >> Author: Giovanni Campagna <gcampagna at src.gnome.org>
>> >> Date:   Wed Jul 23 19:37:31 2014 +0100
>> >>
>> >>     gallium: Add a dumb drm/kms winsys backed swrast provider
>> >>
>> >>     Add a new winsys and target that can be used with a dri2 state tracker
>> >>     and loader instead of drisw. This allows to use gbm as a dri2/image
>> >>     loader and avoid the extra copy from the backbuffer to the shadow
>> >>     frontbuffer.
>> >>
>> >>     The new driver is called "kms_swrast", and is loaded by gbm as a
>> >>     fallback, because it is only useful with the gbm platform (as no buffer
>> >>     sharing is possible)
>> >>
>> >>     To force select the driver set the environment variable
>> >>     GBM_ALWAYS_SOFTWARE
>> >>
>> >>     [Emil Velikov]
>> >>      - Rebase on top of gallium megadriver.
>> >>      - s/text/test/ in configure.ac (Spotted by Andreas Pokorny).
>> >>      - Add scons support for winsys/sw/kms-dri and fix the build.
>> >>      - Provide separate DriverAPI, due to different InitScreen hook.
>> >>
>> >>     Signed-off-by: Emil Velikov <emil.l.velikov at gmail.com>
>> >>
>> >>
>> >>
>> >> So for now, I can at least run cl programs again on my home system,
>> >> which claims that the nextafter test still fails with
>> >> llvm/clang/libclc head revisions with the two patches in question
>> >> applied (enable vstore bitcode optimizations on r600 and then add
>> >> float to the i32 load/store paths)...
>> >>
>> >
>> > I was able to reproduce this failure and it is a bug in either the DAg
>> > Legalizer or the  R600 backend.  concat_vectors is being lowered by
>> > storing the vectors to the stack and then loading the result.
>> >
>> > This is a very inefficient way to implement lowering, but it should still
>> > work.  I'm working on a fix for this now.
>> >
>> > -Tom
>> >
>> >> So it doesn't look like Matt's alignment changes helped this scenario,
>> >> or maybe we're still running into another bug (which the clcmacro.h
>> >> refactor papers over).  At least on my system (pitcairn).
>> >>
>> >> --Aaron
>> >>
>> >>
>> >> > -Tom
>> >> >
>> >> >> If we can confirm that the llvm patches for alignment issues fixed the
>> >> >> use of those 2 patches on their own, then I'm at least happy to drop
>> >> >> the clcmacro.h refactoring.  I just can't actually test that myself at
>> >> >> the moment.
>> >> >>
>> >> >> --Aaron
>> >> >>
>> >> >> > -Tom
>> >> >> >
>> >> >> >> I'll give a shot at updating llvm/clang when I get home and see if
>> >> >> >> that helps things out.  I had noticed earlier today that Matt had
>> >> >> >> recently committed a bunch of vector load/store and alignment changes
>> >> >> >> (r214055 looks especially applicable), so maybe something in there did
>> >> >> >> the trick.
>> >> >> >>
>> >> >> >> --Aaron
>> >> >> >>
>> >> >> >> On Wed, Jul 30, 2014 at 3:39 PM, Tom Stellard <tom at stellard.net> wrote:
>> >> >> >> > On Sat, Jul 26, 2014 at 03:12:30PM -0500, Aaron Watry wrote:
>> >> >> >> >> On Fri, Jul 25, 2014 at 9:40 PM, Tom Stellard <tom at stellard.net> wrote:
>> >> >> >> >> > On Tue, Jul 22, 2014 at 08:46:42PM -0500, Aaron Watry wrote:
>> >> >> >> >> >> There are odd things happening with 16x vectors in nextafter() and sign().
>> >> >> >> >> >>
>> >> >> >> >> >> When I changed float16 load/store to use the assembly path in later
>> >> >> >> >> >> patches instead of the macro-vectorized version, nextafter and sign
>> >> >> >> >> >> stopped working (next 2 commits/patches).
>> >> >> >> >> >>
>> >> >> >> >> >> As near as I can tell, we're getting correct results for
>> >> >> >> >> >> elements 0-7, but then elements 8+ are wrong. The final result
>> >> >> >> >> >> seems to be composed of the first 8 elements of the computed
>> >> >> >> >> >> result, and then elements 16-23, which are likely uninitialized.
>> >> >> >> >> >>
>> >> >> >> >> >> I'd like to say the issue is in clang, but I have nothing to back
>> >> >> >> >> >> that up at the moment and give that this patch fixes the issue, I’m
>> >> >> >> >> >> not sure how much time I want to spend investigating right now.
>> >> >> >> >> >>
>> >> >> >> >> >> Explicitly splitting all of the vectorize macros the way that this patch
>> >> >> >> >> >> does gets everything working again, but I have a feeling that we're
>> >> >> >> >> >> papering over a bug somewhere, hence the RFC subject.
>> >> >> >> >> >>
>> >> >> >> >> >> I’m not sure what’s going on here, and it’s only the nextafter/sign
>> >> >> >> >> >> functions that regressed. This patch fixes the test results in piglit.
>> >> >> >> >> >>
>> >> >> >> >> >> No significant change on number of instructions in bitcode
>> >> >> >> >> >> for nextafter float16 (2-3 instructions savings over ~350 lines).
>> >> >> >> >> >>
>> >> >> >> >> >
>> >> >> >> >> > Were you testing this on Evergreen?  The sign() test passes for me on
>> >> >> >> >> > SI without this patch.
>> >> >> >> >> >
>> >> >> >> >>
>> >> >> >> >> I don't have my EG card available at the moment, but I *think* that
>> >> >> >> >> passed.  All I've got available at the moment is my pitcairn (si). For
>> >> >> >> >> me on pitcairn, I get failures in
>> >> >> >> >> piglit/generated_tests/cl/builtin/math/builtin-float-nextafter-1.0.generated.cl
>> >> >> >> >> for only the float16 type (about half of them pass, half don't).
>> >> >> >> >>
>> >> >> >> >
>> >> >> >> > This test passes for me too.  How current is your install of LLVM, I think there
>> >> >> >> > were some shufflevector fixes/improvements recently that may have fixed this test.
>> >> >> >> >
>> >> >> >> > -Tom
>> >> >> >> >
>> >> >> >> >> What happens if you test with [1] and [2] applied (enable vstore
>> >> >> >> >> optimizations and then add float to those optimizations), but without
>> >> >> >> >> the clcmacro refactoring patch [3]?
>> >> >> >> >>
>> >> >> >> >> [1] http://www.pcc.me.uk/pipermail/libclc-dev/2014-July/000461.html
>> >> >> >> >> [2] http://www.pcc.me.uk/pipermail/libclc-dev/2014-July/000462.html
>> >> >> >> >> [3] http://www.pcc.me.uk/pipermail/libclc-dev/2014-July/000460.html
>> >> >> >> >>
>> >> >> >> >> --Aaron
>> >> >> >> >>
>> >> >> >> >> > -Tom
>> >> >> >> >> >
>> >> >> >> >> >> Signed-off-by: Aaron Watry <awatry at gmail.com>
>> >> >> >> >> >> ---
>> >> >> >> >> >>  generic/lib/clcmacro.h | 84 +++++++++++++++++++++++++++++++++++++++++++-------
>> >> >> >> >> >>  1 file changed, 73 insertions(+), 11 deletions(-)
>> >> >> >> >> >>
>> >> >> >> >> >> diff --git a/generic/lib/clcmacro.h b/generic/lib/clcmacro.h
>> >> >> >> >> >> index 730073a..64b1770 100644
>> >> >> >> >> >> --- a/generic/lib/clcmacro.h
>> >> >> >> >> >> +++ b/generic/lib/clcmacro.h
>> >> >> >> >> >> @@ -1,44 +1,106 @@
>> >> >> >> >> >>  #define _CLC_UNARY_VECTORIZE(DECLSPEC, RET_TYPE, FUNCTION, ARG1_TYPE) \
>> >> >> >> >> >>    DECLSPEC RET_TYPE##2 FUNCTION(ARG1_TYPE##2 x) { \
>> >> >> >> >> >> -    return (RET_TYPE##2)(FUNCTION(x.x), FUNCTION(x.y)); \
>> >> >> >> >> >> +    return (RET_TYPE##2){FUNCTION(x.s0), FUNCTION(x.s1)}; \
>> >> >> >> >> >>    } \
>> >> >> >> >> >>  \
>> >> >> >> >> >>    DECLSPEC RET_TYPE##3 FUNCTION(ARG1_TYPE##3 x) { \
>> >> >> >> >> >> -    return (RET_TYPE##3)(FUNCTION(x.x), FUNCTION(x.y), FUNCTION(x.z)); \
>> >> >> >> >> >> +    return (RET_TYPE##3){FUNCTION(x.s0), FUNCTION(x.s1), FUNCTION(x.s2)}; \
>> >> >> >> >> >>    } \
>> >> >> >> >> >>  \
>> >> >> >> >> >>    DECLSPEC RET_TYPE##4 FUNCTION(ARG1_TYPE##4 x) { \
>> >> >> >> >> >> -    return (RET_TYPE##4)(FUNCTION(x.lo), FUNCTION(x.hi)); \
>> >> >> >> >> >> +    return (RET_TYPE##4){ \
>> >> >> >> >> >> +      FUNCTION(x.s0), \
>> >> >> >> >> >> +      FUNCTION(x.s1), \
>> >> >> >> >> >> +      FUNCTION(x.s2), \
>> >> >> >> >> >> +      FUNCTION(x.s3), \
>> >> >> >> >> >> +    }; \
>> >> >> >> >> >>    } \
>> >> >> >> >> >>  \
>> >> >> >> >> >>    DECLSPEC RET_TYPE##8 FUNCTION(ARG1_TYPE##8 x) { \
>> >> >> >> >> >> -    return (RET_TYPE##8)(FUNCTION(x.lo), FUNCTION(x.hi)); \
>> >> >> >> >> >> +    return (RET_TYPE##8){ \
>> >> >> >> >> >> +      FUNCTION(x.s0), \
>> >> >> >> >> >> +      FUNCTION(x.s1), \
>> >> >> >> >> >> +      FUNCTION(x.s2), \
>> >> >> >> >> >> +      FUNCTION(x.s3), \
>> >> >> >> >> >> +      FUNCTION(x.s4), \
>> >> >> >> >> >> +      FUNCTION(x.s5), \
>> >> >> >> >> >> +      FUNCTION(x.s6), \
>> >> >> >> >> >> +      FUNCTION(x.s7), \
>> >> >> >> >> >> +    }; \
>> >> >> >> >> >>    } \
>> >> >> >> >> >>  \
>> >> >> >> >> >>    DECLSPEC RET_TYPE##16 FUNCTION(ARG1_TYPE##16 x) { \
>> >> >> >> >> >> -    return (RET_TYPE##16)(FUNCTION(x.lo), FUNCTION(x.hi)); \
>> >> >> >> >> >> +    return (RET_TYPE##16){ \
>> >> >> >> >> >> +      FUNCTION(x.s0), \
>> >> >> >> >> >> +      FUNCTION(x.s1), \
>> >> >> >> >> >> +      FUNCTION(x.s2), \
>> >> >> >> >> >> +      FUNCTION(x.s3), \
>> >> >> >> >> >> +      FUNCTION(x.s4), \
>> >> >> >> >> >> +      FUNCTION(x.s5), \
>> >> >> >> >> >> +      FUNCTION(x.s6), \
>> >> >> >> >> >> +      FUNCTION(x.s7), \
>> >> >> >> >> >> +      FUNCTION(x.s8), \
>> >> >> >> >> >> +      FUNCTION(x.s9), \
>> >> >> >> >> >> +      FUNCTION(x.sa), \
>> >> >> >> >> >> +      FUNCTION(x.sb), \
>> >> >> >> >> >> +      FUNCTION(x.sc), \
>> >> >> >> >> >> +      FUNCTION(x.sd), \
>> >> >> >> >> >> +      FUNCTION(x.se), \
>> >> >> >> >> >> +      FUNCTION(x.sf) \
>> >> >> >> >> >> +    }; \
>> >> >> >> >> >>    }
>> >> >> >> >> >>
>> >> >> >> >> >>  #define _CLC_BINARY_VECTORIZE(DECLSPEC, RET_TYPE, FUNCTION, ARG1_TYPE, ARG2_TYPE) \
>> >> >> >> >> >>    DECLSPEC RET_TYPE##2 FUNCTION(ARG1_TYPE##2 x, ARG2_TYPE##2 y) { \
>> >> >> >> >> >> -    return (RET_TYPE##2)(FUNCTION(x.x, y.x), FUNCTION(x.y, y.y)); \
>> >> >> >> >> >> +    return (RET_TYPE##2){FUNCTION(x.s0, y.s0), FUNCTION(x.s1, y.s1)}; \
>> >> >> >> >> >>    } \
>> >> >> >> >> >>  \
>> >> >> >> >> >>    DECLSPEC RET_TYPE##3 FUNCTION(ARG1_TYPE##3 x, ARG2_TYPE##3 y) { \
>> >> >> >> >> >> -    return (RET_TYPE##3)(FUNCTION(x.x, y.x), FUNCTION(x.y, y.y), \
>> >> >> >> >> >> -                         FUNCTION(x.z, y.z)); \
>> >> >> >> >> >> +    return (RET_TYPE##3){FUNCTION(x.s0, y.s0), FUNCTION(x.s1, y.s1), \
>> >> >> >> >> >> +                         FUNCTION(x.s2, y.s2)}; \
>> >> >> >> >> >>    } \
>> >> >> >> >> >>  \
>> >> >> >> >> >>    DECLSPEC RET_TYPE##4 FUNCTION(ARG1_TYPE##4 x, ARG2_TYPE##4 y) { \
>> >> >> >> >> >> -    return (RET_TYPE##4)(FUNCTION(x.lo, y.lo), FUNCTION(x.hi, y.hi)); \
>> >> >> >> >> >> +    return (RET_TYPE##4){ \
>> >> >> >> >> >> +      FUNCTION(x.s0, y.s0), \
>> >> >> >> >> >> +      FUNCTION(x.s1, y.s1), \
>> >> >> >> >> >> +      FUNCTION(x.s2, y.s2), \
>> >> >> >> >> >> +      FUNCTION(x.s3, y.s3), \
>> >> >> >> >> >> +    }; \
>> >> >> >> >> >>    } \
>> >> >> >> >> >>  \
>> >> >> >> >> >>    DECLSPEC RET_TYPE##8 FUNCTION(ARG1_TYPE##8 x, ARG2_TYPE##8 y) { \
>> >> >> >> >> >> -    return (RET_TYPE##8)(FUNCTION(x.lo, y.lo), FUNCTION(x.hi, y.hi)); \
>> >> >> >> >> >> +    return (RET_TYPE##8){ \
>> >> >> >> >> >> +      FUNCTION(x.s0, y.s0), \
>> >> >> >> >> >> +      FUNCTION(x.s1, y.s1), \
>> >> >> >> >> >> +      FUNCTION(x.s2, y.s2), \
>> >> >> >> >> >> +      FUNCTION(x.s3, y.s3), \
>> >> >> >> >> >> +      FUNCTION(x.s4, y.s4), \
>> >> >> >> >> >> +      FUNCTION(x.s5, y.s5), \
>> >> >> >> >> >> +      FUNCTION(x.s6, y.s6), \
>> >> >> >> >> >> +      FUNCTION(x.s7, y.s7), \
>> >> >> >> >> >> +    }; \
>> >> >> >> >> >>    } \
>> >> >> >> >> >>  \
>> >> >> >> >> >>    DECLSPEC RET_TYPE##16 FUNCTION(ARG1_TYPE##16 x, ARG2_TYPE##16 y) { \
>> >> >> >> >> >> -    return (RET_TYPE##16)(FUNCTION(x.lo, y.lo), FUNCTION(x.hi, y.hi)); \
>> >> >> >> >> >> +    return (RET_TYPE##16){ \
>> >> >> >> >> >> +      FUNCTION(x.s0, y.s0), \
>> >> >> >> >> >> +      FUNCTION(x.s1, y.s1), \
>> >> >> >> >> >> +      FUNCTION(x.s2, y.s2), \
>> >> >> >> >> >> +      FUNCTION(x.s3, y.s3), \
>> >> >> >> >> >> +      FUNCTION(x.s4, y.s4), \
>> >> >> >> >> >> +      FUNCTION(x.s5, y.s5), \
>> >> >> >> >> >> +      FUNCTION(x.s6, y.s6), \
>> >> >> >> >> >> +      FUNCTION(x.s7, y.s7), \
>> >> >> >> >> >> +      FUNCTION(x.s8, y.s8), \
>> >> >> >> >> >> +      FUNCTION(x.s9, y.s9), \
>> >> >> >> >> >> +      FUNCTION(x.sa, y.sa), \
>> >> >> >> >> >> +      FUNCTION(x.sb, y.sb), \
>> >> >> >> >> >> +      FUNCTION(x.sc, y.sc), \
>> >> >> >> >> >> +      FUNCTION(x.sd, y.sd), \
>> >> >> >> >> >> +      FUNCTION(x.se, y.se), \
>> >> >> >> >> >> +      FUNCTION(x.sf, y.sf) \
>> >> >> >> >> >> +    }; \
>> >> >> >> >> >>    }
>> >> >> >> >> >>
>> >> >> >> >> >>  #define _CLC_DEFINE_BINARY_BUILTIN(RET_TYPE, FUNCTION, BUILTIN, ARG1_TYPE, ARG2_TYPE) \
>> >> >> >> >> >> --
>> >> >> >> >> >> 1.9.1
>> >> >> >> >> >>
>> >> >> >> >> >>
>> >> >> >> >> >> _______________________________________________
>> >> >> >> >> >> Libclc-dev mailing list
>> >> >> >> >> >> Libclc-dev at pcc.me.uk
>> >> >> >> >> >> http://www.pcc.me.uk/cgi-bin/mailman/listinfo/libclc-dev




More information about the Libclc-dev mailing list