<html><head><meta http-equiv="Content-Type" content="text/html; charset=us-ascii"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">Hi Aaron,<div class=""><br class=""></div><div class="">A bit late to the party, but with LLVM 5.0 the IR generated for this (at least for NVPTX) is rather ugly. I see sequences like the following appear quite a few times:</div><div class=""><br class=""></div><div class=""> %1 = extractelement <4 x i32> %and.i, i32 0, !dbg !70<br class=""> %trunc.i = trunc i32 %1 to i2, !dbg !70<br class=""> switch i2 %trunc.i, label %__clc_get_el_int4_uint.exit34.i [<br class=""> i2 0, label %sw.bb.i.i<br class=""> i2 1, label %__clc_get_el_int4_uint.exit.i<br class=""> i2 -2, label %sw.bb2.i.i<br class=""> i2 -1, label %sw.bb3.i.i<br class=""> ], !dbg !70<br class=""><br class="">sw.bb.i.i: ; preds = %entry<br class=""> br label %__clc_get_el_int4_uint.exit.i, !dbg !70<br class=""><br class="">sw.bb2.i.i: ; preds = %entry<br class=""> br label %__clc_get_el_int4_uint.exit.i, !dbg !70<br class=""><br class="">sw.bb3.i.i: ; preds = %entry<br class=""> br label %__clc_get_el_int4_uint.exit.i, !dbg !70<br class=""><br class="">__clc_get_el_int4_uint.exit34.i: ; preds = %entry<br class=""> unreachable, !dbg !70<br class=""><br class="">__clc_get_el_int4_uint.exit.i: ; preds = %sw.bb3.i.i, %sw.bb2.i.i, %sw.bb.i.i, %entry<br class=""><div><br class=""></div><div>This seems to be a missed optimisation in the SimplifyCFG pass. 
Is this a known issue?</div><div><br class=""></div><div>Thanks,</div><div><br class=""></div><div> Jeroen</div><div><br class=""><blockquote type="cite" class=""><div class="">On 1 Sep 2017, at 21:55, Jan Vesely via Libclc-dev <<a href="mailto:libclc-dev@lists.llvm.org" class="">libclc-dev@lists.llvm.org</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><span style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class="">On Fri, 2017-09-01 at 14:42 -0500, Aaron Watry wrote:</span><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><blockquote type="cite" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px;" class="">On Fri, Sep 1, 2017 at 2:21 PM, Aaron Watry <<a href="mailto:awatry@gmail.com" class="">awatry@gmail.com</a>> wrote:<br class=""><blockquote type="cite" class="">On Thu, Aug 31, 2017 at 5:14 PM, Jan Vesely <<a href="mailto:jan.vesely@rutgers.edu" class="">jan.vesely@rutgers.edu</a>> wrote:<br class=""><blockquote type="cite" class="">On Thu, 2017-08-31 at 11:59 -0400, Jan Vesely wrote:<br class=""><blockquote type="cite" class="">On Sun, 2017-06-11 at 22:04 -0500, Aaron Watry via Libclc-dev wrote:<br class=""><blockquote type="cite" class="">This was added in CL 1.1<br 
class=""><br class="">Tested with a Radeon HD 7850 (Pitcairn) using the CL CTS via:<br class="">test_conformance/relationals/test_relationals shuffle_built_in<br class=""></blockquote><br class="">sorry it took so long. I think there are still parts missing but we<br class="">might be able to get away with it if clang can handle mask type<br class="">conversion implicitly.<br class="">it also needs to ignore the high bits of mask elements.<br class="">see inline comments.<br class=""></blockquote><br class="">looks like I was wrong on both counts.<br class=""></blockquote><br class="">Yeah, I did this with a 'mask &= (MASKTYPE##N)(ARGSIZE-1)' as the<br class="">first piece of the function implementation, instead of in the switch<br class="">statement. I did test this using the CTS, which handled float/double,<br class="">but doesn't have fp16 tests. I'll see what I need to do to get this<br class="">working on my SI using your piglit tests (assuming that I can).<br class=""><br class="">I'll also go ahead and move this to a new misc/ directory as<br class="">suggested. 
If I can manage to test the fp16 support myself, do you<br class="">want to see the new version, or would the review stand, assuming that<br class="">all I have to add is:<br class=""><br class="">#ifdef cl_khr_fp16<br class="">#pragma OPENCL EXTENSION cl_khr_fp16 : enable<br class="">_CLC_VECTOR_SHUFFLE_INSIZE(half, ushort)<br class="">#endif<br class=""></blockquote></blockquote><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><span style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class="">afaik this should be enough wrt shuffle.</span><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><blockquote type="cite" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px;" class=""><blockquote 
type="cite" class=""><br class="">--Aaron<br class=""></blockquote><br class="">Note: I had to also add the half type definitions in<br class="">float/definitions.h.<span class="Apple-converted-space"> </span><br class=""></blockquote><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><span style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class="">hm, I assumed we already had some half precision support in libclc.</span><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><span style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class="">If it means that you need to add all infrastructure, feel free to skip</span><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><span style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; 
font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class="">this for shuffle{,2}. Sorry, should have checked it.</span><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><blockquote type="cite" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px;" class="">I'll re-send this piece as 1 patch, and then v2<br class="">of the two shuffle patches as parts 2/3, just so you can verify the<br class="">type definitions. 
half-precision is not something I've even started<br class="">to look into yet, but it seems like something you've possibly got a<br class="">use for, and possibly an ability to test (I believe that VI was the<br class="">first generation with fp16 support).<br class=""></blockquote><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><span style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class="">clang enables cl_khr_fp16 on everything >=gfx6 (which I believe is SI),</span><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><span style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class="">so you should be able to test it. 
I think you're correct that VI added</span><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><span style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class="">fp16 instructions, but they're not strictly required.</span><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><span style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class="">We don't expose any of this in clover, however, 
so you'll either need to</span><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><span style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class="">disable the piglit requirement or modify clover.</span><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><span style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class="">I can test on carrizo if you post the patches that include fp16.</span><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><span style="font-family: Helvetica; 
font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class="">Jan</span><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><blockquote type="cite" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px;" class=""><br class="">--Aaron<br class=""><br class=""><blockquote type="cite" class=""><br class=""><blockquote type="cite" class="">If you add half version and move this to clc/misc:<br class="">Reviewed-by: Jan Vesely <<a href="mailto:jan.vesely@rutgers.edu" class="">jan.vesely@rutgers.edu</a>><br class=""><br class=""><blockquote type="cite" class=""><br class="">I haven't tested it yet. 
I'll try to do that and provide shuffle2<br class="">piglits asap.<br class=""></blockquote><br class="">works at least on carrizo/iceland with llvm 5.0<br class=""><br class="">Jan<br class=""><br class=""><blockquote type="cite" class=""><br class=""><blockquote type="cite" class=""><br class="">Signed-off-by: Aaron Watry <<a href="mailto:awatry@gmail.com" class="">awatry@gmail.com</a>><br class="">---<br class="">generic/include/clc/clc.h | 1 +<br class="">generic/include/clc/relational/shuffle.h | 44 +++++++++<br class="">generic/lib/SOURCES | 1 +<br class="">generic/lib/relational/shuffle.cl | 153 +++++++++++++++++++++++++++++++<br class="">4 files changed, 199 insertions(+)<br class="">create mode 100644 generic/include/clc/relational/shuffle.h<br class="">create mode 100644 generic/lib/relational/shuffle.cl<br class=""><br class="">diff --git a/generic/include/clc/clc.h b/generic/include/clc/clc.h<br class="">index 4c29214..ac1dab5 100644<br class="">--- a/generic/include/clc/clc.h<br class="">+++ b/generic/include/clc/clc.h<br class="">@@ -173,6 +173,7 @@<br class="">#include <clc/relational/isordered.h><br class="">#include <clc/relational/isunordered.h><br class="">#include <clc/relational/select.h><br class="">+#include <clc/relational/shuffle.h><br class=""></blockquote><br class="">Not sure why CTS puts these in relational category. 
specs have a misc<br class="">chapter for them, so it'd be nice to add new dir in clc.<br class=""><br class=""><blockquote type="cite" class="">#include <clc/relational/signbit.h><br class=""><br class="">/* 6.11.8 Synchronization Functions */<br class="">diff --git a/generic/include/clc/relational/shuffle.h b/generic/include/clc/relational/shuffle.h<br class="">new file mode 100644<br class="">index 0000000..e10ac5e<br class="">--- /dev/null<br class="">+++ b/generic/include/clc/relational/shuffle.h<br class="">@@ -0,0 +1,44 @@<br class="">+//===-- generic/include/clc/relational/shuffle.h ------------------------------===//<br class="">+//<br class="">+// The LLVM Compiler Infrastructure<br class="">+//<br class="">+// This file is dual licensed under both the University of Illinois Open Source<br class="">+// License and the MIT license. See LICENSE.TXT for details.<br class="">+//<br class="">+//===----------------------------------------------------------------------===//<br class="">+<br class="">+#define _CLC_SHUFFLE_DECL(TYPE, MASKTYPE, RETTYPE) \<br class="">+ _CLC_OVERLOAD _CLC_DECL RETTYPE shuffle(TYPE x, MASKTYPE mask);<br class="">+<br class="">+//Return type is same base type as the input type, with the same vector size as the mask.<br class="">+//Elements in the mask must be the same size (number of bits) as the input value.<br class="">+//E.g. 
char8 ret = shuffle(char2 x, uchar8 mask);<br class="">+<br class="">+#define _CLC_VECTOR_SHUFFLE_MASKSIZE(INBASE, INTYPE, MASKTYPE) \<br class="">+ _CLC_SHUFFLE_DECL(INTYPE, MASKTYPE##2, INBASE##2) \<br class="">+ _CLC_SHUFFLE_DECL(INTYPE, MASKTYPE##4, INBASE##4) \<br class="">+ _CLC_SHUFFLE_DECL(INTYPE, MASKTYPE##8, INBASE##8) \<br class="">+ _CLC_SHUFFLE_DECL(INTYPE, MASKTYPE##16, INBASE##16) \<br class="">+<br class="">+#define _CLC_VECTOR_SHUFFLE_INSIZE(TYPE, MASKTYPE) \<br class="">+ _CLC_VECTOR_SHUFFLE_MASKSIZE(TYPE, TYPE##2, MASKTYPE) \<br class="">+ _CLC_VECTOR_SHUFFLE_MASKSIZE(TYPE, TYPE##4, MASKTYPE) \<br class="">+ _CLC_VECTOR_SHUFFLE_MASKSIZE(TYPE, TYPE##8, MASKTYPE) \<br class="">+ _CLC_VECTOR_SHUFFLE_MASKSIZE(TYPE, TYPE##16, MASKTYPE) \<br class="">+<br class="">+_CLC_VECTOR_SHUFFLE_INSIZE(char, uchar)<br class="">+_CLC_VECTOR_SHUFFLE_INSIZE(short, ushort)<br class="">+_CLC_VECTOR_SHUFFLE_INSIZE(int, uint)<br class="">+_CLC_VECTOR_SHUFFLE_INSIZE(long, ulong)<br class="">+_CLC_VECTOR_SHUFFLE_INSIZE(uchar, uchar)<br class="">+_CLC_VECTOR_SHUFFLE_INSIZE(ushort, ushort)<br class="">+_CLC_VECTOR_SHUFFLE_INSIZE(uint, uint)<br class="">+_CLC_VECTOR_SHUFFLE_INSIZE(ulong, ulong)<br class="">+_CLC_VECTOR_SHUFFLE_INSIZE(float, uint)<br class="">+#ifdef cl_khr_fp64<br class="">+_CLC_VECTOR_SHUFFLE_INSIZE(double, ulong)<br class="">+#endif<br class="">+<br class="">+#undef _CLC_SHUFFLE_DECL<br class="">+#undef _CLC_VECTOR_SHUFFLE_MASKSIZE<br class="">+#undef _CLC_VECTOR_SHUFFLE_INSIZE<br class="">diff --git a/generic/lib/SOURCES b/generic/lib/SOURCES<br class="">index 9e0157b..fe0df5a 100644<br class="">--- a/generic/lib/SOURCES<br class="">+++ b/generic/lib/SOURCES<br class="">@@ -139,6 +139,7 @@ relational/isnormal.cl<br class="">relational/isnotequal.cl<br class="">relational/isordered.cl<br class="">relational/isunordered.cl<br class="">+relational/shuffle.cl<br class="">relational/signbit.cl<br class="">shared/clamp.cl<br class="">shared/max.cl<br 
class="">diff --git a/generic/lib/relational/shuffle.cl b/generic/lib/relational/shuffle.cl<br class="">new file mode 100644<br class="">index 0000000..7d96f86<br class="">--- /dev/null<br class="">+++ b/generic/lib/relational/shuffle.cl<br class="">@@ -0,0 +1,153 @@<br class="">+//===-- generic/lib/relational/shuffle.cl ------------------------------===//<br class="">+//<br class="">+// The LLVM Compiler Infrastructure<br class="">+//<br class="">+// This file is dual licensed under both the University of Illinois Open Source<br class="">+// License and the MIT license. See LICENSE.TXT for details.<br class="">+//<br class="">+//===----------------------------------------------------------------------===//<br class="">+<br class="">+#include <clc/clc.h><br class="">+<br class="">+#define _CLC_ELEMENT_CASES2(VAR) \<br class="">+ case 0: return VAR.s0; \<br class="">+ case 1: return VAR.s1;<br class="">+<br class="">+#define _CLC_ELEMENT_CASES4(VAR) \<br class="">+ _CLC_ELEMENT_CASES2(VAR) \<br class="">+ case 2: return VAR.s2; \<br class="">+ case 3: return VAR.s3;<br class="">+<br class="">+#define _CLC_ELEMENT_CASES8(VAR) \<br class="">+ _CLC_ELEMENT_CASES4(VAR) \<br class="">+ case 4: return VAR.s4; \<br class="">+ case 5: return VAR.s5; \<br class="">+ case 6: return VAR.s6; \<br class="">+ case 7: return VAR.s7;<br class="">+<br class="">+#define _CLC_ELEMENT_CASES16(VAR) \<br class="">+ _CLC_ELEMENT_CASES8(VAR) \<br class="">+ case 8: return VAR.s8; \<br class="">+ case 9: return VAR.s9; \<br class="">+ case 10: return VAR.sA; \<br class="">+ case 11: return VAR.sB; \<br class="">+ case 12: return VAR.sC; \<br class="">+ case 13: return VAR.sD; \<br class="">+ case 14: return VAR.sE; \<br class="">+ case 15: return VAR.sF;<br class="">+<br class="">+#define _CLC_GET_ELEMENT_DEFINE(ARGTYPE, ARGSIZE, IDXTYPE) \<br class="">+ inline ARGTYPE __clc_get_el_##ARGTYPE##ARGSIZE##_##IDXTYPE(ARGTYPE##ARGSIZE x, IDXTYPE idx) {\<br class="">+ switch (idx){ \<br 
class=""></blockquote><br class="">I think you need "idx % #ARGSIZE" here. Specs explicitly mention that<br class="">higher bits are ignored and the newly posted piglit tests check this.<br class=""><br class=""><blockquote type="cite" class="">+ _CLC_ELEMENT_CASES##ARGSIZE(x) \<br class="">+ default: return 0; \<br class="">+ } \<br class="">+ } \<br class="">+<br class="">+#define _CLC_SHUFFLE_SET_ONE_ELEMENT(ARGTYPE, ARGSIZE, INDEX, MASKTYPE) \<br class="">+ ret_val.s##INDEX = __clc_get_el_##ARGTYPE##ARGSIZE##_##MASKTYPE(x, mask.s##INDEX); \<br class="">+<br class="">+#define _CLC_SHUFFLE_SET_2_ELEMENTS(ARGTYPE, ARGSIZE, MASKTYPE) \<br class="">+ ret_val.s0 = __clc_get_el_##ARGTYPE##ARGSIZE##_##MASKTYPE(x, mask.s0); \<br class="">+ ret_val.s1 = __clc_get_el_##ARGTYPE##ARGSIZE##_##MASKTYPE(x, mask.s1);<br class="">+<br class="">+#define _CLC_SHUFFLE_SET_4_ELEMENTS(ARGTYPE, ARGSIZE, MASKTYPE) \<br class="">+ _CLC_SHUFFLE_SET_2_ELEMENTS(ARGTYPE, ARGSIZE, MASKTYPE) \<br class="">+ ret_val.s2 = __clc_get_el_##ARGTYPE##ARGSIZE##_##MASKTYPE(x, mask.s2); \<br class="">+ ret_val.s3 = __clc_get_el_##ARGTYPE##ARGSIZE##_##MASKTYPE(x, mask.s3);<br class="">+<br class="">+#define _CLC_SHUFFLE_SET_8_ELEMENTS(ARGTYPE, ARGSIZE, MASKTYPE) \<br class="">+ _CLC_SHUFFLE_SET_4_ELEMENTS(ARGTYPE, ARGSIZE, MASKTYPE) \<br class="">+ ret_val.s4 = __clc_get_el_##ARGTYPE##ARGSIZE##_##MASKTYPE(x, mask.s4); \<br class="">+ ret_val.s5 = __clc_get_el_##ARGTYPE##ARGSIZE##_##MASKTYPE(x, mask.s5); \<br class="">+ ret_val.s6 = __clc_get_el_##ARGTYPE##ARGSIZE##_##MASKTYPE(x, mask.s6); \<br class="">+ ret_val.s7 = __clc_get_el_##ARGTYPE##ARGSIZE##_##MASKTYPE(x, mask.s7);<br class="">+<br class="">+#define _CLC_SHUFFLE_SET_16_ELEMENTS(ARGTYPE, ARGSIZE, MASKTYPE) \<br class="">+ _CLC_SHUFFLE_SET_8_ELEMENTS(ARGTYPE, ARGSIZE, MASKTYPE) \<br class="">+ ret_val.s8 = __clc_get_el_##ARGTYPE##ARGSIZE##_##MASKTYPE(x, mask.s8); \<br class="">+ ret_val.s9 = __clc_get_el_##ARGTYPE##ARGSIZE##_##MASKTYPE(x, 
mask.s9); \<br class="">+ ret_val.sA = __clc_get_el_##ARGTYPE##ARGSIZE##_##MASKTYPE(x, mask.sA); \<br class="">+ ret_val.sB = __clc_get_el_##ARGTYPE##ARGSIZE##_##MASKTYPE(x, mask.sB); \<br class="">+ ret_val.sC = __clc_get_el_##ARGTYPE##ARGSIZE##_##MASKTYPE(x, mask.sC); \<br class="">+ ret_val.sD = __clc_get_el_##ARGTYPE##ARGSIZE##_##MASKTYPE(x, mask.sD); \<br class="">+ ret_val.sE = __clc_get_el_##ARGTYPE##ARGSIZE##_##MASKTYPE(x, mask.sE); \<br class="">+ ret_val.sF = __clc_get_el_##ARGTYPE##ARGSIZE##_##MASKTYPE(x, mask.sF); \<br class="">+<br class="">+#define _CLC_SHUFFLE_DEFINE2(ARGTYPE, ARGSIZE, MASKTYPE) \<br class="">+_CLC_DEF _CLC_OVERLOAD ARGTYPE##2 shuffle(ARGTYPE##ARGSIZE x, MASKTYPE##2 mask){ \<br class="">+ ARGTYPE##2 ret_val; \<br class="">+ mask &= (MASKTYPE##2)(ARGSIZE-1); \<br class="">+ _CLC_SHUFFLE_SET_2_ELEMENTS(ARGTYPE, ARGSIZE, MASKTYPE) \<br class="">+ return ret_val; \<br class="">+}<br class="">+<br class="">+#define _CLC_SHUFFLE_DEFINE4(ARGTYPE, ARGSIZE, MASKTYPE) \<br class="">+_CLC_DEF _CLC_OVERLOAD ARGTYPE##4 shuffle(ARGTYPE##ARGSIZE x, MASKTYPE##4 mask){ \<br class="">+ ARGTYPE##4 ret_val; \<br class="">+ mask &= (MASKTYPE##4)(ARGSIZE-1); \<br class="">+ _CLC_SHUFFLE_SET_4_ELEMENTS(ARGTYPE, ARGSIZE, MASKTYPE) \<br class="">+ return ret_val; \<br class="">+}<br class="">+<br class="">+#define _CLC_SHUFFLE_DEFINE8(ARGTYPE, ARGSIZE, MASKTYPE) \<br class="">+_CLC_DEF _CLC_OVERLOAD ARGTYPE##8 shuffle(ARGTYPE##ARGSIZE x, MASKTYPE##8 mask){ \<br class="">+ ARGTYPE##8 ret_val; \<br class="">+ mask &= (MASKTYPE##8)(ARGSIZE-1); \<br class="">+ _CLC_SHUFFLE_SET_8_ELEMENTS(ARGTYPE, ARGSIZE, MASKTYPE) \<br class="">+ return ret_val; \<br class="">+}<br class="">+<br class="">+#define _CLC_SHUFFLE_DEFINE16(ARGTYPE, ARGSIZE, MASKTYPE) \<br class="">+_CLC_DEF _CLC_OVERLOAD ARGTYPE##16 shuffle(ARGTYPE##ARGSIZE x, MASKTYPE##16 mask){ \<br class="">+ ARGTYPE##16 ret_val; \<br class="">+ mask &= (MASKTYPE##16)(ARGSIZE-1); \<br class="">+ 
_CLC_SHUFFLE_SET_16_ELEMENTS(ARGTYPE, ARGSIZE, MASKTYPE) \<br class="">+ return ret_val; \<br class="">+}<br class="">+<br class="">+#define _CLC_VECTOR_SHUFFLE_MASKSIZE(INTYPE, ARGSIZE, MASKTYPE) \<br class="">+ _CLC_GET_ELEMENT_DEFINE(INTYPE, ARGSIZE, MASKTYPE) \<br class="">+ _CLC_SHUFFLE_DEFINE2(INTYPE, ARGSIZE, MASKTYPE) \<br class="">+ _CLC_SHUFFLE_DEFINE4(INTYPE, ARGSIZE, MASKTYPE) \<br class="">+ _CLC_SHUFFLE_DEFINE8(INTYPE, ARGSIZE, MASKTYPE) \<br class="">+ _CLC_SHUFFLE_DEFINE16(INTYPE, ARGSIZE, MASKTYPE) \<br class="">+<br class="">+#define _CLC_VECTOR_SHUFFLE_INSIZE(TYPE, MASKTYPE) \<br class="">+ _CLC_VECTOR_SHUFFLE_MASKSIZE(TYPE, 2, MASKTYPE) \<br class="">+ _CLC_VECTOR_SHUFFLE_MASKSIZE(TYPE, 4, MASKTYPE) \<br class="">+ _CLC_VECTOR_SHUFFLE_MASKSIZE(TYPE, 8, MASKTYPE) \<br class="">+ _CLC_VECTOR_SHUFFLE_MASKSIZE(TYPE, 16, MASKTYPE) \<br class="">+<br class="">+<br class="">+<br class="">+_CLC_VECTOR_SHUFFLE_INSIZE(char, uchar)<br class="">+_CLC_VECTOR_SHUFFLE_INSIZE(short, ushort)<br class="">+_CLC_VECTOR_SHUFFLE_INSIZE(int, uint)<br class="">+_CLC_VECTOR_SHUFFLE_INSIZE(long, ulong)<br class="">+_CLC_VECTOR_SHUFFLE_INSIZE(uchar, uchar)<br class="">+_CLC_VECTOR_SHUFFLE_INSIZE(ushort, ushort)<br class="">+_CLC_VECTOR_SHUFFLE_INSIZE(uint, uint)<br class="">+_CLC_VECTOR_SHUFFLE_INSIZE(ulong, ulong)<br class="">+_CLC_VECTOR_SHUFFLE_INSIZE(float, uint)<br class=""></blockquote><br class="">Mask type can be any vector of unsigned type, so I think you need all<br class="">combinations.<br class=""><br class=""><blockquote type="cite" class="">+#ifdef cl_khr_fp64<br class="">+#pragma OPENCL EXTENSION cl_khr_fp64 : enable<br class="">+_CLC_VECTOR_SHUFFLE_INSIZE(double, ulong)<br class=""></blockquote><br class="">I think this needs other mask types.<br class=""><br class=""><blockquote type="cite" class="">+#endif<br class=""></blockquote><br class="">add half/cl_khr_fp16 here.<br class=""><br class="">thanks,<br class="">Jan<br class=""><br 
class=""><blockquote type="cite" class="">+<br class="">+#undef _CLC_ELEMENT_CASES2<br class="">+#undef _CLC_ELEMENT_CASES4<br class="">+#undef _CLC_ELEMENT_CASES8<br class="">+#undef _CLC_ELEMENT_CASES16<br class="">+#undef _CLC_GET_ELEMENT_DEFINE<br class="">+#undef _CLC_SHUFFLE_SET_ONE_ELEMENT<br class="">+#undef _CLC_SHUFFLE_SET_2_ELEMENTS<br class="">+#undef _CLC_SHUFFLE_SET_4_ELEMENTS<br class="">+#undef _CLC_SHUFFLE_SET_8_ELEMENTS<br class="">+#undef _CLC_SHUFFLE_SET_16_ELEMENTS<br class="">+#undef _CLC_SHUFFLE_DEFINE2<br class="">+#undef _CLC_SHUFFLE_DEFINE4<br class="">+#undef _CLC_SHUFFLE_DEFINE8<br class="">+#undef _CLC_SHUFFLE_DEFINE16<br class="">+#undef _CLC_VECTOR_SHUFFLE_MASKSIZE<br class="">+#undef _CLC_VECTOR_SHUFFLE_INSIZE<br class=""></blockquote></blockquote></blockquote></blockquote></blockquote><span style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class="">_______________________________________________</span><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><span style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class="">Libclc-dev mailing list</span><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; 
text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><span style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class=""><a href="mailto:Libclc-dev@lists.llvm.org" class="">Libclc-dev@lists.llvm.org</a></span><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><span style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class=""><a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/libclc-dev" class="">http://lists.llvm.org/cgi-bin/mailman/listinfo/libclc-dev</a></span></div></blockquote></div><br class=""></div></body></html>