[Libclc-dev] [PATCH v2 3/3] Implement vload_half{, n} and vload(half)

Jan Vesely via Libclc-dev libclc-dev at lists.llvm.org
Thu Sep 14 12:51:52 PDT 2017


On Thu, 2017-09-14 at 09:28 -0500, Aaron Watry via Libclc-dev wrote:
> Just curious,
> 
> Would you be interested in implementing vloada_halfN and vstorea_halfN
> as well?  I was just running through some CTS tests, and it failed on
> an implicit declaration of those functions.

I'd prefer to have piglit tests before I start on the libclc
implementation, so it might take some time before I have enough free
cycles.

> 
> I believe that the only real difference is that vloada_halfN requires
> that the pointers be aligned to sizeof(halfN) instead of 16-bit
> alignment, with 64-bit alignment for half3's (4*sizeof(half)).

It'd make sense to add all vloada/vstorea at the same time. We only need
to care about type3 alignment. LLVM already coalesces loads/stores if
it can prove that the pointer is aligned (we can add llvm.assume to
help with this).
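
To make the alignment point concrete, here is a rough sketch in plain C
rather than OpenCL C. The helper names are made up for illustration and are
not part of libclc. It assumes sizeof(half) == 2 and uses the GCC/clang
builtin __builtin_assume_aligned as a stand-in for the kind of hint that
llvm.assume would give the optimizer; whether loads actually get merged
is up to the compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not libclc API): byte alignment that a
 * vloada_half<n> implementation could assume for its pointer.
 * Assumes sizeof(half) == 2; n == 3 is padded to 4 elements,
 * giving 4 * sizeof(half) = 8 bytes. */
static size_t vloada_half_alignment(unsigned n)
{
    unsigned padded = (n == 3) ? 4 : n;
    return (size_t)padded * 2;
}

/* Copy four raw 16-bit half patterns from an 8-byte aligned address.
 * The builtin only passes an alignment promise to the optimizer;
 * the assert checks that the promise holds in debug builds. */
static void load_half4_bits_aligned(const uint16_t *mem, uint16_t out[4])
{
    assert(((uintptr_t)mem % 8) == 0);
    const uint16_t *p = __builtin_assume_aligned(mem, 8);
    for (int i = 0; i < 4; ++i)
        out[i] = p[i];
}
```

With the alignment known, the four 16-bit loads are candidates for being
combined into a single 64-bit load, which is the coalescing mentioned above.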

> 
> If not, I can take a look at this when I've got time (a.k.a. could be
> tomorrow, could be next year).

Don't let me stop you from working on this. It should be a
straightforward modification of the existing vload/vstore
implementation.
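
As a rough illustration of how small the change is for the width-3 case,
here is a plain C sketch, not the libclc code. It assumes that the packed
vload_half3 steps through memory in units of 3 halfs (as the
"offset *= VEC_SIZE" line in the patch does), while vloada_half3 would step
in units of 4. float3_sketch, half_bits_to_float, load_half3 and the two
macros are hypothetical names; the software conversion only stands in for
the fpext in the .ll helpers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for OpenCL's float3 (hypothetical name). */
typedef struct { float s0, s1, s2; } float3_sketch;

/* Convert an IEEE 754 binary16 bit pattern to float in software. */
static float half_bits_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp = (h >> 10) & 0x1Fu;
    uint32_t man = h & 0x3FFu;
    uint32_t bits;

    if (exp == 0 && man == 0) {
        bits = sign;                                /* signed zero */
    } else if (exp == 0) {
        int e = -1;                                 /* subnormal: normalize */
        do { man <<= 1; ++e; } while ((man & 0x400u) == 0);
        man &= 0x3FFu;
        bits = sign | ((uint32_t)(112 - e) << 23) | (man << 13);
    } else if (exp == 0x1F) {
        bits = sign | 0x7F800000u | (man << 13);    /* inf or NaN */
    } else {
        bits = sign | ((exp + 112) << 23) | (man << 13);
    }

    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Load three halfs starting at mem + offset * stride. */
static float3_sketch load_half3(const uint16_t *mem, size_t offset,
                                size_t stride)
{
    const uint16_t *p = mem + offset * stride;
    float3_sketch v = { half_bits_to_float(p[0]),
                        half_bits_to_float(p[1]),
                        half_bits_to_float(p[2]) };
    return v;
}

/* The only difference between the two variants here is the stride. */
#define vload_half3_sketch(offset, mem)  load_half3((mem), (offset), 3)
#define vloada_half3_sketch(offset, mem) load_half3((mem), (offset), 4)
```

For a buffer holding the halfs 1.0 through 8.0, vload_half3_sketch(1, buf)
reads elements 3..5 while vloada_half3_sketch(1, buf) reads elements 4..6.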

Jan

> 
> --Aaron
> 
> On Thu, Sep 7, 2017 at 2:47 PM, Jan Vesely via Libclc-dev
> <libclc-dev at lists.llvm.org> wrote:
> > On Fri, 2017-09-01 at 20:31 -0400, Jan Vesely wrote:
> > > v2: add vload(half) as well
> > >     make helpers amdgpu specific (NVPTX uses different private AS numbering)
> > >     use clang builtin on clang >= 6
> > > 
> > > Signed-off-by: Jan Vesely <jan.vesely at rutgers.edu>
> > > ---
> > > needs [PATCH 1/3] Add halfN types and enable fp16 when generating
> > > builtin declarations from Aaron
> > > clang builtins require https://reviews.llvm.org/D37231
> > 
> > just FYI, the clang patch is now merged as r312742.
> > 
> > Jan
> > 
> > > Passes vload_half piglits on turks (clang-6)
> > > Passes vload_half and vload(half) piglits on carrizo (clang-5)
> > > 
> > >  amdgpu/lib/SOURCES_4.0                  |  1 +
> > >  amdgpu/lib/SOURCES_5.0                  |  1 +
> > >  amdgpu/lib/shared/vload_half_helpers.ll | 23 +++++++++++++
> > >  generic/include/clc/shared/vload.h      | 55 +++++++++++++++++++-----------
> > >  generic/lib/shared/vload.cl             | 59 +++++++++++++++++++++++++++++++++
> > >  generic/lib/shared/vload_half.inc       | 13 ++++++++
> > >  6 files changed, 132 insertions(+), 20 deletions(-)
> > >  create mode 100644 amdgpu/lib/shared/vload_half_helpers.ll
> > >  create mode 100644 generic/lib/shared/vload_half.inc
> > > 
> > > diff --git a/amdgpu/lib/SOURCES_4.0 b/amdgpu/lib/SOURCES_4.0
> > > index 5851798..69c5e5c 100644
> > > --- a/amdgpu/lib/SOURCES_4.0
> > > +++ b/amdgpu/lib/SOURCES_4.0
> > > @@ -1 +1,2 @@
> > > +shared/vload_half_helpers.ll
> > >  shared/vstore_half_helpers.ll
> > > diff --git a/amdgpu/lib/SOURCES_5.0 b/amdgpu/lib/SOURCES_5.0
> > > index 5851798..69c5e5c 100644
> > > --- a/amdgpu/lib/SOURCES_5.0
> > > +++ b/amdgpu/lib/SOURCES_5.0
> > > @@ -1 +1,2 @@
> > > +shared/vload_half_helpers.ll
> > >  shared/vstore_half_helpers.ll
> > > diff --git a/amdgpu/lib/shared/vload_half_helpers.ll b/amdgpu/lib/shared/vload_half_helpers.ll
> > > new file mode 100644
> > > index 0000000..b8c905a
> > > --- /dev/null
> > > +++ b/amdgpu/lib/shared/vload_half_helpers.ll
> > > @@ -0,0 +1,23 @@
> > > +define float @__clc_vload_half_float_helper__private(half addrspace(0)* nocapture %ptr) nounwind alwaysinline {
> > > +  %data = load half, half addrspace(0)* %ptr
> > > +  %res = fpext half %data to float
> > > +  ret float %res
> > > +}
> > > +
> > > +define float @__clc_vload_half_float_helper__global(half addrspace(1)* nocapture %ptr) nounwind alwaysinline {
> > > +  %data = load half, half addrspace(1)* %ptr
> > > +  %res = fpext half %data to float
> > > +  ret float %res
> > > +}
> > > +
> > > +define float @__clc_vload_half_float_helper__local(half addrspace(3)* nocapture %ptr) nounwind alwaysinline {
> > > +  %data = load half, half addrspace(3)* %ptr
> > > +  %res = fpext half %data to float
> > > +  ret float %res
> > > +}
> > > +
> > > +define float @__clc_vload_half_float_helper__constant(half addrspace(2)* nocapture %ptr) nounwind alwaysinline {
> > > +  %data = load half, half addrspace(2)* %ptr
> > > +  %res = fpext half %data to float
> > > +  ret float %res
> > > +}
> > > diff --git a/generic/include/clc/shared/vload.h b/generic/include/clc/shared/vload.h
> > > index 93d0750..8c262dd 100644
> > > --- a/generic/include/clc/shared/vload.h
> > > +++ b/generic/include/clc/shared/vload.h
> > > @@ -1,18 +1,21 @@
> > > -#define _CLC_VLOAD_DECL(PRIM_TYPE, VEC_TYPE, WIDTH, ADDR_SPACE) \
> > > -  _CLC_OVERLOAD _CLC_DECL VEC_TYPE vload##WIDTH(size_t offset, const ADDR_SPACE PRIM_TYPE *x);
> > > +#define _CLC_VLOAD_DECL(SUFFIX, MEM_TYPE, VEC_TYPE, WIDTH, ADDR_SPACE) \
> > > +  _CLC_OVERLOAD _CLC_DECL VEC_TYPE vload##SUFFIX##WIDTH(size_t offset, const ADDR_SPACE MEM_TYPE *x);
> > > 
> > > -#define _CLC_VECTOR_VLOAD_DECL(PRIM_TYPE, ADDR_SPACE) \
> > > -  _CLC_VLOAD_DECL(PRIM_TYPE, PRIM_TYPE##2, 2, ADDR_SPACE) \
> > > -  _CLC_VLOAD_DECL(PRIM_TYPE, PRIM_TYPE##3, 3, ADDR_SPACE) \
> > > -  _CLC_VLOAD_DECL(PRIM_TYPE, PRIM_TYPE##4, 4, ADDR_SPACE) \
> > > -  _CLC_VLOAD_DECL(PRIM_TYPE, PRIM_TYPE##8, 8, ADDR_SPACE) \
> > > -  _CLC_VLOAD_DECL(PRIM_TYPE, PRIM_TYPE##16, 16, ADDR_SPACE)
> > > +#define _CLC_VECTOR_VLOAD_DECL(SUFFIX, MEM_TYPE, PRIM_TYPE, ADDR_SPACE) \
> > > +  _CLC_VLOAD_DECL(SUFFIX, MEM_TYPE, PRIM_TYPE##2, 2, ADDR_SPACE) \
> > > +  _CLC_VLOAD_DECL(SUFFIX, MEM_TYPE, PRIM_TYPE##3, 3, ADDR_SPACE) \
> > > +  _CLC_VLOAD_DECL(SUFFIX, MEM_TYPE, PRIM_TYPE##4, 4, ADDR_SPACE) \
> > > +  _CLC_VLOAD_DECL(SUFFIX, MEM_TYPE, PRIM_TYPE##8, 8, ADDR_SPACE) \
> > > +  _CLC_VLOAD_DECL(SUFFIX, MEM_TYPE, PRIM_TYPE##16, 16, ADDR_SPACE)
> > > +
> > > +#define _CLC_VECTOR_VLOAD_PRIM3(SUFFIX, MEM_TYPE, PRIM_TYPE) \
> > > +  _CLC_VECTOR_VLOAD_DECL(SUFFIX, MEM_TYPE, PRIM_TYPE, __private) \
> > > +  _CLC_VECTOR_VLOAD_DECL(SUFFIX, MEM_TYPE, PRIM_TYPE, __local) \
> > > +  _CLC_VECTOR_VLOAD_DECL(SUFFIX, MEM_TYPE, PRIM_TYPE, __constant) \
> > > +  _CLC_VECTOR_VLOAD_DECL(SUFFIX, MEM_TYPE, PRIM_TYPE, __global) \
> > > 
> > >  #define _CLC_VECTOR_VLOAD_PRIM1(PRIM_TYPE) \
> > > -  _CLC_VECTOR_VLOAD_DECL(PRIM_TYPE, __private) \
> > > -  _CLC_VECTOR_VLOAD_DECL(PRIM_TYPE, __local) \
> > > -  _CLC_VECTOR_VLOAD_DECL(PRIM_TYPE, __constant) \
> > > -  _CLC_VECTOR_VLOAD_DECL(PRIM_TYPE, __global) \
> > > +  _CLC_VECTOR_VLOAD_PRIM3(, PRIM_TYPE, PRIM_TYPE) \
> > > 
> > >  #define _CLC_VECTOR_VLOAD_PRIM() \
> > >      _CLC_VECTOR_VLOAD_PRIM1(char) \
> > > @@ -24,14 +27,26 @@
> > >      _CLC_VECTOR_VLOAD_PRIM1(long) \
> > >      _CLC_VECTOR_VLOAD_PRIM1(ulong) \
> > >      _CLC_VECTOR_VLOAD_PRIM1(float) \
> > > -
> > > +    _CLC_VECTOR_VLOAD_PRIM3(_half, half, float)
> > > +
> > >  #ifdef cl_khr_fp64
> > > -#define _CLC_VECTOR_VLOAD() \
> > > -  _CLC_VECTOR_VLOAD_PRIM1(double) \
> > > -  _CLC_VECTOR_VLOAD_PRIM()
> > > -#else
> > > -#define _CLC_VECTOR_VLOAD() \
> > > -  _CLC_VECTOR_VLOAD_PRIM()
> > > +#pragma OPENCL EXTENSION cl_khr_fp64: enable
> > > +  _CLC_VECTOR_VLOAD_PRIM1(double)
> > >  #endif
> > > +#ifdef cl_khr_fp16
> > > +#pragma OPENCL EXTENSION cl_khr_fp16: enable
> > > +  _CLC_VECTOR_VLOAD_PRIM1(half)
> > > +#endif
> > > +
> > > +_CLC_VECTOR_VLOAD_PRIM()
> > > +// Plain vload_half also needs to be declared
> > > +_CLC_VLOAD_DECL(_half, half, float, , __constant)
> > > +_CLC_VLOAD_DECL(_half, half, float, , __global)
> > > +_CLC_VLOAD_DECL(_half, half, float, , __local)
> > > +_CLC_VLOAD_DECL(_half, half, float, , __private)
> > > 
> > > -_CLC_VECTOR_VLOAD()
> > > +#undef _CLC_VLOAD_DECL
> > > +#undef _CLC_VECTOR_VLOAD_DECL
> > > +#undef _CLC_VECTOR_VLOAD_PRIM3
> > > +#undef _CLC_VECTOR_VLOAD_PRIM1
> > > +#undef _CLC_VECTOR_VLOAD_PRIM
> > > diff --git a/generic/lib/shared/vload.cl b/generic/lib/shared/vload.cl
> > > index 8897200..0892270 100644
> > > --- a/generic/lib/shared/vload.cl
> > > +++ b/generic/lib/shared/vload.cl
> > > @@ -50,3 +50,62 @@ VLOAD_TYPES()
> > >  #pragma OPENCL EXTENSION cl_khr_fp64 : enable
> > >      VLOAD_ADDR_SPACES(double)
> > >  #endif
> > > +#ifdef cl_khr_fp16
> > > +#pragma OPENCL EXTENSION cl_khr_fp16 : enable
> > > +    VLOAD_ADDR_SPACES(half)
> > > +#endif
> > > +
> > > +/* vload_half are legal even without cl_khr_fp16 */
> > > +/* no vload_half for double */
> > > +#if __clang_major__ < 6
> > > +float __clc_vload_half_float_helper__constant(const __constant half *);
> > > +float __clc_vload_half_float_helper__global(const __global half *);
> > > +float __clc_vload_half_float_helper__local(const __local half *);
> > > +float __clc_vload_half_float_helper__private(const __private half *);
> > > +
> > > +#define VEC_LOAD1(val, AS) val = __clc_vload_half_float_helper##AS (&mem[offset++]);
> > > +#else
> > > +#define VEC_LOAD1(val, AS) val = __builtin_load_halff(&mem[offset++]);
> > > +#endif
> > > +
> > > +#define VEC_LOAD2(val, AS) \
> > > +     VEC_LOAD1(val.lo, AS) \
> > > +     VEC_LOAD1(val.hi, AS)
> > > +#define VEC_LOAD3(val, AS) \
> > > +     VEC_LOAD1(val.s0, AS) \
> > > +     VEC_LOAD1(val.s1, AS) \
> > > +     VEC_LOAD1(val.s2, AS)
> > > +#define VEC_LOAD4(val, AS) \
> > > +     VEC_LOAD2(val.lo, AS) \
> > > +     VEC_LOAD2(val.hi, AS)
> > > +#define VEC_LOAD8(val, AS) \
> > > +     VEC_LOAD4(val.lo, AS) \
> > > +     VEC_LOAD4(val.hi, AS)
> > > +#define VEC_LOAD16(val, AS) \
> > > +     VEC_LOAD8(val.lo, AS) \
> > > +     VEC_LOAD8(val.hi, AS)
> > > +
> > > +#define __FUNC(SUFFIX, VEC_SIZE, TYPE, AS) \
> > > +  _CLC_OVERLOAD _CLC_DEF TYPE vload_half##SUFFIX(size_t offset, const AS half *mem) { \
> > > +    offset *= VEC_SIZE; \
> > > +    TYPE __tmp; \
> > > +    VEC_LOAD##VEC_SIZE(__tmp, AS) \
> > > +    return __tmp; \
> > > +  }
> > > +
> > > +#define FUNC(SUFFIX, VEC_SIZE, TYPE, AS) __FUNC(SUFFIX, VEC_SIZE, TYPE, AS)
> > > +
> > > +#define __CLC_BODY "vload_half.inc"
> > > +#include <clc/math/gentype.inc>
> > > +#undef __CLC_BODY
> > > +#undef FUNC
> > > +#undef __FUNC
> > > +#undef VEC_LOAD16
> > > +#undef VEC_LOAD8
> > > +#undef VEC_LOAD4
> > > +#undef VEC_LOAD3
> > > +#undef VEC_LOAD2
> > > +#undef VEC_LOAD1
> > > +#undef VLOAD_TYPES
> > > +#undef VLOAD_ADDR_SPACES
> > > +#undef VLOAD_VECTORIZE
> > > diff --git a/generic/lib/shared/vload_half.inc b/generic/lib/shared/vload_half.inc
> > > new file mode 100644
> > > index 0000000..00dae8a
> > > --- /dev/null
> > > +++ b/generic/lib/shared/vload_half.inc
> > > @@ -0,0 +1,13 @@
> > > +#if __CLC_FPSIZE == 32
> > > +#ifdef __CLC_VECSIZE
> > > +  FUNC(__CLC_VECSIZE, __CLC_VECSIZE, __CLC_GENTYPE, __private);
> > > +  FUNC(__CLC_VECSIZE, __CLC_VECSIZE, __CLC_GENTYPE, __local);
> > > +  FUNC(__CLC_VECSIZE, __CLC_VECSIZE, __CLC_GENTYPE, __global);
> > > +  FUNC(__CLC_VECSIZE, __CLC_VECSIZE, __CLC_GENTYPE, __constant);
> > > +#else
> > > +  FUNC(, 1, __CLC_GENTYPE, __private);
> > > +  FUNC(, 1, __CLC_GENTYPE, __local);
> > > +  FUNC(, 1, __CLC_GENTYPE, __global);
> > > +  FUNC(, 1, __CLC_GENTYPE, __constant);
> > > +#endif
> > > +#endif
> > 
> > _______________________________________________
> > Libclc-dev mailing list
> > Libclc-dev at lists.llvm.org
> > http://lists.llvm.org/cgi-bin/mailman/listinfo/libclc-dev
> > 