[PATCH] [AArch64]Fix a bug about vset_lane_f16/vget_lane_f16 intrinsics caused by casting from float16_t to int16_t
Hao Liu
Hao.Liu at arm.com
Fri Feb 7 21:53:38 PST 2014
Hi Ana,
Your patch looks good to me and it works well.
I agree with you: since the builtins are function calls, we should not
pass the float16_t type directly.
Could you please commit your patch?
Thanks,
-Hao
> -----Original Message-----
> From: Ana Pazos [mailto:apazos at codeaurora.org]
> Sent: Friday, February 07, 2014 4:34 AM
> To: 'Hao Liu'
> Cc: t.p.northover at gmail.com; Hao Liu; cfe-commits at cs.uiuc.edu
> Subject: RE: [PATCH] [AArch64]Fix a bug about vset_lane_f16/vget_lane_f16
> intrinsics caused by casting from float16_t to int16_t
>
> Hello Hao,
>
> Please try the patch attached.
>
> I kept the macro implementation. From past discussion we learned that:
> - The ACLE intrinsic definition can have float16_t type because it is
>   just a notation.
> - But the builtins are function calls, and float16_t is not permitted
>   as an argument or return type.
>
> Thanks,
> Ana.
>
> -----Original Message-----
> From: Hao Liu [mailto:haoliuts at gmail.com]
> Sent: Thursday, January 23, 2014 6:27 PM
> To: Ana Pazos
> Cc: t.p.northover at gmail.com; Hao Liu; cfe-commits at cs.uiuc.edu
> Subject: Re: [PATCH] [AArch64]Fix a bug about vset_lane_f16/vget_lane_f16
> intrinsics caused by casting from float16_t to int16_t
>
> Hi Ana,
>
> The test case is:
>   int test_vset_get_lane_f16(float16x8_t a) {
>     float16x8_t b;
>     b = vsetq_lane_f16(3.5, a, 5);
>     float16_t c = vgetq_lane_f16(b, 5);
>     return (int)c;
>     // CHECK: movz x{{[0-9]+}}, #3
>   }
>
> This test case first sets a lane to 3.5, then reads that lane back and
> returns it as an integer. Currently it returns the incorrect result 0
> under -O1. This is caused by a type cast from float to integer inside
> vsetq_lane_f16.
>
> Actually, the vsetq_lane_f16 in the test case is macro-expanded as
> follows:
>   float16x8_t b;
>   float16_t __a = 3.5;
>   int16_t __a1 = (int16_t) __a;
>   int16x4_t __b1 = vreinterpret_s16_f16(b);
>   int16x4_t __b2 = vset_lane_s16(__a1, __b1, __c);
>   vreinterpret_f16_s16(__b2);
>   ...
>
> The "int16_t __a1 = (int16_t) __a;" is not a no-op. It is translated
> into the following IR (seen under -O0):
>   %a = call float @llvm.convert.from.fp16(i16 %2)
>   %conv = fptosi float %a to i16
> The fptosi changes the semantics: the float value 3.5 is converted to
> the integer value 3, so the resulting bits are quite different from
> the original float bits.
>
> Thanks,
> -Hao
>
> 2014/1/24 Ana Pazos <apazos at codeaurora.org>:
> > Hao,
> >
> > What is the failing test case?
> >
> > Isn't float16_t just a storage type? The casting operation will do
> > nothing in this case; there won't be an fp-to-integer conversion
> > instruction being generated.
> >
> > We purposely used a macro to avoid the builtin call with a float16_t
> > type.
> >
> > We had this discussion in the past with Tim and he warned us that
> >
> > "ACLE itself says that __fp16 (and by extension float16_t) should
> > *not* be permitted as an argument or return type (section 4.1.2)."
> >
> > Can you send me the failing test case please.
> >
> > Thanks,
> > Ana.
> >
> > -----Original Message-----
> > From: cfe-commits-bounces at cs.uiuc.edu
> > [mailto:cfe-commits-bounces at cs.uiuc.edu] On Behalf Of Hao Liu
> > Sent: Thursday, January 23, 2014 2:25 AM
> > To: t.p.northover at gmail.com; Hao.Liu at arm.com
> > Cc: cfe-commits at cs.uiuc.edu
> > Subject: [PATCH] [AArch64]Fix a bug about vset_lane_f16/vget_lane_f16
> > intrinsics caused by casting from float16_t to int16_t
> >
> > Hi t.p.northover,
> >
> > Hi Tim and reviewers,
> >
> > Currently, vset_lane_f16/vget_lane_f16 are implemented by a macro
> > like:
> >   #define vset_lane_f16(a, b, __c) ( { \
> >     float16_t __a = (a); float16x4_t __b = (b); \
> >     int16_t __a1 = (int16_t) __a; \
> >     ...
> > There is a cast from float16_t to int16_t (from variable __a to
> > __a1). Obviously, this will change the semantics. We just want a
> > bitwise conversion, not a type cast from float to int. So if we set
> > 3.5 into a vector lane and then read that lane back, we get the
> > incorrect result: 0.
> >
> > This patch fixes the problem by implementing such intrinsics as
> > builtin functions. The scalar float16_t type is passed directly,
> > without a cast.
> >
> > Review, please.
> >
> > Thanks,
> > -Hao
> >
> > http://llvm-reviews.chandlerc.com/D2602
> >
> > Files:
> > include/clang/Basic/arm_neon.td
> > lib/CodeGen/CGBuiltin.cpp
> > test/CodeGen/aarch64-neon-copy.c
> > utils/TableGen/NeonEmitter.cpp
> >
> > Index: include/clang/Basic/arm_neon.td
> >
> > ===================================================================
> > --- include/clang/Basic/arm_neon.td
> > +++ include/clang/Basic/arm_neon.td
> > @@ -866,9 +866,9 @@
> >
> >  ////////////////////////////////////////////////////////////////////////////
> >  // Extract or insert element from vector
> >  def GET_LANE : IInst<"vget_lane", "sdi",
> > -    "csilPcPsUcUsUiUlQcQsQiQlQUcQUsQUiQUlPcPsQPcQPsfdQfQdPlQPl">;
> > +    "csilPcPsUcUsUiUlQcQsQiQlQUcQUsQUiQUlPcPsQPcQPshfdQhQfQdPlQPl">;
> >  def SET_LANE : IInst<"vset_lane", "dsdi",
> > -    "csilPcPsUcUsUiUlQcQsQiQlQUcQUsQUiQUlPcPsQPcQPsfdQfQdPlQPl">;
> > +    "csilPcPsUcUsUiUlQcQsQiQlQUcQUsQUiQUlPcPsQPcQPshfdQhQfQdPlQPl">;
> >  def COPY_LANE : IOpInst<"vcopy_lane", "ddidi",
> >      "csilPcPsUcUsUiUlPcPsPlfd", OP_COPY_LN>;
> >  def COPYQ_LANE : IOpInst<"vcopy_lane", "ddigi",
> >
> > @@ -1340,7 +1340,4 @@
> >
> >  def SCALAR_VDUP_LANE : IInst<"vdup_lane", "sdi",
> >      "ScSsSiSlSfSdSUcSUsSUiSUlSPcSPs">;
> >  def SCALAR_VDUP_LANEQ : IInst<"vdup_laneq", "sji",
> >      "ScSsSiSlSfSdSUcSUsSUiSUlSPcSPs">;
> > -
> > -def SCALAR_GET_LANE : IOpInst<"vget_lane", "sdi", "hQh", OP_SCALAR_GET_LN>;
> > -def SCALAR_SET_LANE : IOpInst<"vset_lane", "dsdi", "hQh", OP_SCALAR_SET_LN>;
> >  }
> > Index: lib/CodeGen/CGBuiltin.cpp
> >
> > ===================================================================
> > --- lib/CodeGen/CGBuiltin.cpp
> > +++ lib/CodeGen/CGBuiltin.cpp
> > @@ -1876,12 +1876,14 @@
> > case AArch64::BI__builtin_neon_vget_lane_i16:
> > case AArch64::BI__builtin_neon_vget_lane_i32:
> > case AArch64::BI__builtin_neon_vget_lane_i64:
> > + case AArch64::BI__builtin_neon_vget_lane_f16:
> > case AArch64::BI__builtin_neon_vget_lane_f32:
> > case AArch64::BI__builtin_neon_vget_lane_f64:
> > case AArch64::BI__builtin_neon_vgetq_lane_i8:
> > case AArch64::BI__builtin_neon_vgetq_lane_i16:
> > case AArch64::BI__builtin_neon_vgetq_lane_i32:
> > case AArch64::BI__builtin_neon_vgetq_lane_i64:
> > + case AArch64::BI__builtin_neon_vgetq_lane_f16:
> > case AArch64::BI__builtin_neon_vgetq_lane_f32:
> > case AArch64::BI__builtin_neon_vgetq_lane_f64:
> >     return CGF.EmitARMBuiltinExpr(ARM::BI__builtin_neon_vget_lane_i8, E);
> > @@ -1889,12 +1891,14 @@
> > case AArch64::BI__builtin_neon_vset_lane_i16:
> > case AArch64::BI__builtin_neon_vset_lane_i32:
> > case AArch64::BI__builtin_neon_vset_lane_i64:
> > + case AArch64::BI__builtin_neon_vset_lane_f16:
> > case AArch64::BI__builtin_neon_vset_lane_f32:
> > case AArch64::BI__builtin_neon_vset_lane_f64:
> > case AArch64::BI__builtin_neon_vsetq_lane_i8:
> > case AArch64::BI__builtin_neon_vsetq_lane_i16:
> > case AArch64::BI__builtin_neon_vsetq_lane_i32:
> > case AArch64::BI__builtin_neon_vsetq_lane_i64:
> > + case AArch64::BI__builtin_neon_vsetq_lane_f16:
> > case AArch64::BI__builtin_neon_vsetq_lane_f32:
> > case AArch64::BI__builtin_neon_vsetq_lane_f64:
> >     return CGF.EmitARMBuiltinExpr(ARM::BI__builtin_neon_vset_lane_i8, E);
> > Index: test/CodeGen/aarch64-neon-copy.c
> >
> > ===================================================================
> > --- test/CodeGen/aarch64-neon-copy.c
> > +++ test/CodeGen/aarch64-neon-copy.c
> > @@ -1274,18 +1274,16 @@
> >
> > // CHECK: test_vset_lane_f16
> > float16x4_t test_vset_lane_f16(float16x4_t v1) {
> > - float16_t a;
> > + float16_t a = 0;
> > return vset_lane_f16(a, v1, 3);
> > -// CHECK: fmov {{s[0-9]+}}, wzr
> > -// CHECK-NEXT: ins {{v[0-9]+}}.h[3], {{v[0-9]+}}.h[0]
> > +// CHECK: ins {{v[0-9]+}}.h[3], wzr
> > }
> >
> > // CHECK: test_vsetq_lane_f16
> > float16x8_t test_vsetq_lane_f16(float16x8_t v1) {
> > - float16_t a;
> > + float16_t a = 0;
> > return vsetq_lane_f16(a, v1, 7);
> > -// CHECK: fmov {{s[0-9]+}}, wzr
> > -// CHECK-NEXT: ins {{v[0-9]+}}.h[7], {{v[0-9]+}}.h[0]
> > +// CHECK: ins {{v[0-9]+}}.h[7], wzr
> > }
> >
> > // CHECK: test_vset_lane_f16_2
> > @@ -1365,3 +1363,11 @@
> > // CHECK-NEXT: ret
> > }
> >
> > +// CHECK-LABEL: test_vset_get_lane_f16:
> > +int test_vset_get_lane_f16(float16x8_t a) {
> > + float16x8_t b;
> > + b = vsetq_lane_f16(3.5, a, 5);
> > + float16_t c = vgetq_lane_f16(b, 5);
> > + return (int)c;
> > +// CHECK: movz x{{[0-9]+}}, #3
> > +}
> > Index: utils/TableGen/NeonEmitter.cpp
> >
> > ===================================================================
> > --- utils/TableGen/NeonEmitter.cpp
> > +++ utils/TableGen/NeonEmitter.cpp
> > @@ -760,7 +760,7 @@
> > type = ModType(mod, type, quad, poly, usgn, scal, cnst, pntr);
> >
> >    usgn = usgn | poly | ((ck == ClassI || ck == ClassW) &&
> > -                        scal && type != 'f' && type != 'd');
> > +                        scal && type != 'h' && type != 'f' && type != 'd');
> >
> > // All pointers are void* pointers. Change type to 'v' now.
> > if (pntr) {
> > @@ -768,11 +768,6 @@
> > poly = false;
> > type = 'v';
> > }
> > - // Treat half-float ('h') types as unsigned short ('s') types.
> > - if (type == 'h') {
> > - type = 's';
> > - usgn = true;
> > - }
> >
> > if (scal) {
> > SmallString<128> s;
> > @@ -796,6 +791,12 @@
> > return s.str();
> > }
> >
> > + // Treat half-float ('h') types as unsigned short ('s') types.
> > + if (type == 'h') {
> > + type = 's';
> > + usgn = true;
> > + }
> > +
> >    // Since the return value must be one type, return a vector type of the
> >    // appropriate width which we will bitcast. An exception is made for
> >    // returning structs of 2, 3, or 4 vectors which are returned in a
> >    // sret-like
> >
> >
> > _______________________________________________
> > cfe-commits mailing list
> > cfe-commits at cs.uiuc.edu
> > http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits