[llvm-bugs] [Bug 48493] New: Wrong optimization when processing vectors

Fri Dec 11 22:34:55 PST 2020

https://bugs.llvm.org/show_bug.cgi?id=48493

            Bug ID: 48493
           Summary: Wrong optimization when processing vectors
           Product: clang
           Version: 8.0
          Hardware: PC
                OS: Windows NT
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: C++
          Assignee: unassignedclangbugs at nondot.org
          Reporter: 2077213809 at qq.com
                CC: blitzrakete at gmail.com, dgregor at apple.com,
                    erik.pilkington at gmail.com, llvm-bugs at lists.llvm.org,
                    richard-llvm at metafoo.co.uk

Created attachment 24270
  --> https://bugs.llvm.org/attachment.cgi?id=24270&action=edit
the testcase

I have the following test case for the AArch64 backend, but the Clang frontend
appears to optimize it strangely. As a result, the test fails.

I use the vceqq_u8 function to check whether two vectors are equal, and
__builtin_neon_vgetq_lane_i16 to read the result as an unsigned short whose
bits are set to 1 where the corresponding positions of the vectors are equal.
The value is then printed. v_zero and v_denom are both obtained through the
constructors of a common struct; v_denom holds the values 1-16. After
compilation, Clang considers the two values to be always equal and prints
0xffff.

The Clang version is 8.0.1, and the result is the same even at -O0. The build
was done in an AArch64 environment; the target is aarch64-unknown-linux-gnu.

==================process ===============
clang++ -O0/-O2 vector.cpp
./a.out
ffff
================== code ============
#include <arm_neon.h>  // uint8x16_t, vceqq_u8
#include <stdio.h>     // printf

typedef unsigned char uchar;
struct v_uint8x16
{
    typedef uchar lane_type;
    enum { nlanes = 16 };

    v_uint8x16() {}
    explicit v_uint8x16(uint8x16_t v) : val(v) {}
    v_uint8x16(uchar v0, uchar v1, uchar v2, uchar v3, uchar v4, uchar v5, uchar v6, uchar v7,
               uchar v8, uchar v9, uchar v10, uchar v11, uchar v12, uchar v13, uchar v14, uchar v15)
    {
        uchar v[] = {v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15};
        val = __extension__ ({ uint8x16_t __ret; __ret = (uint8x16_t) __builtin_neon_vld1q_v(v, 48); __ret; });
    }

    uint8x16_t val;
};

template<typename T1, typename T2, typename Tvec>
struct op_div_scale
{

 //  #pragma clang optimize off
    static inline void pre(const Tvec& denom, const Tvec& res)
    {
        const Tvec v_zero = Tvec();
        printf("%x\n", __builtin_neon_vgetq_lane_i16(vceqq_u8(denom.val, v_zero.val), 0));
    }
   #pragma clang optimize on
};

void test()
{
  v_uint8x16 vdenom(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16);
  op_div_scale<int, int, v_uint8x16>::pre(vdenom, vdenom);

}

int main()
{
  test();
  return 0;
}

================== info ======================
With -O2, the IR prints the constant 65535 (0xffff) directly:

; Function Attrs: nounwind
define dso_local void @_Z4testv() local_unnamed_addr #0 {
  %1 = tail call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), i32 65535)
  ret void
}

With #pragma clang optimize off, the result is correct for clang 8.0.1 at -O2:
    #pragma clang optimize off
    static inline void pre(const Tvec& denom, const Tvec& res)
    {
        const Tvec v_zero = Tvec();
        printf("%x\n", __builtin_neon_vgetq_lane_i16(vceqq_u8(denom.val, v_zero.val), 0));
    }
    #pragma clang optimize on

I also tried the trunk branch: at -O0 the result is still wrong, although at
-O2 it is correct.
