[llvm-bugs] [Bug 44544] New: Nested loop unroll bug on skylake avx512

Tue Jan 14 08:14:35 PST 2020

https://bugs.llvm.org/show_bug.cgi?id=44544

            Bug ID: 44544
           Summary: Nested loop unroll bug on skylake avx512
           Product: clang
           Version: unspecified
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P
         Component: C++
          Assignee: unassignedclangbugs at nondot.org
          Reporter: jakobschwarz at yahoo.com
                CC: blitzrakete at gmail.com, dgregor at apple.com,
                    erik.pilkington at gmail.com, llvm-bugs at lists.llvm.org,
                    richard-llvm at metafoo.co.uk

I think I have found a bug in clang. I tested it on local machines and on
godbolt with clang 7, 8 and 9. It only occurs with -O3 optimization combined with
-march=skylake-avx512. With GCC and the Intel compiler the code produces correct
results.

Without the loop nesting, the example also produces correct results with Clang.
The program should print only zeros.

#include <cstdint>
#include <iostream>

int main(int argc, char *argv[])
{
    static constexpr uint32_t mult = 4u;
    static constexpr uint64_t MASK_H = 0x000000000000FFFFull;
    uint64_t arr2[16][4];
    for(auto i=0; i<16; i++) for(auto j=0; j<4; j++) arr2[i][j] = ~uint64_t(0);

    uint64_t* mm =&arr2[0][0];
    for(uint32_t zz=0; zz<16; zz++){
// #pragma clang loop unroll(disable)
        for(uint32_t yy=0; yy<16; yy++){
            const uint32_t ID   = yy+zz*16;
            const uint64_t mask = ~(MASK_H<<(ID%mult*16));
            mm[ID/mult] &= mask;
        }
    }
    for(auto i=0; i<16; i++) {
        for(auto j=0; j<4; j++) std::cout << arr2[i][j] << " ";
        std::cout << std::endl;
    }
    return 0;
}
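
Why only zeros are expected: ID runs over 0..255, so ID/mult visits each of the
64 words four times, and ID%mult selects each of the four 16-bit lanes of that
word exactly once. Every lane is therefore cleared. A minimal sketch of a
cross-check follows, assuming a C++14 compiler. The names Words,
reference_result and all_zero are hypothetical helpers and not part of the
report. The loop is replayed in a constant expression, so the check is done by
the front end rather than by the optimized code path.

```cpp
#include <cstdint>

// Hypothetical container for the 64 words (16 x 4 in the reproducer).
struct Words {
    std::uint64_t w[64];
};

// Replays the reproducer's nested loop at compile time.
constexpr Words reference_result()
{
    constexpr std::uint32_t mult = 4u;
    constexpr std::uint64_t MASK_H = 0x000000000000FFFFull;

    Words r{};
    // Start with all bits set, as in the reproducer.
    for (int i = 0; i < 64; ++i) r.w[i] = ~std::uint64_t(0);

    for (std::uint32_t zz = 0; zz < 16; ++zz) {
        for (std::uint32_t yy = 0; yy < 16; ++yy) {
            const std::uint32_t ID = yy + zz * 16;
            // Clear the 16-bit lane selected by ID % mult.
            const std::uint64_t mask = ~(MASK_H << (ID % mult * 16));
            r.w[ID / mult] &= mask;
        }
    }
    return r;
}

// True if every word has been cleared.
constexpr bool all_zero(const Words& r)
{
    for (int i = 0; i < 64; ++i)
        if (r.w[i] != 0) return false;
    return true;
}

static_assert(all_zero(reference_result()), "every word should end up zero");
```

If the static_assert holds, any nonzero value printed by the reproducer at -O3
points at the generated code rather than at the source.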
