[llvm-dev] Problem with clang optimizer?

Dimitry Andric via llvm-dev llvm-dev at lists.llvm.org
Mon Oct 25 14:43:02 PDT 2021


Hi Uri,

Unfortunately the fix for this didn't make it into 13.0.0; it will hopefully be part of 13.0.1, though I can't say when that will come out.

-Dimitry

> On 25 Oct 2021, at 23:35, Blumenthal, Uri - 0553 - MITLL via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> I just tried Clang-13 (with LLVM-13), and the problem is still there. The vectorizer is still broken with respect to the SSE4.1 instruction set extensions:
> 
> $ echo $CXXFLAGS
> -std=gnu++17 -O3 -march=native -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk
> $ clang++-mp-13 $CXXFLAGS -o t sha3-reproducer.cxx
> $ ./t
> Assertion failed: (T[0] == 16394434931424703552u), function main, file sha3-reproducer.cxx, line 103.
> Abort trap: 6
> $ clang++-mp-13 $CXXFLAGS -mno-sse4.1 -o t sha3-reproducer.cxx
> $ ./t
> $
> 
> 
> --
> Regards,
> Uri
> 
> There are two ways to design a system. One is to make it so simple there are obviously no deficiencies.
> The other is to make it so complex there are no obvious deficiencies.
>     - C. A. R. Hoare
> 
> 
> From: Jameson Nash <vtjnash at gmail.com>
> Date: Wednesday, September 29, 2021 at 19:41
> To: Craig Topper <craig.topper at gmail.com>
> Cc: Uri Blumenthal <uri at ll.mit.edu>, LLVM-DEV LIST <llvm-dev at lists.llvm.org>
> Subject: Re: [llvm-dev] Problem with clang optimizer?
> 
> This may be fixed now (https://reviews.llvm.org/D106613), but it remains to be confirmed for https://bugs.llvm.org/show_bug.cgi?id=51957
> 
> On Sun, Sep 26, 2021 at 1:12 AM Craig Topper via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>> Looking at the IR here, https://godbolt.org/z/zaMW1renW, I believe the issue is with this instruction on line 361:
>> 
>> %30 = extractelement <2 x <2 x i64>*> %bc438, i32 0
>> 
>> It should be extracting from index 1 instead of index 0.
>> 
>> ~Craig
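
The observation above can be pictured with a small model. This is only an illustrative sketch, not the code LLVM generated: `Vec2`, `load_via_lane`, `xor2` and `wrong_lane_changes_result` are hypothetical names, and the constants are arbitrary. It mimics an `extractelement` on a two-element vector of pointers, and shows that selecting index 0 where index 1 was intended loads a different pair of 64-bit values, so everything computed from that load comes out wrong.

```cpp
#include <array>
#include <cstdint>

// Two 64-bit lanes, standing in for an LLVM <2 x i64> value.
using Vec2 = std::array<std::uint64_t, 2>;

// Hypothetical stand-in for
//   extractelement <2 x <2 x i64>*> %ptrs, i32 lane
// followed by a load: pick one pointer out of a two-element
// vector of pointers and read the pair it points to.
inline Vec2 load_via_lane(const std::array<const Vec2*, 2>& ptrs, unsigned lane)
{
    return *ptrs[lane];
}

// Element-wise XOR of two 2-lane vectors.
inline Vec2 xor2(const Vec2& a, const Vec2& b)
{
    return Vec2{a[0] ^ b[0], a[1] ^ b[1]};
}

// Returns true when reading through lane 0 instead of lane 1
// changes the value computed from the load.
inline bool wrong_lane_changes_result()
{
    const Vec2 first{0x1111u, 0x2222u};   // arbitrary demo data
    const Vec2 second{0x3333u, 0x4444u};  // arbitrary demo data
    const std::array<const Vec2*, 2> ptrs{&first, &second};
    const Vec2 acc{0xAAAAu, 0xBBBBu};

    const Vec2 intended = xor2(acc, load_via_lane(ptrs, 1)); // index 1: what was meant
    const Vec2 observed = xor2(acc, load_via_lane(ptrs, 0)); // index 0: what the IR does
    return intended != observed;
}
```

Calling `wrong_lane_changes_result()` from a test or from `main` returns true for this data, because the two pointed-to pairs differ.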
>> 
>> 
>> On Sat, Sep 25, 2021 at 5:48 PM Blumenthal, Uri - 0553 - MITLL <uri at ll.mit.edu> wrote:
>>> I found that
>>> - The problem disappears with -mno-sse4.1.
>>> - The problem manifests with both Apple Clang from Xcode-13 and LLVM Clang-12 (and not with Xcode-12 or LLVM Clang-11).
>>> - I could experiment only on an Apple platform, as that’s the only one I have that runs LLVM Clang-12.
>>> 
>>> --
>>> Regards,
>>> Uri
>>> 
>>> 
>>> From: Craig Topper <craig.topper at gmail.com>
>>> Date: Saturday, September 25, 2021 at 12:07
>>> To: Dimitry Andric <dimitry at andric.com>
>>> Cc: Uri Blumenthal <uri at ll.mit.edu>, LLVM-DEV LIST <llvm-dev at lists.llvm.org>
>>> Subject: Re: [llvm-dev] Problem with clang optimizer?
>>> 
>>> It reproduced for me with -march=nehalem which does not have AVX.
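
Since the thread is narrowing the bug down by instruction set (SSE4.1, AVX, -march=nehalem, -march=native), it can help to record which vector extensions a given build actually targets. The following is a minimal sketch, assuming the usual Clang/GCC predefined macros such as `__SSE4_1__` and `__AVX__`; the function name `enabled_vector_extensions` is made up for illustration.

```cpp
#include <string>

// Returns a space-separated list of the x86 vector extensions the
// compiler was told to target for this translation unit, based on
// predefined feature macros. With -march=native the answer depends
// on the build host, which is why it is worth printing.
inline std::string enabled_vector_extensions()
{
    std::string s;
#if defined(__SSE2__)
    s += "sse2 ";
#endif
#if defined(__SSE4_1__)
    s += "sse4.1 ";
#endif
#if defined(__SSE4_2__)
    s += "sse4.2 ";
#endif
#if defined(__AVX__)
    s += "avx ";
#endif
#if defined(__AVX2__)
    s += "avx2 ";
#endif
    if (s.empty())
        s = "(none of the checked macros are defined)";
    return s;
}
```

Printing the returned string from `main` in builds made with and without -mno-sse4.1 shows whether the flag took effect for that translation unit.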
>>> 
>>> On Sat, Sep 25, 2021 at 2:51 AM Dimitry Andric via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>>> It is only occurring (as far as I can see now) on x86_64, with -mavx enabled. Or with a target CPU that supports AVX. And it is not Apple clang specific.
>>>> 
>>>> -Dimitry
>>>> 
>>>> 
>>>>> On 24 Sep 2021, at 15:30, Blumenthal, Uri - 0553 - MITLL via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>>>> 
>>>>> I tried to reproduce it on godbolt with clang 12.0.0 and 12.0.1 but things seem fine when I run it there: https://godbolt.org/z/vrq8j6Kj7.
>>>>> Can you share your exact clang invocation? Does it only reproduce in some specific environment?
>>>>> 
>>>>> Save the source I posted before into “sha3-reproducer.cxx” file. Let me know if you want it re-posted here.
>>>>> 
>>>>> $ clang++-mp-12 -v
>>>>> clang version 12.0.1
>>>>> Target: x86_64-apple-darwin20.6.0
>>>>> Thread model: posix
>>>>> InstalledDir: /opt/local/libexec/llvm-12/bin
>>>>> $ clang++-mp-12 -o s -O3 sha3-reproducer.cxx
>>>>> $ ./s
>>>>> Assertion failed: (T[0] == 16394434931424703552u), function main, file sha3-reproducer.cxx, line 103.
>>>>> Abort trap: 6
>>>>> $ clang++-mp-12 -o s -O2 sha3-reproducer.cxx
>>>>> $ ./s
>>>>> Assertion failed: (T[0] == 16394434931424703552u), function main, file sha3-reproducer.cxx, line 103.
>>>>> Abort trap: 6
>>>>> $ clang++-mp-12 -o s -O1 sha3-reproducer.cxx
>>>>> $ ./s
>>>>> $
>>>>> 
>>>>> Clang-12 is installed via Macports, which is why we invoke the executable as clang++-mp-12.
>>>>> 
>>>>> The same problem manifests in exactly the same way in the Xcode-13 version of Clang (presumably based on LLVM Clang-12).
>>>>> 
>>>>> I’ll be happy to provide more specific details if you let me know what you need.
>>>>> 
>>>>> 
>>>>> Also, it generally helps to reduce the code in bug reports as much as possible; creduce can help with that: https://embed.cs.utah.edu/creduce/using/.
>>>>> 
>>>>> Understood. Unfortunately, the above reproducer is the best we could come up with. An alternative is to try building the Botan package itself: https://github.com/randombit/botan.git.
>>>>> 
>>>>> 
>>>>> 
>>>>> On Thu, Sep 23, 2021 at 10:14 PM Blumenthal, Uri - 0553 - MITLL via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>>>>> I’m not sure if this is the correct list, so please direct me to the right one if this bug report shouldn’t go here.
>>>>>> 
>>>>>> The problem is: invoking clang (v12) with -O2 or a higher optimization level generates wrong object code for the following C++. Compiling it with -O1 generates a working binary.
>>>>>> 
>>>>>> 
>>>>>> =================
>>>>>> 
>>>>>> #include <cstdint>
>>>>>> #include <cassert>
>>>>>> 
>>>>>> template<size_t ROT, typename T>
>>>>>> inline constexpr T rotl(T input)
>>>>>>    {
>>>>>>    static_assert(ROT > 0 && ROT < 8*sizeof(T), "Invalid rotation constant");
>>>>>>    return static_cast<T>((input << ROT) | (input >> (8*sizeof(T) - ROT)));
>>>>>>    }
>>>>>> 
>>>>>> inline void SHA3_round(uint64_t T[25], const uint64_t A[25], uint64_t RC)
>>>>>>    {
>>>>>>    const uint64_t C0 = A[0] ^ A[5] ^ A[10] ^ A[15] ^ A[20];
>>>>>>    const uint64_t C1 = A[1] ^ A[6] ^ A[11] ^ A[16] ^ A[21];
>>>>>> 
>>>>>>    // the calculation of C2 fails for -O3 or -O2 with clang 12
>>>>>>    // FWIW: it would produce a value that doesn't fit into a _signed_ 64-bit int
>>>>>>    const uint64_t C2 = A[2] ^ A[7] ^ A[12] ^ A[17] ^ A[22];
>>>>>> 
>>>>>>    const uint64_t C3 = A[3] ^ A[8] ^ A[13] ^ A[18] ^ A[23];
>>>>>>    const uint64_t C4 = A[4] ^ A[9] ^ A[14] ^ A[19] ^ A[24];
>>>>>> 
>>>>>>    const uint64_t D0 = rotl<1>(C0) ^ C3;
>>>>>>    const uint64_t D1 = rotl<1>(C1) ^ C4;
>>>>>>    const uint64_t D2 = rotl<1>(C2) ^ C0;
>>>>>>    const uint64_t D3 = rotl<1>(C3) ^ C1;
>>>>>>    const uint64_t D4 = rotl<1>(C4) ^ C2;
>>>>>> 
>>>>>>    const uint64_t B00 =          A[ 0] ^ D1;
>>>>>>    const uint64_t B01 = rotl<44>(A[ 6] ^ D2);
>>>>>>    const uint64_t B02 = rotl<43>(A[12] ^ D3);
>>>>>>    const uint64_t B03 = rotl<21>(A[18] ^ D4);
>>>>>>    const uint64_t B04 = rotl<14>(A[24] ^ D0);
>>>>>>    T[ 0] = B00 ^ (~B01 & B02) ^ RC;
>>>>>>    T[ 1] = B01 ^ (~B02 & B03);
>>>>>>    T[ 2] = B02 ^ (~B03 & B04);
>>>>>>    T[ 3] = B03 ^ (~B04 & B00);
>>>>>>    T[ 4] = B04 ^ (~B00 & B01);
>>>>>> 
>>>>>>    const uint64_t B05 = rotl<28>(A[ 3] ^ D4);
>>>>>>    const uint64_t B06 = rotl<20>(A[ 9] ^ D0);
>>>>>>    const uint64_t B07 = rotl< 3>(A[10] ^ D1);
>>>>>>    const uint64_t B08 = rotl<45>(A[16] ^ D2);
>>>>>>    const uint64_t B09 = rotl<61>(A[22] ^ D3);
>>>>>>    T[ 5] = B05 ^ (~B06 & B07);
>>>>>>    T[ 6] = B06 ^ (~B07 & B08);
>>>>>>    T[ 7] = B07 ^ (~B08 & B09);
>>>>>>    T[ 8] = B08 ^ (~B09 & B05);
>>>>>>    T[ 9] = B09 ^ (~B05 & B06);
>>>>>> 
>>>>>>    // --- instructions starting from here can be removed
>>>>>>    //     and the -O3 discrepancy is still triggered
>>>>>> 
>>>>>>    const uint64_t B10 = rotl< 1>(A[ 1] ^ D2);
>>>>>>    const uint64_t B11 = rotl< 6>(A[ 7] ^ D3);
>>>>>>    const uint64_t B12 = rotl<25>(A[13] ^ D4);
>>>>>>    const uint64_t B13 = rotl< 8>(A[19] ^ D0);
>>>>>>    const uint64_t B14 = rotl<18>(A[20] ^ D1);
>>>>>>    T[10] = B10 ^ (~B11 & B12);
>>>>>>    T[11] = B11 ^ (~B12 & B13);
>>>>>>    T[12] = B12 ^ (~B13 & B14);
>>>>>>    T[13] = B13 ^ (~B14 & B10);
>>>>>>    T[14] = B14 ^ (~B10 & B11);
>>>>>> 
>>>>>>    const uint64_t B15 = rotl<27>(A[ 4] ^ D0);
>>>>>>    const uint64_t B16 = rotl<36>(A[ 5] ^ D1);
>>>>>>    const uint64_t B17 = rotl<10>(A[11] ^ D2);
>>>>>>    const uint64_t B18 = rotl<15>(A[17] ^ D3);
>>>>>>    const uint64_t B19 = rotl<56>(A[23] ^ D4);
>>>>>>    T[15] = B15 ^ (~B16 & B17);
>>>>>>    T[16] = B16 ^ (~B17 & B18);
>>>>>>    T[17] = B17 ^ (~B18 & B19);
>>>>>>    T[18] = B18 ^ (~B19 & B15);
>>>>>>    T[19] = B19 ^ (~B15 & B16);
>>>>>> 
>>>>>>    const uint64_t B20 = rotl<62>(A[ 2] ^ D3);
>>>>>>    const uint64_t B21 = rotl<55>(A[ 8] ^ D4);
>>>>>>    const uint64_t B22 = rotl<39>(A[14] ^ D0);
>>>>>>    const uint64_t B23 = rotl<41>(A[15] ^ D1);
>>>>>>    const uint64_t B24 = rotl< 2>(A[21] ^ D2);
>>>>>>    T[20] = B20 ^ (~B21 & B22);
>>>>>>    T[21] = B21 ^ (~B22 & B23);
>>>>>>    T[22] = B22 ^ (~B23 & B24);
>>>>>>    T[23] = B23 ^ (~B24 & B20);
>>>>>>    T[24] = B24 ^ (~B20 & B21);
>>>>>>    }
>>>>>> 
>>>>>> int main()
>>>>>> {
>>>>>>     uint64_t T[25];
>>>>>> 
>>>>>>     uint64_t A[25] = {
>>>>>>         15515230172486u, 9751542238472685244u, 220181482233372672u,
>>>>>>         2303197730119u, 9537012007446913720u, 0u, 14782389640143539577u,
>>>>>>         2305843009213693952u, 1056340403235818873u, 16396894922196123648u,
>>>>>>         13438274300558u, 3440198220943040u, 0u, 3435902021559310u, 64u,
>>>>>>         14313837075027532897u, 32768u, 6880396441885696u, 14320469711924527201u,
>>>>>>         0u, 9814829303127743595u, 18014398509481984u, 14444556046857390455u,
>>>>>>         4611686018427387904u, 18041275058083100u };
>>>>>> 
>>>>>>     SHA3_round(T, A, 0x0000000000008082);
>>>>>> 
>>>>>>     assert(T[0]  == 16394434931424703552u);
>>>>>>     assert(T[1]  == 10202638136074191489u);
>>>>>>     assert(T[2]  == 6432602484395933614u);
>>>>>>     assert(T[3]  == 10616058301262943899u);
>>>>>>     assert(T[4]  == 14391824303596635982u);
>>>>>>     assert(T[5]  == 5673590995284149638u);
>>>>>>     assert(T[6]  == 15681872423764765508u);
>>>>>>     assert(T[7]  == 11470206704342013341u);
>>>>>>     assert(T[8]  == 8508807405493883168u);
>>>>>>     assert(T[9]  == 9461805213344568570u);
>>>>>>     assert(T[10] == 8792313850970105187u);
>>>>>>     assert(T[11] == 13508586629627657374u);
>>>>>>     assert(T[12] == 5157283382205130943u);
>>>>>>     assert(T[13] == 375019647457809685u);
>>>>>>     assert(T[14] == 9294608398083155963u);
>>>>>>     assert(T[15] == 16923121173371064314u);
>>>>>>     assert(T[16] == 4737739424553008030u);
>>>>>>     assert(T[17] == 5823987023293412593u);
>>>>>>     assert(T[18] == 13908063749137376267u);
>>>>>>     assert(T[19] == 13781177305593198238u);
>>>>>>     assert(T[20] == 9673833001659673401u);
>>>>>>     assert(T[21] == 17282395057630454440u);
>>>>>>     assert(T[22] == 12906624984756985556u);
>>>>>>     assert(T[23] == 3081478361927354234u);
>>>>>>     assert(T[24] == 93297594635310132u);
>>>>>> 
>>>>>>     return 0;
>>>>>> }
>>>>>> =================
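
One way to narrow down a suspected miscompile like the one reported above is a differential check: compute the same values once in normally optimized code and once in a function with optimization switched off, then compare. The sketch below does this only for the column parities C0..C4 from the reproducer. It is not confirmed by the thread that this reduced form triggers the bug; the function names and the `REFERENCE_NO_OPT` macro are hypothetical, and the macro relies on the Clang `optnone` and GCC `optimize("O0")` function attributes.

```cpp
#include <cstddef>
#include <cstdint>

// Ask the compiler not to optimize the reference function.
#if defined(__clang__)
#  define REFERENCE_NO_OPT __attribute__((optnone))
#elif defined(__GNUC__)
#  define REFERENCE_NO_OPT __attribute__((optimize("O0")))
#else
#  define REFERENCE_NO_OPT
#endif

// Column parities written out as in the reproducer; compiled with
// whatever optimization level the translation unit uses.
inline void column_parity_fast(std::uint64_t C[5], const std::uint64_t A[25])
{
    C[0] = A[0] ^ A[5] ^ A[10] ^ A[15] ^ A[20];
    C[1] = A[1] ^ A[6] ^ A[11] ^ A[16] ^ A[21];
    C[2] = A[2] ^ A[7] ^ A[12] ^ A[17] ^ A[22];
    C[3] = A[3] ^ A[8] ^ A[13] ^ A[18] ^ A[23];
    C[4] = A[4] ^ A[9] ^ A[14] ^ A[19] ^ A[24];
}

// The same computation as a plain loop, kept unoptimized so it can
// serve as a reference for the optimized version.
REFERENCE_NO_OPT
void column_parity_reference(std::uint64_t C[5], const std::uint64_t A[25])
{
    for (std::size_t x = 0; x < 5; ++x) {
        std::uint64_t acc = 0;
        for (std::size_t y = 0; y < 5; ++y)
            acc ^= A[x + 5 * y];
        C[x] = acc;
    }
}

// Returns the index of the first column whose two results differ,
// or -1 if the optimized and reference results agree.
inline int first_mismatch(const std::uint64_t A[25])
{
    std::uint64_t fast[5];
    std::uint64_t ref[5];
    column_parity_fast(fast, A);
    column_parity_reference(ref, A);
    for (int x = 0; x < 5; ++x)
        if (fast[x] != ref[x])
            return x;
    return -1;
}
```

Passing the array `A` from the reproducer's `main` to `first_mismatch` should return -1 on a correct compiler; any other value names the column whose optimized result disagrees with the reference.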
>>>>>> 
>>>>>> Your help debugging and fixing this problem is appreciated!
>>>>>> --
>>>>>> Regards,
>>>>>> Uri Blumenthal                              Voice: (781) 981-1638
>>>>>> Secure Resilient Systems and Technologies   Cell:  (339) 223-5363
>>>>>> MIT Lincoln Laboratory
>>>>>> 244 Wood Street, Lexington, MA  02420-9108
>>>>>> 
>>>>>> Web:     https://www.ll.mit.edu/biographies/uri-blumenthal
>>>>>> Root CA: https://www.ll.mit.edu/llrca2.pem
>>>>>> 
>>>>>> _______________________________________________
>>>>>> LLVM Developers mailing list
>>>>>> llvm-dev at lists.llvm.org
>>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>> 
>>>>> 
>>>>> --
>>>>> Jakub Kuderski
>>> --
>>> ~Craig
