<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/55281>55281</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            SVE VLS miscompilation of llvm XRay BlockIndexer::flush with opaque pointers
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            backend:AArch64,
            miscompilation
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          DavidSpickett
      </td>
    </tr>
</table>

<pre>
    Since the switch to opaque pointers we (Linaro) have had several XRay unit tests failing on our SVE VLS (aka fixed-size vector) bot. These were masked by a number of other build breakages at the time, hence the late report.

E.g. https://lab.llvm.org/buildbot/#/builders/176/builds/1631 (don't look at the latest builds; we have yet another build breakage, so they show different results).

This does not happen on any other bot, including the Arm 32-bit and non-SVE AArch64 bots. Turning opaque pointers off fixes the issue.

This is what the symbolised stack dump looks like:
```
 #0 0x0000aaaabba1ea34 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (./prog+0x3ea34)
 #1 0x0000aaaabba1cb60 llvm::sys::RunSignalHandlers() (./prog+0x3cb60)
 #2 0x0000aaaabba1f168 SignalHandler(int) Signals.cpp:0:0
 #3 0x0000ffff89ca95c0 (linux-vdso.so.1+0x5c0)
 #4 0x0000aaaabb9ffdf8 void testing::internal::PrintTo<llvm::xray::Record*>(llvm::xray::Record* const&, std::ostream*) (./prog+0x1fdf8)
 #5 0x0000aaaabb9ffdd4 testing::internal::UniversalPrinter<llvm::xray::Record*>::Print(llvm::xray::Record* const&, std::ostream*) (./prog+0x1fdd4)
 #6 0x0000aaaabb9ffd38 void testing::internal::UniversalPrint<llvm::xray::Record*>(llvm::xray::Record* const&, std::ostream*) (./prog+0x1fd38)
 #7 0x0000aaaabb9ffc1c void testing::internal::DefaultPrintTo<std::vector<llvm::xray::Record*, std::allocator<llvm::xray::Record*> > >(testing::internal::WrapPrinterType<(testing::internal::DefaultPrinterType)0>, std::vector<llvm::xray::Record*, std::allocator<llvm::xray::Record*> > const&, std::ostream*) (./prog+0x1fc1c)
 #8 0x0000aaaabb9ffb50 void testing::internal::PrintTo<std::vector<llvm::xray::Record*, std::allocator<llvm::xray::Record*> > >(std::vector<llvm::xray::Record*, std::allocator<llvm::xray::Record*> > const&, std::ostream*) (./prog+0x1fb50
```

At first I thought gtest was doing something wrong, but no: replacing the gtest libraries with the ones from the build without opaque pointers didn't fix things. Replacing `libLLVMXRay.a` did, which means something in XRay is corrupting whatever structure gtest is trying to print.

I narrowed this down by swapping the object files in `libLLVMXRay.a` one by one until the tests passed. This pointed to `BlockIndexer.cpp.o`. I only tried one of the test cases, so other files could be involved, but more likely all the failing tests go through this one file.

Disassembling both versions of that object file, the only significant difference is that after the opaque pointer change `BlockIndexer::flush` contains SVE code instead of Neon code (or at least code using Neon registers).

`BlockIndexer.h`:
```
<...>
struct Block {
    uint64_t ProcessID;
    int32_t ThreadID;
    WallclockRecord *WallclockTime;
    std::vector<Record *> Records;
  };
<...>
```
On AArch64 this struct is 48 bytes, 24 of which are the vector (https://godbolt.org/z/s1bxbGzvv). The 32-bit `ThreadID` field is padded out to 64 bits.
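
Those numbers can be pinned down at compile time. This is a minimal sketch assuming an LP64 target such as AArch64 Linux and a standard library whose `std::vector` is three pointers wide (true of libstdc++ and libc++). `Record` and `WallclockRecord` are only forward-declared stand-ins, since `Block` stores nothing but pointers to them.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Forward-declared stand-ins: Block only stores pointers to these types.
struct Record;
struct WallclockRecord;

// Mirror of the Block struct from BlockIndexer.h.
struct Block {
  uint64_t ProcessID;
  int32_t ThreadID;
  WallclockRecord *WallclockTime;
  std::vector<Record *> Records;
};

// Expected layout, assuming LP64 and a three-pointer std::vector.
static_assert(offsetof(Block, ProcessID) == 0, "ProcessID at byte 0");
static_assert(offsetof(Block, ThreadID) == 8, "ThreadID at byte 8");
// 4 bytes of padding follow ThreadID so the pointer is 8-byte aligned.
static_assert(offsetof(Block, WallclockTime) == 16, "WallclockTime at byte 16");
static_assert(offsetof(Block, Records) == 24, "Records at byte 24");
static_assert(sizeof(std::vector<Record *>) == 24, "vector is 3 pointers");
static_assert(sizeof(Block) == 48, "Block is 48 bytes");
```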

`BlockIndexer.cpp`:
```
Error BlockIndexer::flush() {
  Index::iterator It;
  std::tie(It, std::ignore) =
      Indices.insert({{CurrentBlock.ProcessID, CurrentBlock.ThreadID}, {}});
  It->second.push_back({CurrentBlock.ProcessID, CurrentBlock.ThreadID,
                        CurrentBlock.WallclockTime,
                        std::move(CurrentBlock.Records)});
  CurrentBlock.ProcessID = 0;
  CurrentBlock.ThreadID = 0;
  CurrentBlock.Records = {};
  CurrentBlock.WallclockTime = nullptr;
  return Error::success();
}
```
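
For reference, this is what `flush` should leave behind: the new index entry owns the moved records, and only afterwards is `CurrentBlock` reset. The sketch below is a host-side model of the source-level semantics only. It swaps in `std::map` for the `Index` container, returns `void` instead of `Error`, and `flushModel`/`demoFlush` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the XRay record types.
struct Record {};
struct WallclockRecord {};

// Mirror of BlockIndexer::Block.
struct Block {
  uint64_t ProcessID;
  int32_t ThreadID;
  WallclockRecord *WallclockTime;
  std::vector<Record *> Records;
};

// std::map stands in for BlockIndexer's Index container (assumption).
using Index = std::map<std::pair<uint64_t, int32_t>, std::vector<Block>>;

// Same steps as BlockIndexer::flush(), returning void instead of Error.
void flushModel(Index &Indices, Block &CurrentBlock) {
  Index::iterator It;
  std::tie(It, std::ignore) =
      Indices.insert({{CurrentBlock.ProcessID, CurrentBlock.ThreadID}, {}});
  // The moved-from Records must arrive intact in the new entry...
  It->second.push_back({CurrentBlock.ProcessID, CurrentBlock.ThreadID,
                        CurrentBlock.WallclockTime,
                        std::move(CurrentBlock.Records)});
  // ...and only then is CurrentBlock reset.
  CurrentBlock.ProcessID = 0;
  CurrentBlock.ThreadID = 0;
  CurrentBlock.Records = {};
  CurrentBlock.WallclockTime = nullptr;
}

// Usage: after a flush the stored block holds both record pointers.
void demoFlush() {
  Record R1, R2;
  WallclockRecord W;
  Block Current{1, 2, &W, {&R1, &R2}};
  Index Indices;
  flushModel(Indices, Current);
  const std::vector<Block> &Stored =
      Indices.at(std::make_pair(uint64_t{1}, int32_t{2}));
  assert(Stored.size() == 1 && Stored[0].Records.size() == 2);
  assert(Stored[0].Records[0] == &R1 && Stored[0].WallclockTime == &W);
  assert(Current.Records.empty() && Current.WallclockTime == nullptr);
}
```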

This is the previous assembly:
```
 138:   aa1403f6        mov     x22, x20
 13c:   3cc28ec0        ldr     q0, [x22, #40]!
 140:   3c8183e0        stur    q0, [sp, #24]
 144:   f85e82c8        ldur    x8, [x22, #-24]
 148:   a9007edf        stp     xzr, xzr, [x22]
 14c:   f90003e8        str     x8, [sp]
 150:   b85f02c8        ldur    w8, [x22, #-16]
 154:   b9000be8        str     w8, [sp, #8]
 158:   f85f82c8        ldur    x8, [x22, #-8]
 15c:   f9000be8        str     x8, [sp, #16]
 160:   f9400ac8        ldr     x8, [x22, #16]
 164:   f9000adf        str     xzr, [x22, #16]
 168:   f90017e8        str     x8, [sp, #40]
```
Pretty straightforward: copy from x22 (which appears to point a little way into the `Block` struct) to the stack for the `std::move`, then zero the locations that were loaded from.
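
To make the ordering explicit, here is a C++ model that replays the Neon sequence on an array of doublewords. The layout is an assumption read off the offsets: if `CurrentBlock` starts at x20 + 16, then x22 (x20 + 40) points at the `Records` begin pointer, and the stack slots at `sp` hold the temporary `Block`. The function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Doubleword-granular memory model (hypothetical):
//   Mem[k] = the 8 bytes at x20 + 8*k, Stack[k] = the 8 bytes at sp + 8*k.
// Assumed layout with CurrentBlock at x20 + 16:
//   Mem[2] ProcessID, Mem[3] ThreadID (+4 bytes padding),
//   Mem[4] WallclockTime, Mem[5..7] Records begin/end/capacity.
using Words = std::array<uint64_t, 8>;

// Replays the pre-opaque-pointer Neon sequence at 0x138..0x168.
void neonFlushCopy(Words &Mem, Words &Stack) {
  std::size_t X22 = 5;                  // x22 = x20 + 40 after the pre-index
  uint64_t Q0Lo = Mem[X22];             // ldr q0, [x22, #40]!  (begin, end)
  uint64_t Q0Hi = Mem[X22 + 1];
  Stack[3] = Q0Lo;                      // stur q0, [sp, #24]
  Stack[4] = Q0Hi;
  uint64_t X8 = Mem[X22 - 3];           // ldur x8, [x22, #-24]  (ProcessID)
  Mem[X22] = 0;                         // stp xzr, xzr, [x22]   (after the load)
  Mem[X22 + 1] = 0;
  Stack[0] = X8;                        // str x8, [sp]
  // ldur w8, [x22, #-16]: little-endian, so ThreadID is the low half.
  uint32_t W8 = static_cast<uint32_t>(Mem[X22 - 2]);
  // str w8, [sp, #8] writes only the low 32 bits of the slot.
  Stack[1] = (Stack[1] & ~uint64_t{0xFFFFFFFF}) | W8;
  X8 = Mem[X22 - 1];                    // ldur x8, [x22, #-8]   (WallclockTime)
  Stack[2] = X8;                        // str x8, [sp, #16]
  X8 = Mem[X22 + 2];                    // ldr x8, [x22, #16]    (capacity)
  Mem[X22 + 2] = 0;                     // str xzr, [x22, #16]
  Stack[5] = X8;                        // str x8, [sp, #40]
}

// Usage with made-up values: ProcessID 7, ThreadID 3, WallclockTime 0xAA,
// Records begin/end/capacity 0x1000/0x1010/0x1020.
void demoNeon() {
  Words Mem{0, 0, 7, 3, 0xAA, 0x1000, 0x1010, 0x1020};
  Words Stack{};
  neonFlushCopy(Mem, Stack);
  assert(Stack[3] == 0x1000 && Stack[4] == 0x1010 && Stack[5] == 0x1020);
  assert(Mem[5] == 0 && Mem[6] == 0 && Mem[7] == 0);
}
```

Every load from the `Records` pointers is issued before the stores that zero them, so in this model the temporary on the stack receives intact begin, end and capacity values.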

Here's the post opaque pointers SVE code:
```
 138:   25d8e080        ptrue   p0.d, vl4
 13c:   aa1403f6        mov     x22, x20
 140:   92800009        mov     x9, #0xffffffffffffffff         // #-1
 144:   f8028edf        str     xzr, [x22, #40]!
 148:   f85e82c8        ldur    x8, [x22, #-24]
 14c:   a5e942c0        ld1d    {z0.d}, p0/z, [x22, x9, lsl #3]
 150:   910003e9        mov     x9, sp
 154:   a900fedf        stp     xzr, xzr, [x22, #8]
 158:   f90003e8        str     x8, [sp]
 15c:   b85f02c8        ldur    w8, [x22, #-16]
 160:   b9000be8        str     w8, [sp, #8]
 164:   d2800048        mov     x8, #0x2                        // #2
 168:   e5e84120        st1d    {z0.d}, p0, [x9, x8, lsl #3]
```

I haven't walked through it fully (and to be honest, I'd probably misunderstand the SVE parts), but I believe that at least this part is not correct:
```
 144:   f8028edf        str     xzr, [x22, #40]!
 148:   f85e82c8        ldur    x8, [x22, #-24]
 14c:   a5e942c0        ld1d    {z0.d}, p0/z, [x22, x9, lsl #3]
```
We store zero to part of the `Block`, then load from that same location and later store the result to the stack. Later, gtest comes along, tries to dereference a null pointer, and we get a segfault.
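
Replaying the SVE sequence with the same kind of model illustrates the problem suspected above. The layout is the same assumption (`CurrentBlock` at x20 + 16, x22 at the `Records` begin pointer), and it also assumes the `ld1d` with x9 = -1 reads the four doublewords starting at x22 - 8 (`WallclockTime`, begin, end, capacity). If that reading is right, the `str xzr` has already zeroed begin, so the stack copy is a vector whose begin is null while end and capacity are not, which fits gtest dereferencing a null pointer when printing it. Names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model: Mem[k] = x20 + 8*k, Stack[k] = sp + 8*k.
// Assumed layout with CurrentBlock at x20 + 16: Mem[2] ProcessID,
// Mem[3] ThreadID, Mem[4] WallclockTime, Mem[5..7] begin/end/capacity.
using Words = std::array<uint64_t, 8>;

// Replays the post-opaque-pointer SVE sequence at 0x138..0x168.
// ptrue p0.d, vl4 enables exactly four 64-bit lanes for ld1d/st1d.
void sveFlushCopy(Words &Mem, Words &Stack) {
  std::size_t X22 = 5;                  // x22 = x20 + 40 after the pre-index
  Mem[X22] = 0;                         // str xzr, [x22, #40]!  begin zeroed FIRST
  uint64_t X8 = Mem[X22 - 3];           // ldur x8, [x22, #-24]  (ProcessID)
  // ld1d {z0.d}, p0/z, [x22, x9, lsl #3] with x9 = -1: lanes are Mem[4..7],
  // and Mem[5] (begin) has already been overwritten with 0.
  std::array<uint64_t, 4> Z0{};
  for (std::size_t Lane = 0; Lane < 4; ++Lane)
    Z0[Lane] = Mem[X22 - 1 + Lane];
  Mem[X22 + 1] = 0;                     // stp xzr, xzr, [x22, #8]
  Mem[X22 + 2] = 0;
  Stack[0] = X8;                        // str x8, [sp]
  uint32_t W8 = static_cast<uint32_t>(Mem[X22 - 2]);  // ldur w8, [x22, #-16]
  Stack[1] = (Stack[1] & ~uint64_t{0xFFFFFFFF}) | W8; // str w8, [sp, #8]
  // st1d {z0.d}, p0, [x9, x8, lsl #3] with x9 = sp and x8 = 2.
  for (std::size_t Lane = 0; Lane < 4; ++Lane)
    Stack[2 + Lane] = Z0[Lane];
}

// Usage with made-up values: the copied vector gets begin == 0 while
// end and capacity survive, i.e. a "non-empty" vector starting at nullptr.
void demoSve() {
  Words Mem{0, 0, 7, 3, 0xAA, 0x1000, 0x1010, 0x1020};
  Words Stack{};
  sveFlushCopy(Mem, Stack);
  assert(Stack[2] == 0xAA);                          // WallclockTime is fine
  assert(Stack[3] == 0);                             // begin was lost
  assert(Stack[4] == 0x1010 && Stack[5] == 0x1020);  // end, capacity kept
}
```

In this model, moving the `str xzr` after the `ld1d` makes `Stack[3]` come out as the original begin pointer again.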

I'm also not sure about the use of -1 as the offset for the `ld1d`, since the documentation describes it as a `UInt` (https://developer.arm.com/documentation/ddi0596/2020-12/SVE-Instructions/LD1D--scalar-plus-scalar---Contiguous-load-doublewords-to-vector--scalar-index--). Again, I'm no SVE expert, so that could be totally fine.
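
On the -1 specifically: assuming the scalar-plus-scalar effective address is computed as base + (offset << 3) in 64-bit modular arithmetic, as AArch64 address arithmetic normally is, an unsigned offset of 0xffffffffffffffff behaves exactly like -8 bytes, so the load would start one doubleword below x22. A small C++ sketch of that arithmetic (the helper name is hypothetical):

```cpp
#include <cstdint>

// Assumption: base + (offset << 3), wrapping modulo 2^64 like any
// unsigned 64-bit arithmetic in C++.
constexpr uint64_t ld1dAddress(uint64_t Base, uint64_t Offset) {
  return Base + (Offset << 3);
}

// x9 = #-1 is 0xffffffffffffffff when read as an unsigned 64-bit value.
static_assert(ld1dAddress(0x1000, UINT64_C(0xffffffffffffffff)) == 0x1000 - 8,
              "offset -1 addresses the doubleword just below the base");
```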
</pre>