<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href="https://github.com/llvm/llvm-project/issues/53856">53856</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
XRay fdr-reinit.cpp test fails occasionally on Arm/AArch64 bots
</td>
</tr>
<tr>
<th>Labels</th>
<td>
xray
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
DavidSpickett
</td>
</tr>
</table>
<pre>
We (Linaro) have seen this test fail occasionally on our bots that are running on a shared machine.
It is currently disabled on both Arm and AArch64 (https://github.com/llvm/llvm-project/commit/ef4d1119cc036b7af803618837ee88a87af18a12 and https://github.com/llvm/llvm-project/commit/62c37fa2ac19decaf71494d8e00e311e5de52985) due to this.
I'll start with what I think the problem is; the detailed reasoning follows later. Loading the core file I got:
```
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x0000000000419744 in __xray::BufferQueue::releaseBuffer(__xray::BufferQueue::Buffer&) ()
[Current thread is 1 (Thread 0xffff915ff000 (LWP 528663))]
```
The problem is this code in that function:
```
decRefCount(Buf.ExtentsBackingStore, kExtentsSize, Buf.Count);
atomic_store(B->Buff.Extents, atomic_load(Buf.Extents, memory_order_acquire),
memory_order_release);
```
To my understanding decRefCount could deallocate the Extents backing store being used by Buf. This means that when you try to dereference Buf.Extents it points to unmapped memory.
Why this doesn't happen all the time, I can't explain.
Perhaps the code should read:
```
atomic_store(&(B->Buff.Extents), atomic_load(&(Buf.Extents), memory_order_acquire),
memory_order_release);
```
So that we are swapping the value of the Extents pointer, not the locations they point to. However I don't understand the buffer well enough to say what the intent here is.
Certainly the code as is looks like you are decrementing the reference count for something you want to access on the next line.
That seems incorrect, unless there is some property I'm missing here which means that the ref count will never be zero at this point.
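The ordering concern can be shown with a minimal sketch. This is not the XRay code: `ControlBlock`, `release_ref` and `publish_extents_safely` are hypothetical names, and a heap allocation freed with `delete` stands in for the mmap'd backing store. It only illustrates one possible ordering (read the value first, drop the reference afterwards); whether that matches the intent of `releaseBuffer` is an assumption.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a ref-counted backing store (not the XRay type).
struct ControlBlock {
  std::atomic<std::uint64_t> RefCount{1};
  std::atomic<std::uint64_t> Extents{0};
};

// Drops one reference and frees the block when the last reference goes away.
// Returns true if the block was freed by this call.
inline bool release_ref(ControlBlock *C) {
  if (C->RefCount.fetch_sub(1, std::memory_order_acq_rel) == 1) {
    delete C; // after this point, any access through C is a use-after-free
    return true;
  }
  return false;
}

// Copies Extents into Dest *before* releasing the reference, so the load
// can never touch storage that this call has already freed.
inline bool publish_extents_safely(ControlBlock *Src,
                                   std::atomic<std::uint64_t> &Dest) {
  std::uint64_t Value = Src->Extents.load(std::memory_order_acquire);
  Dest.store(Value, std::memory_order_release);
  return release_ref(Src);
}
```

Swapping the last two steps (release first, then load) reproduces the shape of the suspected bug: the load would read from storage that may already be gone.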
---
More detail follows.
Reproducing took a while but I got a core file from it eventually. The steps:
* Start a container on our buildbot machine
* Build stage 1 of llvm and run `ninja check-xray`
* Grab the test program and copy it somewhere memorable
* Write a script like the following: (note that 160 is the number of cores on the machine, so when lit runs it by default sees 160 workers)
```
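#!/bin/bash
set -e
# Loop forever; stop as soon as a core file shows up.
for (( ; ; ))
do
  # Launch a batch of instances in the background (roughly one per core).
  for (( c=0; c<=160; c++ ))
  do
    #XRAY_OPTIONS="verbosity=1" ./fdr_init_test.o &
    ./fdr_init_test.o &
  done
  # Wait for the whole batch to finish before checking for a crash.
  wait
  if test -f "core"; then
    echo "Some test crashed! See core file."
    exit 1
  fi
  echo "<<<<<<<<<<<<<<<<<<<<>>>>>>>>>>>>>>>>>>>>>>>>>"
done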
```
* Limit the containers to 4 cores (this mimics the worst case scenario for an individual bot)
* Run the script for as long as it takes to crash (seems like ~12 hours depending on the other bot's activity I think)
That got me a core file and I found the following going through the disassembly. I initially thought we were looking at an issue with the atomics, but afaict they are not the issue here (I'm not an expert on that subject).
There is some inlining going on and I'm following the functions to the crash point.
```
// void decRefCount(BufferQueue::ControlBlock *C, size_t Size, size_t Count) {
// <...>
// if (atomic_fetch_sub(&C->RefCount, 1, memory_order_acq_rel) == 1)
ldaxr x8, [x20]
subs x8, x8, #0x1
stlxr w9, x8, [x20]
// If the store was not successful, keep retrying until it is
// https://developer.arm.com/documentation/dui0801/h/A64-Data-Transfer-Instructions/STLXR
cbnz w9, 0x4196e0 <_ZN6__xray11BufferQueue13releaseBufferERNS0_6BufferE+352>
// If the reference count is now non-zero then don't unmap memory
// see label dont_unmap
b.ne 0x419734 <_ZN6__xray11BufferQueue13releaseBufferERNS0_6BufferE+436> // b.any
// Otherwise ref count is now zero, deallocate Buf.ExtentsBackingStore
// In our case I think we did unmap the memory so we didn't take this branch.
adrp x23, 0x43d000
ldr x23, [x23, #3984]
ldr x0, [x23]
// If page size is not cached, get it, otherwise go to page_size_cached.
cbnz x0, 0x41970c <_ZN6__xray11BufferQueue13releaseBufferERNS0_6BufferE+396>
bl 0x40cae0 <_ZN11__sanitizer11GetPageSizeEv>
str x0, [x23]
page_size_cached:
// Something to do with rounding up the size occurs...
sub x8, x0, #0x1
tst x0, x8
// Not sure what this does but we don't take the branch, the target is beyond
// the point where we faulted
b.ne 0x41976c <_ZN6__xray11BufferQueue13releaseBufferERNS0_6BufferE+492> // b.any
lsl x8, x21, #6
neg x9, x0
add x8, x8, x0
// Meaning x0 (the memory to unmap) = 0xffff91b7a000
mov x0, x20
add x8, x8, #0x46
// Meaning x1 (the size to unmap) = 4096 (the page size)
and x1, x8, x9
// Unmap that area
bl 0x40ba00 <_ZN11__sanitizer15internal_munmapEPvm>
dont_unmap:
// In the core dump we reach here after (I assume) calling unmap on the line above
ldr x8, [x19]
movi v0.2d, #0x0
mov w0, wzr
ldr x9, [x22]
// Here we try to dereference a pointer
// x8 = 0xffff91b7a080
// 0xffff91b7a080 - 0xffff91b7a000 = 0x80 so this is in unmapped memory
// Program terminated with signal SIGSEGV, Segmentation fault.
// #0 0x0000000000419744 in __xray::BufferQueue::releaseBuffer(__xray::BufferQueue::Buffer&) ()
ldr x8, [x8]
// This dmb is the result of merging the load atomic then store atomic calls.
// Load does a dmb after and store does one before.
dmb ish
str x8, [x9]
```
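The `sub`/`tst` pair and the `add`/`neg`/`and` sequence in the listing look like the usual pattern for rounding a size up to a power-of-two boundary, with the `b.ne` presumably leading to a check-failure path when the page size is not a power of two. A minimal sketch under that assumption follows; `is_power_of_two`, `round_up_to` and `in_range` are illustrative names, not the sanitizer functions.

```cpp
#include <cassert>
#include <cstdint>

// X & (X - 1) is zero only for powers of two (and for zero itself).
// This corresponds to `sub x8, x0, #0x1` followed by `tst x0, x8`.
constexpr bool is_power_of_two(std::uint64_t X) {
  return X != 0 && (X & (X - 1)) == 0;
}

// Rounds Size up to a multiple of Boundary (Boundary must be a power of two).
// ~(Boundary - 1) equals -Boundary for unsigned values, which is the
// `neg x9, x0` / `and x1, x8, x9` pair.
constexpr std::uint64_t round_up_to(std::uint64_t Size, std::uint64_t Boundary) {
  return (Size + Boundary - 1) & ~(Boundary - 1);
}

// True if Addr lies inside [Base, Base + Len).
constexpr bool in_range(std::uint64_t Addr, std::uint64_t Base,
                        std::uint64_t Len) {
  return Addr >= Base && Addr - Base < Len;
}
```

With the values from the core file, `in_range(0xffff91b7a080, 0xffff91b7a000, 4096)` holds, which is consistent with the faulting load hitting the page that was just unmapped.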
</pre>