<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/55864>55864</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
Non-UTF8 output with `-fsanitize=undefined`
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
tgeorg-ethz
</td>
</tr>
</table>
<pre>
For a university project in which we test the reliability of various sanitizers in clang, we came across a case in which the _UndefinedBehaviorSanitizer_ produces non-UTF8 output. In particular, when a specific C source file is compiled with `-fsanitize=undefined` and then run, the output produced by the sanitizer contains non-UTF8 characters.
Screenshot of the error in action:
![screenshot](https://user-images.githubusercontent.com/55834264/171940991-536f9213-3284-4248-b43c-ebbcc3cb2b33.png)
Newest version we found to be affected:
```
Ubuntu clang version 15.0.0-++20220530052901+b7d2b160c3ba-1~exp1~20220530172952.268
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
```
<details>
<summary>Older versions also affected</summary>
```
clang version 10.0.0-4ubuntu1
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
```
and
```
clang version 13.0.0 (Fedora 13.0.0-3.fc35)
Target: x86_64-redhat-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
```
and
```
clang version 14.0.0 (Fedora 14.0.0-1.fc36)
Target: x86_64-redhat-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
```
</details>
The C source code used:
```c
#include <stdint.h>
uint64_t a = 4073709551615;
static uint64_t b[2][2][9] = {&a, &a};
int16_t c;
uint8_t d;
uint64_t e;
int32_t f = 5;
int16_t g(int16_t, uint8_t *, uint32_t, int8_t, int64_t);
uint16_t i() { int32_t j = g(0, d, j, e, 0); }
int16_t g(int16_t k, uint8_t *l, uint32_t m, int8_t n, int64_t o) {
for (;; c = c + 1)
b[k][f][c] = &a;
}
void main() { i(); }
```
Compiled with the command `clang -fsanitize=undefined test.c`.
We also found that simply changing the value of the global variable `a` from
```c
uint64_t a = 4073709551615;
```
to
```c
uint64_t a = 1;
```
causes the non-UTF8 output to no longer show up:
![screenshot](https://user-images.githubusercontent.com/55834264/171941059-c3122192-e6ce-4471-a778-0d8fec5cef5a.png)
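The screenshots show the symptom as rendered by a terminal. A byte-level check could make the difference between the two runs easier to confirm. The following is a minimal sketch, not part of the original reproducer: `first_invalid_utf8` is a hypothetical helper name, and it assumes the sanitizer's output has already been captured into a buffer (for example by redirecting stderr to a file and reading it back).
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the report): returns the offset of the first
   byte that does not start a valid UTF-8 sequence, or -1 if the whole buffer
   is valid UTF-8. */
long first_invalid_utf8(const unsigned char *s, size_t n) {
  assert(s != NULL || n == 0);
  size_t i = 0;
  while (i < n) {
    unsigned char c = s[i];
    size_t len;
    uint32_t cp, min;
    if (c < 0x80) { /* plain ASCII byte */
      i++;
      continue;
    } else if ((c & 0xE0) == 0xC0) { /* 2-byte sequence */
      len = 2; cp = c & 0x1F; min = 0x80;
    } else if ((c & 0xF0) == 0xE0) { /* 3-byte sequence */
      len = 3; cp = c & 0x0F; min = 0x800;
    } else if ((c & 0xF8) == 0xF0) { /* 4-byte sequence */
      len = 4; cp = c & 0x07; min = 0x10000;
    } else {
      return (long)i; /* stray continuation byte or invalid lead byte */
    }
    if (n - i < len)
      return (long)i; /* sequence is cut off by the end of the buffer */
    for (size_t k = 1; k < len; k++) {
      if ((s[i + k] & 0xC0) != 0x80)
        return (long)i; /* expected a continuation byte (10xxxxxx) */
      cp = (cp << 6) | (uint32_t)(s[i + k] & 0x3F);
    }
    /* Reject overlong encodings, surrogates, and values past U+10FFFF. */
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
      return (long)i;
    i += len;
  }
  return -1;
}
```
If the helper returns a non-negative offset for the output of the first program and -1 for the output of the variant with `a = 1`, that would match what the two screenshots show.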
</pre>