<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/87885>87885</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[clang-format] Lines with UTF-8 characters are wrongly aligned
</td>
</tr>
<tr>
<th>Labels</th>
<td>
clang-format
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
MaJerle
</td>
</tr>
</table>
<pre>
Consider this code:
```c
#define SENSOR_DESC_1 \
"{" \
" \"~\": \"homeassistant/mycustomsensorT\"," \
" \"unit_of_measurement\": \"°C\"," \ \
"}"
```
Because of the `°` character, which is code point U+00B0 and therefore takes two bytes in UTF-8 (`0xC2 0xB0`), clang-format seems to count only bytes instead of computing the length of the decoded UTF-8 text, so the trailing `\` on that line ends up misaligned.
The same happens with other non-ASCII characters, such as `čšž`.
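
To illustrate the suspected mismatch, here is a minimal sketch that compares the byte count of a string with its UTF-8 code point count. The helper names `byte_length` and `codepoint_length` are hypothetical and are not taken from clang-format; this only shows why the two measures differ, not how clang-format computes widths internally.

```c
/* Hypothetical helpers for illustration only; not clang-format code. */

/* Number of bytes before the terminating NUL (what a byte-based width sees). */
static unsigned long byte_length(const char *s) {
    unsigned long n = 0;
    while (s[n] != '\0') {
        ++n;
    }
    return n;
}

/* Number of UTF-8 code points: count every byte that is not a
 * continuation byte (continuation bytes have the bit pattern 10xxxxxx). */
static unsigned long codepoint_length(const char *s) {
    unsigned long n = 0;
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}
```

For `"\xC2\xB0" "C"` (the UTF-8 bytes of `°C`), `byte_length` returns 3 while `codepoint_length` returns 2, which is a one-column difference like the one suspected in the macro above. Note that code points are still only an approximation of display width (wide or combining characters need more care), so this is a sketch of the problem rather than a proposed fix.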
</pre>