<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/99003>99003</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
Clang-format seems not to handle multibyte characters correctly.
</td>
</tr>
<tr>
<th>Labels</th>
<td>
clang-format
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
Dlinuigh
</td>
</tr>
</table>
<pre>
## Description:
I have encountered an issue with clang-format when processing lines that contain multibyte characters, such as Chinese characters. Clang-format appears to count each multibyte character as multiple characters, which leads to unnecessary line breaks even when the actual character count is within the configured ColumnLimit (80).
For example, consider the following line:
``` cpp
float ratio = 1.0f; // 计算完成area.w与h才能开始计算ratio,对于一些组件可能会延后。
```
When each Chinese character is counted as a single character, this line contains fewer than 60 characters. However, clang-format breaks the line after the =, even with the following .clang-format configuration:
``` yaml
PenaltyExcessCharacter: 0
```
This configuration should prevent the line break in this case, leading me to believe that clang-format treats multibyte characters as multiple characters.
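
One possible explanation, which is an assumption and not something confirmed in this report, is that the limit is measured in display columns rather than in bytes or code points, with each CJK character taking two columns. The sketch below is not clang-format's implementation: `LineWidths`, `is_wide`, and `measure` are hypothetical names, the wide-character ranges are only a rough approximation of the Unicode East Asian Width data, and the input is assumed to be valid, null-terminated UTF-8.

```cpp
// Three ways to measure the "length" of a UTF-8 line.
struct LineWidths {
  unsigned bytes = 0;        // UTF-8 code units
  unsigned code_points = 0;  // Unicode code points
  unsigned columns = 0;      // approximate display cells
};

// Rough approximation of East Asian Wide/Fullwidth characters.
// This is NOT the full Unicode table, only a few common ranges.
constexpr bool is_wide(char32_t cp) {
  return (cp >= 0x3000 && cp <= 0x30FF) ||  // CJK punctuation, kana
         (cp >= 0x4E00 && cp <= 0x9FFF) ||  // CJK Unified Ideographs
         (cp >= 0xAC00 && cp <= 0xD7A3) ||  // Hangul syllables
         (cp >= 0xFF01 && cp <= 0xFF60);    // fullwidth forms
}

// Decodes the string one code point at a time and accumulates all three
// measures. Assumes well-formed UTF-8; no validation is attempted.
LineWidths measure(const char* s) {
  LineWidths w;
  while (*s != '\0') {
    const unsigned char lead = *s;
    unsigned len = 1;
    char32_t cp = lead;
    if (lead >= 0xF0) {
      len = 4;
      cp = lead & 0x07;
    } else if (lead >= 0xE0) {
      len = 3;
      cp = lead & 0x0F;
    } else if (lead >= 0xC0) {
      len = 2;
      cp = lead & 0x1F;
    }
    // Fold in continuation bytes, stopping early if the string ends.
    unsigned used = 1;
    for (; used < len && s[used] != '\0'; ++used) {
      const unsigned char cont = s[used];
      cp = (cp << 6) | (cont & 0x3F);
    }
    s += used;
    w.bytes += used;
    w.code_points += 1;
    w.columns += is_wide(cp) ? 2 : 1;
  }
  return w;
}
```

Calling `measure` on the example line above gives 59 code points but more than 80 columns under this approximation, which would be consistent with a break at a limit of 80 even though the character count is below 60.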
## Steps to Reproduce:
1. Use the provided .clang-format configuration.
2. Format a file containing the example line above.
## Expected Behavior:
clang-format should handle multibyte characters correctly and avoid unnecessary line breaks when the character count is within the ColumnLimit.
## Environment:
clang-format version: [18.1.8]
Operating System: [Arch Linux]
Please let me know if you need any additional information.
</pre>