<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/66106>66106</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            Assert when passing UTF-16 (LE) file to -dump-raw-tokens
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          mattmanj17
      </td>
    </tr>
</table>

<pre>
    ### I am aware that clang only supports UTF-8 (with or without BOM). I _know_ this file type is unsupported.

If you pass a file containing nothing but a UTF-16 (LE) byte order mark (that is, the bytes "\xFF\xFE") to -dump-raw-tokens, you get the following output, which ends in an assertion failure:

```
fatal error: UTF-16 (LE) byte order mark detected in 'C:\test.c', but encoding is not supported
"<<" <invalid>
"<" <invalid>
Assertion failed: 0 && "Invalid SLocOffset or bad function choice", file C:\Users\drape\OneDrive\Desktop\llvm\clang\lib\Basic\SourceManager.cpp, line 868
```

The current behavior (asserting) seems wrong. We should probably bail out of DumpRawTokensAction::ExecuteAction if we get a fake buffer back from SM.getBufferOrFake. If we did that, the output would just be:

```
fatal error: UTF-16 (LE) byte order mark detected in 'C:\test.c', but encoding is not supported
```
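
A minimal sketch of the "bail out early" idea, written as a standalone model and not as clang's actual code. `loadBufferOrNone` and `dumpRawTokens` are hypothetical names made up for illustration; the real change would live in DumpRawTokensAction::ExecuteAction and use whatever SourceManager offers for detecting a failed buffer load.

```cpp
#include <iostream>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical stand-in for a buffer lookup that can fail.
// This is NOT clang's SourceManager; it only models "no real buffer available".
std::optional<std::string> loadBufferOrNone(const std::string &bytes) {
  // Reject input that starts with the UTF-16 (LE) byte order mark (0xFF 0xFE).
  if (bytes.size() >= 2 && bytes[0] == '\xFF' && bytes[1] == '\xFE')
    return std::nullopt;
  return bytes;
}

// Hypothetical model of the action: dumps whitespace-separated "tokens",
// but returns early (printing nothing) when no real buffer was loaded.
bool dumpRawTokens(const std::string &bytes, std::ostream &out) {
  std::optional<std::string> buffer = loadBufferOrNone(bytes);
  if (!buffer)
    return false; // bail out: the diagnostic is assumed to be reported already

  std::istringstream in(*buffer);
  std::string token;
  while (in >> token)
    out << token << '\n';
  return true;
}
```

With this shape, a BOM-only input produces no token output at all, which matches the single-line output proposed above.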

For those who do not remember it off the top of their head, an invocation that reproduces this bug looks something like:

```
clang -cc1 -dump-raw-tokens C:\test.c
```
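
If you need to create the input file first, here is a small sketch of a helper that writes a file containing only the two BOM bytes. The function name `writeUtf16LeBomFile` is made up for this example, and any path works in place of C:\test.c.

```cpp
#include <fstream>
#include <string>

// Writes a file containing only the UTF-16 (LE) byte order mark (0xFF 0xFE).
// Returns true if the file was opened and both bytes were written.
bool writeUtf16LeBomFile(const std::string &path) {
  // Binary mode so the two bytes are written exactly as given.
  std::ofstream out(path, std::ios::binary | std::ios::trunc);
  if (!out)
    return false;
  out.write("\xFF\xFE", 2);
  out.close();
  return !out.fail();
}
```

Calling `writeUtf16LeBomFile("test.c")` and then passing that file to the command above should give the same kind of input as described in this report.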
</pre>