<html>
    <head>
      <base href="https://bugs.llvm.org/">
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - non-ascii source files cannot be shown in scan-view"
   href="https://bugs.llvm.org/show_bug.cgi?id=40765">40765</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>non-ascii source files cannot be shown in scan-view
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>clang
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>trunk
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>PC
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>Linux
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>enhancement
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>Static Analyzer
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>dcoughlin@apple.com
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>johannes@sipsolutions.net
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>dcoughlin@apple.com, llvm-bugs@lists.llvm.org
          </td>
        </tr></table>
      <p>
        <div>
        <pre>When scan-view serves a source file that contains non-ASCII bytes, we just get:

INTERNAL ERROR

Traceback (most recent call last):
  File "/usr/share/clang/scan-view-9/share/ScanView.py", line 232, in do_GET
    SimpleHTTPRequestHandler.do_GET(self)
  File "/usr/lib/python2.7/SimpleHTTPServer.py", line 45, in do_GET
    f = self.send_head()
  File "/usr/share/clang/scan-view-9/share/ScanView.py", line 712, in send_head
    return self.send_path(path)
  File "/usr/share/clang/scan-view-9/share/ScanView.py", line 727, in send_path
    return self.send_patched_file(path, ctype)
  File "/usr/share/clang/scan-view-9/share/ScanView.py", line 774, in send_patched_file
    return self.send_string(data, ctype, mtime=fs.st_mtime)
  File "/usr/share/clang/scan-view-9/share/ScanView.py", line 747, in send_string
    encoded_s = s.encode()
UnicodeDecodeError: 'ascii' codec can't decode byte 0x92 in position 111162: ordinal not in range(128)


In ScanView.py line 747 we have:

 encoded_s = s.encode()

Changing that to just

 encoded_s = s

appears to work around the problem.
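
The traceback shows a UnicodeDecodeError coming out of a call to encode() because on Python 2, calling .encode() on a byte string (str) first decodes it implicitly with the default ASCII codec. So s is already bytes at that point, which is why passing it through unchanged works. A minimal sketch of a version-agnostic variant of that workaround, assuming send_string may be handed either bytes or text (to_wire_bytes is a hypothetical helper name, not something in ScanView.py):

```python
def to_wire_bytes(s, encoding="utf-8"):
    """Hypothetical helper (not part of ScanView.py): return bytes to send."""
    # Byte strings (Python 2 str, Python 3 bytes) are sent unchanged, so a
    # source file in any encoding reaches the browser byte-for-byte.
    if isinstance(s, bytes):
        return s
    # Only genuine text objects are encoded, and with an explicit codec
    # instead of the interpreter default (ASCII on Python 2).
    return s.encode(encoding)
```

This behaves the same on Python 2.7 and 3, since bytes is an alias for str on 2.7: the 0x92 byte from the traceback goes out untouched, and unicode text is sent as UTF-8.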

It's not clear what _should_ be done about this, though. C source files can use
any encoding (non-ASCII text shows up especially in comments), and scan-view has
no reliable way to know which one a given file uses. Most of our files are UTF-8,
but some older ones are ISO-8859-1 or a similar encoding, depending on whatever
the author's editor wrote. Ideally the contents would just be passed through more
or less unchanged; the worst case is then that some characters show up as garbage
in the browser, which is still better than crashing.</pre>
        </div>
      </p>


    </body>
</html>