<html>

    <head>

      <base href="https://bugs.llvm.org/">

    </head>

    <body><table border="1" cellspacing="0" cellpadding="8">

        <tr>

          <th>Bug ID</th>

          <td><a class="bz_bug_link 

          bz_status_NEW "

   title="NEW - non-ascii source files cannot be shown in scan-view"

   href="https://bugs.llvm.org/show_bug.cgi?id=40765">40765</a>

          </td>

        </tr>

        <tr>

          <th>Summary</th>

          <td>non-ascii source files cannot be shown in scan-view

          </td>

        </tr>

        <tr>

          <th>Product</th>

          <td>clang

          </td>

        </tr>

        <tr>

          <th>Version</th>

          <td>trunk

          </td>

        </tr>

        <tr>

          <th>Hardware</th>

          <td>PC

          </td>

        </tr>

        <tr>

          <th>OS</th>

          <td>Linux

          </td>

        </tr>

        <tr>

          <th>Status</th>

          <td>NEW

          </td>

        </tr>

        <tr>

          <th>Severity</th>

          <td>enhancement

          </td>

        </tr>

        <tr>

          <th>Priority</th>

          <td>P

          </td>

        </tr>

        <tr>

          <th>Component</th>

          <td>Static Analyzer

          </td>

        </tr>

        <tr>

          <th>Assignee</th>

          <td>dcoughlin@apple.com

          </td>

        </tr>

        <tr>

          <th>Reporter</th>

          <td>johannes@sipsolutions.net

          </td>

        </tr>

        <tr>

          <th>CC</th>

          <td>dcoughlin@apple.com, llvm-bugs@lists.llvm.org

          </td>

        </tr></table>

      <p>

        <div>

        <pre>Trying to view a report for such a file in scan-view just gives:

INTERNAL ERROR

Traceback (most recent call last):
  File "/usr/share/clang/scan-view-9/share/ScanView.py", line 232, in do_GET
    SimpleHTTPRequestHandler.do_GET(self)
  File "/usr/lib/python2.7/SimpleHTTPServer.py", line 45, in do_GET
    f = self.send_head()
  File "/usr/share/clang/scan-view-9/share/ScanView.py", line 712, in send_head
    return self.send_path(path)
  File "/usr/share/clang/scan-view-9/share/ScanView.py", line 727, in send_path
    return self.send_patched_file(path, ctype)
  File "/usr/share/clang/scan-view-9/share/ScanView.py", line 774, in send_patched_file
    return self.send_string(data, ctype, mtime=fs.st_mtime)
  File "/usr/share/clang/scan-view-9/share/ScanView.py", line 747, in send_string
    encoded_s = s.encode()
UnicodeDecodeError: 'ascii' codec can't decode byte 0x92 in position 111162: ordinal not in range(128)

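In ScanView.py, line 747 reads:

 encoded_s = s.encode()

Changing it to just

 encoded_s = s

appears to work around the problem. On Python 2, s is already a byte string
at this point, so s.encode() first decodes it implicitly with the default
'ascii' codec, and that decode fails on any byte of 0x80 or above.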
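
A minimal Python 3 illustration of the failing step (not ScanView code; the
sample bytes are made up). Because Python 2's str.encode() runs an ASCII
decode first, doing that decode explicitly reproduces the same error. Byte
0x92 is, for example, the right single quotation mark in Windows-1252.

```python
# Hypothetical C comment containing a Windows-1252 apostrophe (byte 0x92).
raw = b"/* it\x92s */"

try:
    # Python 2's str.encode() performs this ASCII decode implicitly.
    raw.decode("ascii")
    message = "no error"
except UnicodeDecodeError as exc:
    message = str(exc)

print(message)
# 'ascii' codec can't decode byte 0x92 in position 5: ordinal not in range(128)
```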

It's not clear what _should_ be done about this, though. C source files can be
in any encoding, particularly in comments, and we can't reliably know which
one. Most of our files are UTF-8, but some older ones are ISO-8859-1 or
similar, depending on whatever encoding the author wrote them in. Ideally the
bytes would be passed through more or less unchanged: in the worst case some
characters show up as garbage in the browser, which is still better than
crashing.</pre>
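
        <pre>A minimal sketch of the pass-through idea, in Python to match ScanView.py.
Both helpers (to_response_bytes and decode_source) are hypothetical names, not
existing ScanView functions. to_response_bytes sends byte strings unchanged
and only encodes real text; decode_source covers the case where text is
needed, trying UTF-8 first and falling back to ISO-8859-1, which maps every
byte to a code point and therefore never fails.

```python
def to_response_bytes(s):
    """Return s as bytes for an HTTP response body (hypothetical helper).

    Byte strings, such as a report embedding a source file of unknown
    encoding, are passed through untouched, so no encoding is guessed.
    Text strings are encoded as UTF-8.
    """
    if isinstance(s, bytes):
        return s
    return s.encode("utf-8")


def decode_source(raw):
    """Decode source-file bytes for display without ever raising.

    Most files are UTF-8, so try that first. ISO-8859-1 maps each of the
    256 byte values to a code point, so the fallback cannot fail; bytes
    from other encodings may just render as the wrong characters.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("iso-8859-1")


# Usage: neither call raises, whatever the input encoding.
body = to_response_bytes(b"/* it\x92s */")
text = decode_source(b"/* it\x92s */")
```

ISO-8859-1 is used as the fallback rather than Windows-1252 because Python's
cp1252 codec leaves a few bytes (e.g. 0x81) undefined, so decoding with it can
still raise. The trade-off is that a Windows-1252 apostrophe (0x92) then shows
up as a control character instead of a quote, which is the "some garbage, no
crash" outcome described above.</pre>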

        </div>

      </p>


    </body>

</html>