[PATCH] D106792: [clang-tidy] Always open files using UTF-8 encoding

Andy Yankovsky via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Mon Jul 26 07:37:53 PDT 2021


werat created this revision.
Herald added a subscriber: xazax.hun.
werat requested review of this revision.
Herald added a project: clang-tools-extra.
Herald added a subscriber: cfe-commits.

The default encoding used when opening files depends on the OS and locale and
might not be UTF-8 (e.g. on Windows it can be CP-1252). The documentation files
use UTF-8 and cannot always be decoded with other encodings. For example,
`clang-tools-extra/docs/clang-tidy/checks/abseil-no-internal-dependencies.rst`
currently contains non-ASCII quotes, so running `add_new_check.py` fails on
Windows because the script tries to read the file with an incompatible encoding.

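Use `io.open` for compatibility with both Python 2 and Python 3: the builtin
`open` in Python 2 does not accept an `encoding` argument, while `io.open` does
in both versions.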
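
As a rough illustration of the failure mode and the fix, the sketch below writes
a string containing non-ASCII quotes through a UTF-8 wrapper and reads it back,
then checks whether the same UTF-8 bytes could be decoded as CP-1252. The names
`open_utf8`, `roundtrip`, `cp1252_can_decode` and the sample text are
hypothetical and chosen for this example only; the patch itself shadows the
builtin `open` inside each script rather than introducing a new name.

```python
import io
import os
import tempfile


def open_utf8(*args, **kwargs):
    # Hypothetical name for illustration; the patch defines `open` instead.
    # Default to UTF-8 unless the caller passes an explicit encoding.
    kwargs.setdefault("encoding", "utf8")
    return io.open(*args, **kwargs)


def roundtrip(text):
    # Write `text` to a temporary file and read it back, both as UTF-8.
    fd, path = tempfile.mkstemp(suffix=".rst")
    os.close(fd)
    try:
        with open_utf8(path, "w") as f:
            f.write(text)
        with open_utf8(path) as f:
            return f.read()
    finally:
        os.remove(path)


def cp1252_can_decode(text):
    # Simulate reading UTF-8 bytes with a CP-1252 default encoding.
    try:
        text.encode("utf-8").decode("cp1252")
        return True
    except UnicodeDecodeError:
        return False


# Assumed sample text with curly quotes, similar in spirit to the .rst file.
sample = u"uses \u201cinternal\u201d details"

if __name__ == "__main__":
    print(roundtrip(sample) == sample)
    print(cp1252_can_decode(sample))
```

The closing curly quote encodes to a byte sequence that includes 0x9D, which
Python's CP-1252 codec leaves undefined, so the decode raises
`UnicodeDecodeError`. One caveat of a wrapper like this: `io.open` rejects an
`encoding` argument in binary mode, so it only suits call sites that open files
in text mode.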


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D106792

Files:
  clang-tools-extra/clang-tidy/add_new_check.py
  clang-tools-extra/clang-tidy/rename_check.py


Index: clang-tools-extra/clang-tidy/rename_check.py
===================================================================
--- clang-tools-extra/clang-tidy/rename_check.py
+++ clang-tools-extra/clang-tidy/rename_check.py
@@ -10,9 +10,15 @@
 
 import argparse
 import glob
+import io
 import os
 import re
 
+# The documentation files are using UTF-8, however on Windows the default
+# encoding might be different (e.g. CP-1252). Force UTF-8 for all files.
+def open(*args, **kwargs):
+  kwargs.setdefault("encoding", "utf8")
+  return io.open(*args, **kwargs)
 
 def replaceInFileRegex(fileName, sFrom, sTo):
   if sFrom == sTo:
Index: clang-tools-extra/clang-tidy/add_new_check.py
===================================================================
--- clang-tools-extra/clang-tidy/add_new_check.py
+++ clang-tools-extra/clang-tidy/add_new_check.py
@@ -11,10 +11,16 @@
 from __future__ import print_function
 
 import argparse
+import io
 import os
 import re
 import sys
 
+# The documentation files are using UTF-8, however on Windows the default
+# encoding might be different (e.g. CP-1252). Force UTF-8 for all files.
+def open(*args, **kwargs):
+  kwargs.setdefault("encoding", "utf8")
+  return io.open(*args, **kwargs)
 
 # Adapts the module's CMakelist file. Returns 'True' if it could add a new
 # entry and 'False' if the entry already existed.



