[PATCH] D30773: Make git-clang-format python 3 compatible

Eric Fiselier via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Mar 10 21:27:45 PST 2017


EricWF marked 4 inline comments as done.
EricWF added inline comments.


================
Comment at: tools/clang-format/git-clang-format:293
 
+def to_bytes(str):
+    # Encode to UTF-8 to get binary data.
----------------
These functions are all lifted directly from `lit/util.py`


================
Comment at: tools/clang-format/git-clang-format:302
+        return bytes
+    return to_bytes(bytes)
+
----------------
mgorny wrote:
> Shouldn't this be:
> 
>     return bytes.decode('utf-8')
> 
> ?
> 
> Otherwise, unless I'm missing something this function will always return the parameter [with no modifications], either in the first conditional if it's of `str` type, or in the conditional inside `to_bytes()` if it's of `bytes` type.
Nope. So in python3 this line will never be hit. but it will be in python2.

Specifically `convert_string` calls `to_string(bytes.decode('utf-8'))`. Which in python2 calls `to_string` with the type `unicode`. We need to de-code the unicode into a string using `bytes.encode('utf-8')`. 

I agree it's a bit misleading that it's calling a function called `to_bytes` though.


================
Comment at: tools/clang-format/git-clang-format:306
+    try:
+        return to_string(bytes.decode('utf-8'))
+    except AttributeError: # 'str' object has no attribute 'decode'.
----------------
mgorny wrote:
> This logic looks really weird to me. What is the purpose of having both `to_string()` and `convert_string()`? Why do `to_bytes()` and `to_string()` use `isinstance()` to recognize types, and here you rely on exceptions? Why is `to_string()` called after decoding?
`to_string` is called after decoding because in python2 the result of decoding is a `unicode` type, and we need to encode it a `str` type. Hense to_string.


================
Comment at: tools/clang-format/git-clang-format:323
   for line in patch_file:
-    match = re.search(r'^\+\+\+\ [^/]+/(.*)', line)
+    match = re.search(to_bytes(r'^\+\+\+\ [^/]+/(.*)'), line)
     if match:
----------------
mgorny wrote:
> Any reason not to use:
> 
>     br'^\+...'
> 
> ? i.e. make it bytes immediately instead of converting.
I tried `rb'blah'` and that didn't work in python2. However this has since been removed.


https://reviews.llvm.org/D30773





More information about the llvm-commits mailing list