[PATCH] D43165: [lit] Fix problem in how Python versions open files with different encodings

Stella Stamenova via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Mar 9 09:32:42 PST 2018


stella.stamenova added a comment.

I don't think we should compare all files as binary. The code has input options to specify how whitespace should be treated and if we compared all files as binary, we would explicitly be making a change which breaks that functionality.

I like the idea of deciding in `compareDirectoryTrees` whether to compare the files as binary or text - we would have to add a second comparison function, but the code would be much easier to read. Since `mimetypes.guess_type(file)` decides the type of file based on the extension, I don't think we can rely on it as a lot of the files we are comparing have interesting extensions.

We could do something like http://code.activestate.com/recipes/173220/ as Zachary suggested or we could try to open the file as text (open(file, 'r'), then read lines) and if that fails, treat the file as binary. The first has the benefit of not having to open each file an extra time but I am not sure how reliable the logic is, the second will always tell us whether in our version of python we can treat the file as text, but it does require us to open the file and read it an extra time.


Repository:
  rL LLVM

https://reviews.llvm.org/D43165





More information about the llvm-commits mailing list