[PATCH] D67643: [lit] Extend internal diff to support `-` argument
Joel E. Denny via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Sep 20 06:56:23 PDT 2019
jdenny marked an inline comment as done.
jdenny added inline comments.
================
Comment at: llvm/utils/lit/lit/builtin_commands/diff.py:27-29
+ # FIXME: How can we restart stdin if the encoding is wrong? How can we
+ # read stdin with a different encoding or in binary mode in a way that's
+ # compatible with python 2 and 3?
----------------
rnk wrote:
> This function seems fragile to me. It reads each file three times:
> 1. without an encoding (system default?), may fail
> 2. with a utf-8 encoding, may fail
> 3. inside compareTwo*Files, reads them yet again
>
> I think the best way to be portable between Python 2 & 3 will probably be to work in bytes as much as possible.
>
> I think you can use this pattern to get a new file descriptor for stdin that reads bytes:
> ```
> fileno = sys.stdin.fileno()
> if fileno is not None:
> new_stdin = os.fdopen(os.dup(fileno), 'rb') # read in binary
> ```
>
> Here's how I would do it:
> 1. read both inputs completely in binary (can't fail)
> 2. try decoding the entire file in a few encodings (try default, try utf-8, if that fails, stick with bytes and diff_bytes)
> 3. split file on u'\n' or b'\n' as appropriate, perhaps stripping trailing '\r' if present, depending on flags
>
> Does that seem reasonable?
I don't know the motivation by the original implementation. Your plan sounds better. Sorry, but I don't have time to implement and test this right now. Hopefully soon. Thanks for the review.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D67643/new/
https://reviews.llvm.org/D67643
More information about the llvm-commits
mailing list