[PATCH] D56754: Add Support for Creating and Deleting Unicode Files and Directories in Lit

Eli Friedman via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Feb 22 11:59:42 PST 2019


efriedma added inline comments.


================
Comment at: utils/lit/lit/util.py:440
+            # converted to UTF-16
+            return text.decode('utf-8') if isinstance(text, bytes) else text
+
----------------
jmittert wrote:
> efriedma wrote:
> > The inner "if" is redundant; on Python2, `bytes` is an alias for `str`, so both sides have the same meaning.
> > 
> > This function seems dangerous in the sense that it isn't clear what type of data it's expecting as input. Probably you should have separate functions for each of the possible cases: input is `bytes`, input is `str`, or input is Python2 `unicode`/Python3 `str`.  (Not sure which of those you actually need, but those are the reasonable possibilities, I think.)
> I'm fairly certain the if isn't redundant: on python 2, if we get `bytes`(/`str`) we want to decode that to a `unicode`. In python 2, `.decode` returns a `unicode`, not a `str` like in python 3.
> 
> Part of the reason it has confusing inputs is because what input it gets depends on they python version. It gets bytes/str on python 2 and str on python 3. I figured it was safer to handle all text cases (python 2 `bytes`/`str`/`unicode`, python 3 `str`/`bytes`). At least that way it's idempotent. I do agree it's dangerous in that the type it outputs is different based on the platform and version, but I've tried to limit use of it to only right at the edge before os calls.
> on python 2, if we get bytes(/str) we want to decode that to a unicode

I didn't mean it's a no-op, just that `isinstance(text, str)` and `isinstance(text, bytes)` always return the same result on Python2, so you don't need the explicit version_info check.

> I figured it was safer to handle all text cases

Making the function handle any string-like input is "safer" in the sense that you're less likely to get a runtime error in this function, but it makes the code harder to understand, and it's more likely to lead to confusing results if some other part of the code isn't handling strings consistently.


Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D56754/new/

https://reviews.llvm.org/D56754





More information about the llvm-commits mailing list