[PATCH] D56754: Add Support for Creating and Deleting Unicode Files and Directories in Lit

Jason Mittertreiner via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Feb 22 14:58:36 PST 2019


jmittert marked an inline comment as done.
jmittert added inline comments.


================
Comment at: utils/lit/lit/util.py:440
+            # converted to UTF-16
+            return text.decode('utf-8') if isinstance(text, bytes) else text
+
----------------
efriedma wrote:
> jmittert wrote:
> > efriedma wrote:
> > > The inner "if" is redundant; on Python2, `bytes` is an alias for `str`, so both sides have the same meaning.
> > > 
> > > This function seems dangerous in the sense that it isn't clear what type of data it's expecting as input. Probably you should have separate functions for each of the possible cases: input is `bytes`, input is `str`, or input is Python2 `unicode`/Python3 `str`.  (Not sure which of those you actually need, but those are the reasonable possibilities, I think.)
> > I'm fairly certain the if isn't redundant: on python 2, if we get `bytes`(/`str`) we want to decode that to a `unicode`. In python 2, `.decode` returns a `unicode`, not a `str` like in python 3.
> > 
> > Part of the reason it has confusing inputs is because what input it gets depends on they python version. It gets bytes/str on python 2 and str on python 3. I figured it was safer to handle all text cases (python 2 `bytes`/`str`/`unicode`, python 3 `str`/`bytes`). At least that way it's idempotent. I do agree it's dangerous in that the type it outputs is different based on the platform and version, but I've tried to limit use of it to only right at the edge before os calls.
> > on python 2, if we get bytes(/str) we want to decode that to a unicode
> 
> I didn't mean it's a no-op, just that `isinstance(text, str)` and `isinstance(text, bytes)` always return the same result on Python2, so you don't need the explicit version_info check.
> 
> > I figured it was safer to handle all text cases
> 
> Making the function handle any string-like input is "safer" in the sense that you're less likely to get a runtime error in this function, but it makes the code harder to understand, and it's more likely to lead to confusing results if some other part of the code isn't handling strings consistently.
Ah, I see what you mean, yeah, that makes sense to do.

I've removed the convertToLocalEncoding and instead added to_unicode, which I think is significantly less confusing.


Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D56754/new/

https://reviews.llvm.org/D56754





More information about the llvm-commits mailing list