[PATCH] D99406: RFC [lit] Detect if processes to execute are MSys based, apply custom quoting logic

Reid Kleckner via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Apr 5 15:12:34 PDT 2021


rnk added inline comments.


================
Comment at: llvm/utils/lit/lit/TestRunner.py:235
+    5 "a\b\\c\\\\d"   a\b\c\\d       a\b\\c\\\\d        a\b\\c\\\\d
+    6 "a\"b"          a"b            a\b                a"b
+
----------------
mstorsjo wrote:
> rnk wrote:
> > Are you sure the msys-noglob case here is correct? That seems wrong...
> > 
> > I have always believed that the correct, safe way to quote anything on Windows is to:
> > - surround each argument in double quotes
> > - backslash escape every `\` and `"` character
> > 
> > Even though this escapes strictly more than is necessary, you usually get the same results. This is for example the algorithm used to produce clang crash reproducer scripts, which is nice because they usually run in both cmd and bash.
> > 
> > Can we file a bug against msys about this? I feel like the app really should see `a"b` in this case.
> I guess the big question is "what is correct"; it's clearly not what you'd expect to be interpreted given the regular win32 arg quoting rules, no. But then again, msys executables also live in a world with lots of other wild requirements with how things are supposed to behave when invoked from various contexts, so it might come from handling some other msys specific usecase of expanding arguments in a more unix-shell-y way. So not sure if it's intentionally this way, or just ends up this way while trying to fulfill all other requirements.
> 
> Your generic windows quoting algorithm is off a bit, as you're only supposed to backslash escape `"` and backslashes directly preceding a `"`, but other backslashes are supposed to stay single.
> 
> I guess I could file a bug... One problem is where to aim it, as msys is mainly a friendly fork/repackaging of cygwin. I dunno how much it differs from upstream cygwin (I'll need to get a fresh installation of that.) But I guess I could file a bug there, discuss what's expected and what's not, and if possible continue further upstream.
> 
> Secondly, which ones of the behaviours should we report as buggy? I guess we could report all of the combinations that do differ from the regular win32 interpretation. If all of them would end up fixed, we wouldn't need to care about anything else than proper win32 quoting. But even by fixing a couple of them, we could find one setup (either quoting just what we need, or quoting excessively, and with or without setting the noglob setting) where things work consistently enough for us to rely on.
Well, the generic algorithm is meant to work with a wide variety of Unix-y and Windows-y consumers, so it doubles all backslashes. It sounds like the only cases where it doesn't work are either case 6 or related to glob characters.

As for filing a bug, I'm really focused on case 6: if the command line uses double quotes and globbing is off, MSys should try to strictly follow the unquoting rules. Otherwise, what is the correct way to pass arguments with double quotes to an msys-noglob program?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99406/new/

https://reviews.llvm.org/D99406



More information about the llvm-commits mailing list