[PATCH] D99406: RFC [lit] Detect if processes to execute are MSys based, apply custom quoting logic

Martin Storsjö via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Apr 5 13:53:37 PDT 2021


mstorsjo added inline comments.


================
Comment at: llvm/utils/lit/lit/TestRunner.py:221-226
+    Unfortunately, it seems to be impossible to quote complex arguments in a
+    way that both the MSVC CRT and MSys would interpret the same way
+    consistently. Quoting arguments affect how MSys interpret certain chars
+    like backslashes. One can also set the env var MSYS=noglob to inhibit
+    the globbing process (which affects how chars are quoted), but that
+    breaks a few cases where the globbing handling is required:
----------------
rnk wrote:
> I want to try and find a way to challenge this finding, it's not consistent with what I found when I looked into this years ago, and, really, I just don't want it to be true, for obvious reasons. :)
> 
> What cases require msys globbing? I thought the whole reason that we implemented globbing in the lit shell was because regular win32 tools don't implement globbing.
> 
> Is MSys glob support something new, something that came after D26009 (2016)?
I think you might just have been lucky. Given all the funky char strings used in test command lines, all of them used right now happen to work, so apparently these cases are pretty rare. The one exception seems to be the case in clang-tidy (D99330), where the `[` in the argument later breaks `\` - and that's just luckily/accidentally masked by being passed via `not.exe` (with different quoting rules) right now.

It's not so much that anything _requires_ msys globbing - it's just that certain characters trigger that msys codepath. In traditional win32, a wildcard argument doesn't get expanded by the caller but is supposed to be expanded by the callee, but I guess msys steps in and tries to expand these before the application sees them.

https://github.com/msys2/msys2-runtime/blob/msys2-3_2_0-release/winsup/cygwin/dcrt0.cc#L208 is where this happens; any of the chars `?*[\"\'(){}` trigger this codepath, and this codepath seems to be mangling some cases of e.g. single backslashes (see all the cases of quoted arguments below, where some of them lose backslashes; either losing a single backslash entirely, or reducing a series of multiple backslashes into fewer).


================
Comment at: llvm/utils/lit/lit/TestRunner.py:232
+    2 a\b\\c\\\\d     a\b\\c\\\\d    a\b\\c\\\\d        a\b\\c\\\\d
+    3 a\"b            a\b            a\b                a"b
+    4 "a[b\c"         a[b\c          a[b\c              a[b\c
----------------
rnk wrote:
> I think this is the case I encountered when I added this custom quoting logic. The original needquote condition is:
> https://github.com/python/cpython/blob/master/Lib/subprocess.py#L568
>         needquote = (" " in arg) or ("\t" in arg) or not arg
> 
> I added a check for `"`, so that in this case, the input will be `"a\"b"`, and both an msys-no-glob and win32 app will see `a"b`.
Yup. And my attempt to fix this by using `MSYS=noglob` surfaced this issue as it broke again.


================
Comment at: llvm/utils/lit/lit/TestRunner.py:235
+    5 "a\b\\c\\\\d"   a\b\c\\d       a\b\\c\\\\d        a\b\\c\\\\d
+    6 "a\"b"          a"b            a\b                a"b
+
----------------
rnk wrote:
> Are you sure the msys-noglob case here is correct? That seems wrong...
> 
> I have always believed that the correct, safe way to quote anything on Windows is to:
> - surround each argument in double quotes
> - backslash escape every `\` and `"` character
> 
> Even though this escapes strictly more than is necessary, you usually get the same results. This is for example the algorithm used to produce clang crash reproducer scripts, which is nice because they usually run in both cmd and bash.
> 
> Can we file a bug against msys about this? I feel like the app really should see `a"b` in this case.
I guess the big question is "what is correct"; it's clearly not what you'd expect to be interpreted given the regular win32 arg quoting rules, no. But then again, msys executables also live in a world with lots of other wild requirements with how things are supposed to behave when invoked from various contexts, so it might come from handling some other msys specific usecase of expanding arguments in a more unix-shell-y way. So not sure if it's intentionally this way, or just ends up this way while trying to fulfill all other requirements.

Your generic windows quoting algorithm is off a bit, as you're only supposed to backslash escape `"` and backslashes directly preceding a `"`, but other backslashes are supposed to stay single.

I guess I could file a bug... One problem is where to aim it, as msys is mainly a friendly fork/repackaging of cygwin. I dunno how much it differs from upstream cygwin (I'll need to get a fresh installation of that.) But I guess I could file a bug there, discuss what's expected and what's not, and if possible continue further upstream.

Secondly, which ones of the behaviours should we report as buggy? I guess we could report all of the combinations that do differ from the regular win32 interpretation. If all of them would end up fixed, we wouldn't need to care about anything else than proper win32 quoting. But even by fixing a couple of them, we could find one setup (either quoting just what we need, or quoting excessively, and with or without setting the noglob setting) where things work consistently enough for us to rely on.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99406/new/

https://reviews.llvm.org/D99406



More information about the llvm-commits mailing list