[llvm] 1be024e - [Windows] Fix cmd line tokenization of unclosed quotes.

Simon Tatham via llvm-commits llvm-commits at lists.llvm.org
Tue May 3 03:58:20 PDT 2022


Author: Simon Tatham
Date: 2022-05-03T11:57:49+01:00
New Revision: 1be024ee450f2d3cb07086f6141d50f291c1910b

URL: https://github.com/llvm/llvm-project/commit/1be024ee450f2d3cb07086f6141d50f291c1910b
DIFF: https://github.com/llvm/llvm-project/commit/1be024ee450f2d3cb07086f6141d50f291c1910b.diff

LOG: [Windows] Fix cmd line tokenization of unclosed quotes.

When cl::TokenizeWindowsCommandLine received a command line with an
unterminated double-quoted string at the end, it would discard the
text within that string. That doesn't match the behavior of the
standard Windows C library, which will return the text in the unclosed
quoted string as an argv word.

Fixed, and added extra unit tests in that area.
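
As a rough illustration of the end-of-input rule described above, here is a
minimal sketch of a three-state tokenizer. It is not the LLVM implementation:
tokenizeSketch and the State enum are hypothetical names, and the sketch
assumes only spaces, tabs and double quotes matter (it ignores the backslash
and doubled-quote rules that the real tokenizer handles). The point is the
final check: any state other than Init means a word is in progress, so it is
emitted even if its quote was never closed.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified states; the real tokenizer has its own enum.
enum class State { Init, Unquoted, Quoted };

// Sketch only: splits on spaces/tabs and honours double quotes.
// Backslash escaping is deliberately left out.
std::vector<std::string> tokenizeSketch(const std::string &Src) {
  std::vector<std::string> Tokens;
  std::string Token;
  State S = State::Init;

  for (char C : Src) {
    const bool IsSpace = (C == ' ' || C == '\t');
    switch (S) {
    case State::Init:
      if (IsSpace)
        break;                      // skip whitespace between words
      if (C == '"') {
        S = State::Quoted;          // a quote starts a (possibly empty) word
        break;
      }
      Token.push_back(C);
      S = State::Unquoted;
      break;
    case State::Unquoted:
      if (IsSpace) {
        Tokens.push_back(Token);    // whitespace ends the current word
        Token.clear();
        S = State::Init;
        break;
      }
      if (C == '"') {
        S = State::Quoted;
        break;
      }
      Token.push_back(C);
      break;
    case State::Quoted:
      if (C == '"') {
        S = State::Unquoted;        // closing quote; the word may continue
        break;
      }
      Token.push_back(C);           // whitespace is literal inside quotes
      break;
    }
  }

  // Emit the pending word whether we ended in Unquoted or Quoted.
  // Checking only for Unquoted here would drop an unterminated quoted string.
  if (S != State::Init)
    Tokens.push_back(Token);
  return Tokens;
}
```

Under these assumptions, the input `a b c d "text` yields five words ending in
`text`, and `a b c d "` yields five words ending in an empty string, which
mirrors the expectations in the unit tests added by this commit.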

In some cases (specifically the one in Bugzilla #47579) the old
behavior could cause TokenizeWindowsCommandLine to return a zero-length
list of arguments, leading to an array overrun at the call site in
windows::GetCommandLineArguments. Added a check there, for extra
safety: now windows::GetCommandLineArguments will return an error code
instead of failing an assertion.
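
The defensive check can be sketched in isolation as follows. This is a
standalone illustration using only the standard library, not the code in
Process.inc: firstArgument is a hypothetical helper, and std::vector and
std::string stand in for the LLVM container types. It shows the pattern of
returning std::errc::invalid_argument before element 0 is touched.

```cpp
#include <cassert>
#include <string>
#include <system_error>
#include <vector>

// Hypothetical helper: copies the first argument into Out, or reports an
// error if the argument list is empty instead of indexing out of bounds.
std::error_code firstArgument(const std::vector<std::string> &Args,
                              std::string &Out) {
  if (Args.empty())
    return std::make_error_code(std::errc::invalid_argument);

  Out = Args[0];             // safe: at least one element exists
  return std::error_code();  // default-constructed code means success
}
```

A caller can then test the returned std::error_code in a boolean context and
propagate the failure, rather than relying on an assertion that is compiled
out in release builds.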

(This change was written as part of https://reviews.llvm.org/D122914,
but split into a separate commit at the last minute at the code
reviewer's suggestion, because it's fixing an unrelated bug in the
same area. The rest of D122914 will follow in the next commit.)

Added: 
    

Modified: 
    llvm/lib/Support/CommandLine.cpp
    llvm/lib/Support/Windows/Process.inc
    llvm/unittests/Support/CommandLineTest.cpp

Removed: 
    


################################################################################
diff  --git a/llvm/lib/Support/CommandLine.cpp b/llvm/lib/Support/CommandLine.cpp
index 4c9250236a3b5..2f749bf7a6989 100644
--- a/llvm/lib/Support/CommandLine.cpp
+++ b/llvm/lib/Support/CommandLine.cpp
@@ -1008,7 +1008,7 @@ tokenizeWindowsCommandLineImpl(StringRef Src, StringSaver &Saver,
     }
   }
 
-  if (State == UNQUOTED)
+  if (State != INIT)
     AddToken(Saver.save(Token.str()));
 }
 

diff  --git a/llvm/lib/Support/Windows/Process.inc b/llvm/lib/Support/Windows/Process.inc
index dfaab1613de18..b1af298d9e83e 100644
--- a/llvm/lib/Support/Windows/Process.inc
+++ b/llvm/lib/Support/Windows/Process.inc
@@ -255,6 +255,9 @@ windows::GetCommandLineArguments(SmallVectorImpl<const char *> &Args,
       return EC;
   }
 
+  if (Args.size() == 0)
+    return std::make_error_code(std::errc::invalid_argument);
+
   SmallVector<char, MAX_PATH> Arg0(Args[0], Args[0] + strlen(Args[0]));
   SmallVector<char, MAX_PATH> Filename;
   sys::path::remove_filename(Arg0);

diff  --git a/llvm/unittests/Support/CommandLineTest.cpp b/llvm/unittests/Support/CommandLineTest.cpp
index dd02d92012652..7f751e5e101bd 100644
--- a/llvm/unittests/Support/CommandLineTest.cpp
+++ b/llvm/unittests/Support/CommandLineTest.cpp
@@ -237,12 +237,24 @@ TEST(CommandLineTest, TokenizeWindowsCommandLine2) {
 }
 
 TEST(CommandLineTest, TokenizeWindowsCommandLineQuotedLastArgument) {
+  // Whitespace at the end of the command line doesn't cause an empty last word
+  const char Input0[] = R"(a b c d )";
+  const char *const Output0[] = {"a", "b", "c", "d"};
+  testCommandLineTokenizer(cl::TokenizeWindowsCommandLine, Input0, Output0);
+
+  // But an explicit "" does
   const char Input1[] = R"(a b c d "")";
   const char *const Output1[] = {"a", "b", "c", "d", ""};
   testCommandLineTokenizer(cl::TokenizeWindowsCommandLine, Input1, Output1);
+
+  // An unterminated quoted string is also emitted as an argument word, empty
+  // or not
   const char Input2[] = R"(a b c d ")";
-  const char *const Output2[] = {"a", "b", "c", "d"};
+  const char *const Output2[] = {"a", "b", "c", "d", ""};
   testCommandLineTokenizer(cl::TokenizeWindowsCommandLine, Input2, Output2);
+  const char Input3[] = R"(a b c d "text)";
+  const char *const Output3[] = {"a", "b", "c", "d", "text"};
+  testCommandLineTokenizer(cl::TokenizeWindowsCommandLine, Input3, Output3);
 }
 
 TEST(CommandLineTest, TokenizeAndMarkEOLs) {


        

