[PATCH] D78346: Fix Windows command line bug when last token in response file is ""

Tue May 26 11:58:26 PDT 2020

amccarth requested changes to this revision.
amccarth added a comment.
This revision now requires changes to proceed.

Thanks for extending the tests.  Those will certainly make your fix more robust against regressions.

I'm concerned about the extra complexity added to the state machine and the introduction of an unused API.  The new version has more complicated flow control for no obvious benefit, and it seems unrelated to the relatively simple fix this patch started out to be.  If there are good reasons for these additional changes, perhaps it would be best to break them into a separate follow-on patch.  I'd be happy to review that as well.

================
Comment at: llvm/lib/Support/CommandLine.cpp:928
 // Windows tokenization implementation. The implementation is designed to be
 // inlined and specialized for the two user entry points.
 static inline void
----------------
Looks like we have some scope-creep:  This patch is now about more than solving the problem with arguments at the end of the line.

I'm not convinced that it's important this be inlined.  That's ultimately up to the compiler.

I think the comment about the user entry points doesn't tell us anything that we cannot see in the function signature.

================
Comment at: llvm/lib/Support/CommandLine.cpp:935

   // Try to do as much work inside the state machine as possible.
   enum { INIT, UNQUOTED, QUOTED } State = INIT;
----------------
I don't understand what I'm supposed to take away from this comment.  Either delete it or explain _why_ it's important to do as much work as possible.

I'm not even sure what you mean by "work."  It seems this version of the state machine does the same amount of work as the old version.  Is this about the fact that the states now have their own internal loops in addition to the outer loop?

================
Comment at: llvm/lib/Support/CommandLine.cpp:945
           MarkEOL();
         ++I;
       }
----------------
I'm concerned that the index is advanced here and and in the loop header.  I'm not saying it's wrong.  It's just harder to reason about state machine behavior when it's a hybrid of a classic loop surrounding state handlers that have their own internal state machines.  Now we have a bunch of checks throughout the body to see if we've hit (or exceeded?!) the end.  Is there an advantage to this that I'm not seeing?

================
Comment at: llvm/lib/Support/CommandLine.cpp:959
         // token. Copy the string if the caller asks us to.
         AddToken(AlwaysCopy ? Saver.save(NormalChars) : NormalChars);
       } else if (Src[I] == '\"') {
----------------
This seems like a behavior change:  AddToken will now be called _after_ MarkEOL.  Maybe that doesn't matter, but it seems nonsensical to report the delimiter before the token it delimits.

================
Comment at: llvm/lib/Support/CommandLine.cpp:960
         AddToken(AlwaysCopy ? Saver.save(NormalChars) : NormalChars);
       } else if (Src[I] == '\"') {
         Token += NormalChars;
----------------
Consistent with [[ https://llvm.org/docs/CodingStandards.html#use-early-exits-and-continue-to-simplify-code
 | LLVM style ]], the old version used `continue` to avoid cascades of `else if` statements.  Please follow that pattern here.

================
Comment at: llvm/lib/Support/CommandLine.cpp:1028

 void cl::TokenizeWindowsCommandLineNoCopy(StringRef Src, StringSaver &Saver,
                                           SmallVectorImpl<StringRef> &NewArgv) {
----------------
Nobody is calling this new function, not even in the tests.  Is the intent to replace the original function with this new one to reduce copying?  Do we really need the flexibility to parse it both ways, or can we keep things a bit simpler?

================
Comment at: llvm/lib/Support/CommandLine.cpp:1611
     unsigned ValNo = 0;
     for (size_t J = 0, E = PositionalOpts.size(); J != E; ++J)
       if (RequiresValue(PositionalOpts[J])) {
----------------
Did you intentionally change `J` to start at 0 rather than 1?  This mostly looks like you were just trying to make the style consistent by capitalizing these variable names.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78346/new/

https://reviews.llvm.org/D78346