[cfe-commits] [patch] Implement support for @file

Daniel Dunbar daniel at zuster.org
Thu Jul 15 14:59:34 PDT 2010


On Sat, Jun 26, 2010 at 1:34 PM, Rafael Espindola <espindola at google.com> wrote:
> On 24 June 2010 04:17, Daniel Dunbar <daniel at zuster.org> wrote:
>> Hi Rafael,
>>
>> I would like to support @filename syntax, but I think it would be
>> better to implement as a built in feature of the OptTable class. My
>> recent work to have arguments embed their values should make this
>> easier. I think it should be possible to implement this primarily
>> inside OptTable::ParseArgs (which has a FIXME about it, btw).
>
> I missed that :-(
>
> An updated patch is attached. This one is implemented in the OptTable.
> It doesn't handle all the cases that the previous one would, but
> should handle most.

Thanks, comments below.

 - Daniel

--
> diff --git a/include/clang/Driver/OptTable.h b/include/clang/Driver/OptTable.h
> index e4a2eba..6431b81 100644
> --- a/include/clang/Driver/OptTable.h
> +++ b/include/clang/Driver/OptTable.h
> @@ -85,6 +85,10 @@ namespace options {
>
>      Option *CreateOption(unsigned id) const;
>
> +    void ParseAtArg(InputArgList &Args, const char *Str) const;
> +
> +    void ParseArgBuf(InputArgList &Args, const char *Buf) const;
> +
>    protected:
>      OptTable(const Info *_OptionInfos, unsigned _NumOptionInfos);
>    public:
> diff --git a/lib/Driver/OptTable.cpp b/lib/Driver/OptTable.cpp
> index 39530f2..702b81b 100644
> --- a/lib/Driver/OptTable.cpp
> +++ b/lib/Driver/OptTable.cpp
> @@ -12,8 +12,10 @@
>  #include "clang/Driver/ArgList.h"
>  #include "clang/Driver/Option.h"
>  #include "llvm/Support/raw_ostream.h"
> +#include "llvm/ADT/SmallString.h"
>  #include <algorithm>
>  #include <cassert>
> +#include <cstdio>
>  #include <map>
>  using namespace clang::driver;
>  using namespace clang::driver::options;
> @@ -224,6 +226,94 @@ Arg *OptTable::ParseOneArg(const ArgList &Args, unsigned &Index) const {
>    return new Arg(TheUnknownOption, Index++, Str);
>  }
>
> +void OptTable::ParseArgBuf(InputArgList &Args, const char *Buf) const {
> +  const char *P = Buf;
> +  for (;;) {
> +    while (isspace(*P))
> +      ++P;
> +    if (*P == '\0')
> +      return;
> +    bool SingleQuote = false;
> +    bool DoubleQuote = false;
> +    llvm::SmallString<16> CurArg;
> +    for (;;) {
> +      while (!isspace(*P) && *P != '\0' && *P != '"' && *P != '\'' &&
> +             *P != '\\') {
> +        CurArg.push_back(*P);
> +        ++P;
> +      }
> +      if (isspace(*P)) {
> +        if (DoubleQuote || SingleQuote)
> +          CurArg.push_back(*P);
> +        else
> +          break;
> +      } else if (*P == '\0') {
> +        break;
> +      } else if (*P == '"') {
> +        if (SingleQuote)
> +          CurArg.push_back(*P);
> +        else
> +          DoubleQuote = !DoubleQuote;
> +      } else if (*P == '\'') {
> +        if (DoubleQuote)
> +          CurArg.push_back(*P);
> +        else
> +          SingleQuote = !SingleQuote;
> +      }  else if (*P == '\\') {
> +        ++P;
> +        CurArg.push_back(*P);
> +      } else {
> +        assert(0 && "Unexpected char");
> +      }
> +      ++P;

This logic is more complicated than it need be, I think (and I'm not
sure I trust it). I believe the following pseudo code is clearer:
--
in_quote = None
current_arg = None
args = []
it = iter(buf)
for c in it:
    # If we are inside an argument...
    if current_arg is not None:
        # Check whether we have reached the end of the argument.
        if ((in_quote and c == in_quote) or
            (not in_quote and c.isspace())):
            args.append(current_arg)
            current_arg = None
            in_quote = None

        # Otherwise, add the (possibly escaped) character.
        else:
            if c == '\\':
                try:
                    current_arg += it.next()
                except StopIteration:
                    raise ValueError,"Trailing backslash in file."
            else:
                # Quotes inside non-quoted arguments are ignored, inexplicably.
                if not in_quote and (c == '"' or c == "'"):
                    continue

                current_arg += c

    # Otherwise, see if we are starting a new quoted argument.
    elif c == '"' or c == "'":
        in_quote = c
        current_arg = ''

    # Otherwise, see if we are starting a space separated argument.
    elif not c.isspace():
        current_arg = c

if current_arg is not None:
    args.append(current_arg)
if in_quote:
    raise ValueError,"Unterminated quote."
--

> +    }
> +    // FIXME: should we support nested @files?
> +    unsigned Index = Args.MakeIndex(CurArg);
> +    Arg *A = ParseOneArg(Args, Index);
> +    // FIXME: how to handle errors? We have a "dummy" Index...
> +    Args.append(A);
> +  }
> +}
> +
> +void OptTable::ParseAtArg(InputArgList &Args, const char *Str) const {
...

Please use llvm::MemoryBuffer instead of cstdio, and you aren't
allowed to use llvm::errs() here -- you either have to add an error
reporting interface or use an llvm_report_error for now.

> +}
> +
>  InputArgList *OptTable::ParseArgs(const char **ArgBegin, const char **ArgEnd,
>                                    unsigned &MissingArgIndex,
>                                    unsigned &MissingArgCount) const {
> @@ -240,6 +330,13 @@ InputArgList *OptTable::ParseArgs(const char **ArgBegin, const char **ArgEnd,
>        continue;
>      }
>
> +    if (Args->getArgString(Index)[0] == '@') {
> +      // FIXME: How to report erros inside the @file?
> +      ParseAtArg(*Args, Args->getArgString(Index));
> +      ++Index;
> +      continue;
> +    }
> +

I think this would be simpler if it used a different approach; just
load the @file into a new vector of arguments, and recurse. This
eliminates a couple of the fixmes, and properly handles multipart
arguments in the @file, and makes sure the indices are valid (which is
required in some other parts of the driver, although I have been
trying to eliminate them). Something like:
--
InputArgList *OptTable::ParseArgs(const char **ArgBegin, const char **ArgEnd,
                                  unsigned &MissingArgIndex,
                                  unsigned &MissingArgCount) const {
  for (const char *A = ArgBegin; A != ArgEnd; ++A) {
    if (A[0] == '@') {
      // Load the arguments from the file.
      std::string Error;
      std::vector<std::string> FileArgs = ReadAtArgFile /*can be a
local static fn*/(A, Error);
      if (!Error.empty()) ... report error ...

      // Create a new argument vector, and recurse.
      std::vector<const char*> ExpandedArgs;
      ExpandedArgs.reserve(ArgEnd - ArgBegin + Arguments.size());
      ExpandedArgs.insert(ExpandedArgs.end(), ArgBegin, A);
      ExpandedArgs.insert(ExpandedArgs.end(), FileArgs.begin(), FileArgs.end());
      ExpandedArgs.insert(ExpandedArgs.end(), A, ArgEnd);
      return ParseArgs(ExpandedArgs.begin(), ExpandedArgs.end(),
MissingArgIndex, MissingArgCount);
    }
  }
--

Ugh, actually its even more complicated because the ArgList needs to
take ownership of the strings. Given this amount of nastiness, and the
minorness of the feature, I'm leaning more towards saying lets just
take the original patch, and someone can do the extra work to move it
into the OptTable if and when someone cares about it being handled by
other OptTable clients.

>      unsigned Prev = Index;
>      Arg *A = ParseOneArg(*Args, Index);
>      assert(Index > Prev && "Parser failed to consume argument.");
> diff --git a/test/Driver/at_file.c b/test/Driver/at_file.c
> new file mode 100644
> index 0000000..507b62a
> --- /dev/null
> +++ b/test/Driver/at_file.c
> @@ -0,0 +1,26 @@
> +// RUN: %clang -E %s @%s.args -o %t.log
> +// RUN: FileCheck --input-file=%t.log %s
> +
> +// CHECK: bar1
> +// CHECK-NEXT: bar2 zed2
> +// CHECK-NEXT: bar3 zed3
> +// CHECK-NEXT: bar4 zed4
> +// CHECK-NEXT: bar5 zed5
> +// CHECK-NEXT: 'bar6 zed6'
> +// CHECK-NEXT: "bar7 zed7"
> +// CHECK-NEXT: foo8bar8zed8
> +// CHECK-NEXT: foo9'bar9'zed9
> +// CHECK-NEXT: foo10"bar10"zed10
> +// CHECK-NEXT: zed11
> +
> +foo1
> +foo2
> +foo3
> +foo4
> +foo5
> +foo6
> +foo7
> +foo8
> +foo9
> +foo10
> +foo11
> diff --git a/test/Driver/at_file.c.args b/test/Driver/at_file.c.args
> new file mode 100644
> index 0000000..f1e7ea2
> --- /dev/null
> +++ b/test/Driver/at_file.c.args
> @@ -0,0 +1,10 @@
> +-Dfoo1=bar1 -Dfoo2="bar2 zed2"
> +-Dfoo3='bar3 zed3'
> +"-Dfoo4=bar4 zed4"
> +'-Dfoo5=bar5 zed5'
> +-Dfoo6="'bar6 zed6'"
> +-Dfoo7='"bar7 zed7"'
> +-Dfoo8=foo8"bar8"zed8
> +-Dfoo9=foo9\'bar9\'zed9
> +-Dfoo10=foo10\"bar10\"zed10
> +-Dfoo11=zed11\

Some additional cases:
"-Dfoo12=a'b"
'-Dfoo12=a"b'
-D foo13
--

>
>>  - Daniel
>
> Cheers,
> --
> Rafael Ávila de Espíndola
>




More information about the cfe-commits mailing list