[PATCH] D60389: FileCheck [9/12]: Add support for matching formats

Mon Jan 6 07:01:33 PST 2020

arichardson accepted this revision.
arichardson added a comment.

LGTM. Thanks for working on this feature!

================
Comment at: llvm/lib/Support/FileCheck.cpp:338
+  // Parse format specifier.
+  size_t FormatSpecEnd = Expr.find(',');
+  if (FormatSpecEnd != StringRef::npos) {
----------------
thopre wrote:
> arichardson wrote:
> > thopre wrote:
> > > arichardson wrote:
> > > > Instead of checking for a comma (which be allowed to appear after the ` :` in the future, I would check if the next non-whitespace character is a `%`. Or to simplify this we could require the % to immediately follow the # character?
> > > I'm not sure I follow the intent of your message, are you against the syntax (i.e. there should be no comma at all) or the way it is parsed? Anyway, since the format comes first, I'm not sure how allowing a comma in the later part of the syntax would be a problem. If I remember well its use in the syntax was suggested to match the API of printf/scanf. Otherwise we'd have:
> > > [[#%d FOOBAR:N+1]] which I find less easy to read.
> > Sorry about the ambiguity. The current syntax is perfectly fine.
> > 
> > I was suggesting a change to the parsing code to check that the first non-whitespace char is a `%` first instead of searching for a comma. But this would only change the which error messages for certain invalid input so it shouldn't really matter.
> > 
> > 
> Yes, in one case one could diagnose a missing comma while in the other a missing percentage. This approach allows to reorder the code to parse each element of a numeric substitution (format specifier, variable definition, expression) when internal API change, as was done in earlier version of this patch. If you don't mind I'll keep it this way.
Yes that seems perfectly fine.

================
Comment at: llvm/test/FileCheck/numeric-defines.txt:19
 RUN: %ProtectFileCheckOutput \
-RUN: not FileCheck -D#NUMVAL1=7 -D#NUMVAL2=12 -check-prefix NUMNOT -input-file %s %s 2>&1 \
-RUN:   | FileCheck %s --strict-whitespace -check-prefixes NOT-NUMERRMSG2
-RUN: FileCheck -D#NUMVAL1=7 -D#NUMVAL2=8 -check-prefixes NUMNOT -input-file %s %s
+RUN: not FileCheck -D#%X,NUMVAL1=7 -D#%X,NUMVAL2=12 --check-prefix NUMNOT --input-file %s %s 2>&1 \
+RUN:   | FileCheck %s --strict-whitespace --check-prefixes NOT-NUMERRMSG2
----------------
jhenderson wrote:
> thopre wrote:
> > jhenderson wrote:
> > > thopre wrote:
> > > > arichardson wrote:
> > > > > thopre wrote:
> > > > > > arichardson wrote:
> > > > > > > I find this slightly surprising. If I define a numeric variable on the command line with hex format, I would expect the value to be parsed as a hex number, i.e. 0x12 and not 0xc.
> > > > > > I didn't think of that. What would be your expectation for implicit format though?
> > > > > > 
> > > > > > E.g. -D#%X,NUMVAL1=7 -D#NUMVAL2=12+NUMVAL1. Should NUMVAL2 be 0x17 or decimal 17?
> > > > > I'm not entirely sure what the best solution there is. 
> > > > > I think NUMVAL2 should always evaluate to 19.
> > > > > However, I don't have a strong preference whether it should capture decimal 19 or hex 0x13 (i.e. parse the number in the expresssion as decimal, but inherit format from the other variable).
> > > > > 
> > > > > Alternatively we could require an explicit format for those (probably rare?) cases.
> > > > Is the way you expect 12 to be interpreted in '-D#%X,NUMVAL=12' and in '-D#%X,NUMVAL1=7 -D#NUMVAL2=12+NUMVAL1' different because of the explicit Vs implicit format or because you see the former as similar to a value in an input text (and thus 12 as 0x12) and the latter as a literal in a numeric expression (and thus 12 as decimal 12)?
> > > > 
> > > > I find it even more confusing to distinguish the behavior of literals between implicit and explicit format (e.g. '-D#%X,NUMVAL1=7 -D#%X,NUMVAL2=12+NUMVAL1'  would interpret 12 as 0x12 then).
> > > > 
> > > > I also see several reasons to avoid the distinction between the 2 examples based on whether a variable is present after the equal sign:
> > > > 
> > > >   - complexity: (i) more text needs to be added in the documentation to describe this distinction thus making the feature harder to comprehend IMHO and (ii) the parsing code needs to distinguish between these 2 cases
> > > >   - consistency: the same syntax is used in both cases but the behaviour is different due to the use of a numeric variable after the equal sign
> > > >   - unnecessary: the format specifier is needed because (i) the input text can contain a mixture of text and numeric values and (ii) hex numeric values can come with or without prefix, in lowercase or uppercase. It is thus necessary to indicate when something like "dead" is to be interpreted as a numeric value and when it shouldn't. There is not problem in a numeric substitution (whether one defined on the command line or in a check file) since there is no text in it and it is under control of the test writer which can add a prefix.
> > > > 
> > > > Therefore I strongly think anything after the equal sign is to be interpreted as a numeric expression and not an input. The distinction might appear clearer in the provision made by one of the later patch to support numeric subsitutions such as [[#%X,NUMVAL:<12]] where the numeric expression value would be 12 and the input value could be 11 (input being B) and thus match.
> > > > 
> > > > I've added an example in the documentation to document that a literal is always intepreted as decimal in the absence of prefix (0x12 is allowed and interpreted as hex). Hopefully that'll alleviate some of your concern.
> > > FWIW, I would find treating any number without 0x as hex to be ambiguous. I don't think the format specifier really should make a difference to that. After all, if you think of it in printf terms: `printf("0x%x", 12)` prints "0xc", not "0x12".
> > @jhenderson  I presume you were answering to Paul? Because FileCheck will indeed check the input file for 0xC when encountering CHECK: 0x[[#%X, 12]] so it's all good. As to the prefix, I supposed you meant for the literals inside the [[#]] syntax since prefix is not needed for printf/scanf: scanf("%X", x) will happily scan CAFE without expecting a prefix.
> > 
> > @probinson Since we are talking about parallels between variable definition and scanf on one side and numeric substitution and printf on the other side, we can look at the following examples:
> > 
> >   - 0x[[#%X, VAR2:]] is equivalent to: scanf("%X", VAR2).
> >   - 0x[[#%X, VAR1+12]] when VAR1 was defined with -D#VAR1=3 will be equivalent to checking the output of the following against the input file:
> > 
> > 
> > ```
> > VAR1=3;
> > printf("0x%X", VAR1+12);
> > ```
> > 
> > Note that C interprets 12 in decimal despite the %X format specifier of the printf which only influences the conversion of the resulting numeric value (3+12=15 is converted to F and thus the input file is checked for "0xF").
> > 
> > 
> > [[#SOMEVAR:<EXPR>]] is a shorthand for both, and thus a combination of printf+scanf with some input file check inbetween, e.g. 0x[[#%X, VAR2:VAR1+12]] when VAR1 is defined with -D#VAR1=3 is equivalent to checking the output of the following against the input file:
> > 
> > 
> > ```
> > VAR1=3;
> > printf("0x%X", VAR1+12);
> > ```
> > 
> > and scanning the matched input with:
> > 
> > 
> > ```
> > scanf("0x%X", VAR2);
> > ```
> > 
> > The case of [[#%X,VAR:12]] is then just a special case, i.e. matching the input file against:
> > 
> > 
> > ```
> > printf("%X", 12);
> > ```
> > 
> > followed by scanning the matched text with:
> > 
> > 
> > ```
> > scanf("%X", VAR);
> > ```
> @thopre - I assume you meant @arichardson here, not @probinson...
> 
> I was responding to the general conversation, and giving my thoughts, so not addressing anybody specifically. Yes, the as-it-stands behaviour is what I would prefer, precisely for the reasons you outlined with the printf semantics.
@thopre Thanks for the detailed explanation. I think the current behaviour in the tests makes sense. Having the match format not affect the parsing of whats to the right of the `=` seems simpler and avoids problems.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D60389/new/

https://reviews.llvm.org/D60389