[PATCH] D60389: FileCheck [9/12]: Add support for matching formats

Thomas Preud'homme via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Jan 6 05:41:35 PST 2020


thopre marked 4 inline comments as done.
thopre added inline comments.


================
Comment at: llvm/test/FileCheck/numeric-defines.txt:19
 RUN: %ProtectFileCheckOutput \
-RUN: not FileCheck -D#NUMVAL1=7 -D#NUMVAL2=12 -check-prefix NUMNOT -input-file %s %s 2>&1 \
-RUN:   | FileCheck %s --strict-whitespace -check-prefixes NOT-NUMERRMSG2
-RUN: FileCheck -D#NUMVAL1=7 -D#NUMVAL2=8 -check-prefixes NUMNOT -input-file %s %s
+RUN: not FileCheck -D#%X,NUMVAL1=7 -D#%X,NUMVAL2=12 --check-prefix NUMNOT --input-file %s %s 2>&1 \
+RUN:   | FileCheck %s --strict-whitespace --check-prefixes NOT-NUMERRMSG2
----------------
jhenderson wrote:
> thopre wrote:
> > arichardson wrote:
> > > thopre wrote:
> > > > arichardson wrote:
> > > > > I find this slightly surprising. If I define a numeric variable on the command line with hex format, I would expect the value to be parsed as a hex number, i.e. 0x12 and not 0xc.
> > > > I didn't think of that. What would be your expectation for implicit format though?
> > > > 
> > > > E.g. -D#%X,NUMVAL1=7 -D#NUMVAL2=12+NUMVAL1. Should NUMVAL2 be 0x17 or decimal 17?
> > > I'm not entirely sure what the best solution there is. 
> > > I think NUMVAL2 should always evaluate to 19.
> > > However, I don't have a strong preference whether it should capture decimal 19 or hex 0x13 (i.e. parse the number in the expresssion as decimal, but inherit format from the other variable).
> > > 
> > > Alternatively we could require an explicit format for those (probably rare?) cases.
> > Is the way you expect 12 to be interpreted in '-D#%X,NUMVAL=12' and in '-D#%X,NUMVAL1=7 -D#NUMVAL2=12+NUMVAL1' different because of the explicit Vs implicit format or because you see the former as similar to a value in an input text (and thus 12 as 0x12) and the latter as a literal in a numeric expression (and thus 12 as decimal 12)?
> > 
> > I find it even more confusing to distinguish the behavior of literals between implicit and explicit format (e.g. '-D#%X,NUMVAL1=7 -D#%X,NUMVAL2=12+NUMVAL1'  would interpret 12 as 0x12 then).
> > 
> > I also see several reasons to avoid the distinction between the 2 examples based on whether a variable is present after the equal sign:
> > 
> >   - complexity: (i) more text needs to be added in the documentation to describe this distinction thus making the feature harder to comprehend IMHO and (ii) the parsing code needs to distinguish between these 2 cases
> >   - consistency: the same syntax is used in both cases but the behaviour is different due to the use of a numeric variable after the equal sign
> >   - unnecessary: the format specifier is needed because (i) the input text can contain a mixture of text and numeric values and (ii) hex numeric values can come with or without prefix, in lowercase or uppercase. It is thus necessary to indicate when something like "dead" is to be interpreted as a numeric value and when it shouldn't. There is not problem in a numeric substitution (whether one defined on the command line or in a check file) since there is no text in it and it is under control of the test writer which can add a prefix.
> > 
> > Therefore I strongly think anything after the equal sign is to be interpreted as a numeric expression and not an input. The distinction might appear clearer in the provision made by one of the later patch to support numeric subsitutions such as [[#%X,NUMVAL:<12]] where the numeric expression value would be 12 and the input value could be 11 (input being B) and thus match.
> > 
> > I've added an example in the documentation to document that a literal is always intepreted as decimal in the absence of prefix (0x12 is allowed and interpreted as hex). Hopefully that'll alleviate some of your concern.
> FWIW, I would find treating any number without 0x as hex to be ambiguous. I don't think the format specifier really should make a difference to that. After all, if you think of it in printf terms: `printf("0x%x", 12)` prints "0xc", not "0x12".
@jhenderson  I presume you were answering to Paul? Because FileCheck will indeed check the input file for 0xC when encountering CHECK: 0x[[#%X, 12]] so it's all good. As to the prefix, I supposed you meant for the literals inside the [[#]] syntax since prefix is not needed for printf/scanf: scanf("%X", x) will happily scan CAFE without expecting a prefix.

@probinson Since we are talking about parallels between variable definition and scanf on one side and numeric substitution and printf on the other side, we can look at the following examples:

  - 0x[[#%X, VAR2:]] is equivalent to: scanf("%X", VAR2).
  - 0x[[#%X, VAR1+12]] when VAR1 was defined with -D#VAR1=3 will be equivalent to checking the output of the following against the input file:


```
VAR1=3;
printf("0x%X", VAR1+12);
```

Note that C interprets 12 in decimal despite the %X format specifier of the printf which only influences the conversion of the resulting numeric value (3+12=15 is converted to F and thus the input file is checked for "0xF").


[[#SOMEVAR:<EXPR>]] is a shorthand for both, and thus a combination of printf+scanf with some input file check inbetween, e.g. 0x[[#%X, VAR2:VAR1+12]] when VAR1 is defined with -D#VAR1=3 is equivalent to checking the output of the following against the input file:


```
VAR1=3;
printf("0x%X", VAR1+12);
```

and scanning the matched input with:


```
scanf("0x%X", VAR2);
```

The case of [[#%X,VAR:12]] is then just a special case, i.e. matching the input file against:


```
printf("%X", 12);
```

followed by scanning the matched text with:


```
scanf("%X", VAR);
```


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D60389/new/

https://reviews.llvm.org/D60389





More information about the llvm-commits mailing list