[PATCH] D49114: [clang-tidy] Add a check for "magic numbers"

Sun Jul 29 08:27:36 PDT 2018

0x8000-0000 added inline comments.

================
Comment at: clang-tidy/readability/MagicNumbersCheck.cpp:57
+const char DefaultIgnoredIntegerValues[] = "0;1;";
+const char DefaultIgnoredFloatingPointValues[] = "0.0;";
+
----------------
aaron.ballman wrote:
> 0x8000-0000 wrote:
> > aaron.ballman wrote:
> > > I would still like to see some data on common floating-point literal values used in large open source projects so that we can see what sensible values should be in this list.
> > What value would that bring? The ideal target is that there are no magic values - no guideline that I have seen makes exception for 3.141 or 9.81. Each project is special based on how they evolved, and they need to decide for themselves what is worth cleaning vs what can be swept under the rug for now. Why would we lend authority to any particular floating point value?
> Because that's too high of a false positive rate for an acceptable clang-tidy check. As mentioned before, there are literally hundreds of unnameable floating-point literals in LLVM alone where the value is 1.0 or 2.0. Having statistical data to pick sensible defaults for this list is valuable in that it lowers the false positive rate. If the user dislikes the default list for some reason (because for their project, maybe 2.0 is a supremely nameable literal value), they can pick a different set of defaults.
> 
> Right now, I'm operating off an assumption that most floating-point literals that should not be named are going to be whole numbers that are precisely represented in all floating-point semantic models. This data will tell us if that assumption is wrong, and if the assumption is wrong, we might want to go with separate lists like you've done.
Here are the results with the check as-is, run on the llvm code base as of last night:

top-40
```
  10435 2
   5543 4
   4629 8
   3271 3
   2702 16
   1876 32
   1324 64
   1309 10
   1207 5
   1116 128
    966 6
    733 7
    575 256
    421 20
    406 12
    339 9
    331 1024
    311 100
    281 42
    253 11
    226 15
    189 40
    172 24
    169 0xff
    168 13
    168 0x80
    166 512
    137 1.0
    133 14
    132 31
    129 0xDEADBEEF
    120 18
    120 17
    120 1000
    115 4096
    100 30
     94 60
     94 0x1234
     89 0x20
     86 0xFF
```

1.0 is in position 28 with 137 occurrences
2.0 is in position 93 with 27 occurrences
100.0 is in position 96 with 26 occurrences
1.0f is in position 182 with 11 occurrences

we also have 2.0e0 four times :)

This data suggests that there would be value in an IgnorePowerOf2IntegerLiterals option.
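
A minimal sketch of the test such an option could apply to an integer literal's value. The function name `isPowerOfTwo` is hypothetical and not taken from the patch; inside LLVM one would more likely reach for an existing helper such as `llvm::isPowerOf2_64` from `llvm/Support/MathExtras.h`.

```cpp
#include <cstdint>

// Hypothetical predicate (not from the patch): true when exactly one bit of
// Value is set. Zero is rejected explicitly, because 0 & (0 - 1) is also 0.
constexpr bool isPowerOfTwo(std::uint64_t Value) {
  return Value != 0 && (Value & (Value - 1)) == 0;
}
```

With this predicate, entries such as 2, 4, 8, 0x80 and 4096 from the list above would be skipped, while 3, 10, 0xff and 0xDEADBEEF would still be reported.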

================
Comment at: clang-tidy/readability/MagicNumbersCheck.cpp:76-86
+  IgnoredFloatingPointValues.reserve(IgnoredFloatingPointValuesInput.size());
+  IgnoredDoublePointValues.reserve(IgnoredFloatingPointValuesInput.size());
+  for (const auto &InputValue : IgnoredFloatingPointValuesInput) {
+    llvm::APFloat FloatValue(llvm::APFloat::IEEEsingle());
+    FloatValue.convertFromString(InputValue, DefaultRoundingMode);
+    IgnoredFloatingPointValues.push_back(FloatValue.convertToFloat());
+
----------------
aaron.ballman wrote:
> 0x8000-0000 wrote:
> > aaron.ballman wrote:
> > > 0x8000-0000 wrote:
> > > > aaron.ballman wrote:
> > > > > This is where I would construct an `APFloat` object from the string given. As for the semantics to be used, I would recommend getting it from `TargetInfo::getDoubleFormat()` on the belief that we aren't going to care about precision (explained in the documentation).
> > > > Here is the problem I tried to explain last night but perhaps I wasn't clear enough.
> > > > 
> > > > When we parse the input list from strings, we have to commit to one floating point value "semantic" - in our case single or double precision.
> > > > 
> > > > When we encounter the value in the source code and it is captured by a matcher, it comes as either one of those values.
> > > > 
> > > > Floats with different semantics can't be directly compared - so we have to maintain two distinct arrays.
> > > > 
> > > > If we do that, rather than store APFloats and sort/compare them with awkward lambdas, we might as well just use the native float/double and be done with it more cleanly.
> > > > When we encounter the value in the source code and it is captured by a matcher, it comes as either one of those values.
> > > 
> > > It may also come in as long double or __float128, for instance, because there are type suffixes for that.
> > > 
> > > > Floats with different semantics can't be directly compared - so we have to maintain two distinct arrays.
> > > 
> > > Yes, floats with different semantics cannot be directly compared. That's why I said below that we should coerce the literal values.
> > > 
> > > > If we do that, rather than store APFloats and sort/compare them with awkward lambdas, we might as well just use the native float/double and be done with it more cleanly.
> > > 
> > > There are too many different floating-point semantics for this to be viable, hence why coercion is a reasonable behavior.
> > Let me see if I understood it - your proposal is: store only doubles, and when a floating-point literal is encountered in code, do not use the FloatingLiteral instance, but parse it again into a double and compare exactly. If the comparison matches - ignore it.
> > 
> > In that case what is the value of storing APFloats with double semantics in the IgnoredValues array, instead of doubles?
> > Let me see if I understood it - your proposal is: store only doubles, and when a floating-point literal is encountered in code, do not use the FloatingLiteral instance, but parse it again into a double and compare exactly. If the comparison matches - ignore it.
> 
> My proposal is to use `APFloat` as the storage and comparison medium. Read in strings from the configuration and convert them to an `APFloat` that has double semantics. Read in literals and call `FloatingLiteral::getValue()` to get the `APFloat` from it, convert it to one that has double semantics as needed, then perform the comparison between those two `APFloat` objects.
> 
> > In that case what is the value of storing APFloats with double semantics in the IgnoredValues array, instead of doubles?
> 
> Mostly that it allows us to modify or extend the check for more complicated semantics in the future. Also, it's good practice to use something with consistent semantic behavior across hosts and targets (comparisons between numbers that cannot be precisely represented will at least be consistently compared across hosts when compiling for the same target).
> 
ok - coming right up!
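
A rough, standard-library-only sketch of the coercion idea discussed above. It assumes a host with IEEE 754 `float` and `double`, and it uses native `double` purely as a stand-in for an `APFloat` with double semantics, so it does not give the cross-host consistency that `APFloat` would. The names `parseIgnoredValues` and `isIgnoredValue` are hypothetical and not part of the patch.

```cpp
#include <algorithm>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: parse every configured string once, always as double,
// and keep the result sorted so lookups can use binary search.
std::vector<double> parseIgnoredValues(const std::vector<std::string> &Input) {
  std::vector<double> Values;
  Values.reserve(Input.size());
  for (const std::string &Text : Input)
    Values.push_back(std::strtod(Text.c_str(), nullptr));
  std::sort(Values.begin(), Values.end());
  return Values;
}

// Hypothetical helper: coerce a literal of any floating-point type to double,
// then compare exactly against the single list of ignored values.
template <typename FloatT>
bool isIgnoredValue(const std::vector<double> &SortedValues, FloatT Literal) {
  const double Coerced = static_cast<double>(Literal);
  return std::binary_search(SortedValues.begin(), SortedValues.end(), Coerced);
}
```

With an ignore list of "0.0;1.0;2.0;0.1", the literals `1.0f`, `2.0L` and `0.1` all match after coercion, but `0.1f` does not, because widening the float nearest to 0.1 does not give the double nearest to 0.1. Whole numbers that are exactly representable in every format are unaffected, which fits the assumption stated earlier in the thread.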

Repository:
  rCTE Clang Tools Extra

https://reviews.llvm.org/D49114