[PATCH] D48958: [clang][ubsan] Implicit Cast Sanitizer - integer truncation - clang part

Richard Smith - zygoloid via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Thu Jul 26 16:32:12 PDT 2018


rsmith added inline comments.


================
Comment at: docs/ReleaseNotes.rst:49-51
+- A new Implicit Cast Sanitizer (``-fsanitize=implicit-cast``) group was added.
+  Please refer to the :ref:`release-notes-ubsan` section of the release notes
+  for the details.
----------------
Regarding the name of this sanitizer: C and C++ refer to these as "implicit conversions" not "implicit casts", and "implicit cast" is a contradiction in terms -- a cast is explicit syntax for performing a conversion. We should use the external terminology here ("implicit-conversion") rather than the slightly-odd clang-specific convention of calling an implicit conversion an "implicit cast".


================
Comment at: docs/ReleaseNotes.rst:290
+      void test(unsigned long val) {
+        if (consume(val)) // the value got silently truncated.
+          store = store + 768; // before addition, 'store' was promoted to int.
----------------
got -> may have been


================
Comment at: docs/ReleaseNotes.rst:292
+          store = store + 768; // before addition, 'store' was promoted to int.
+        (void)consume((unsigned int)val); // OK, the truncation is implicit.
+      }
----------------
implicit -> explicit


================
Comment at: docs/UndefinedBehaviorSanitizer.rst:95
+     of bigger bit width to smaller bit width, if that results in data loss.
+     That is, if the demoted value, after casting back to the original width,
+     is not equal to the original value before the downcast. This issue may
----------------
erichkeane wrote:
> I think the last 2 commas in this sentence are unnecessary?  
I would parenthesize the "where the demoted value [...] would have a different value from the original" clause, since it's just explaining what we mean by "data loss".


================
Comment at: docs/UndefinedBehaviorSanitizer.rst:95
+     of bigger bit width to smaller bit width, if that results in data loss.
+     That is, if the demoted value, after casting back to the original width,
+     is not equal to the original value before the downcast. This issue may
----------------
rsmith wrote:
> erichkeane wrote:
> > I think the last 2 commas in this sentence are unnecessary?  
> I would parenthesize the "where the demoted value [...] would have a different value from the original" clause, since it's just explaining what we mean by "data loss".
Is this really the right rule, though? Consider:

```
unsigned int x = 0x81234567;
int y = x; // does the sanitizer catch this or not?
```

Here, the value of `x` is not the same as the value of `y` (assuming 32-bit int): `y` is negative. But this is not "data loss" according to the documented meaning of the sanitizer. I think we should produce a sanitizer trap on this case.


================
Comment at: docs/UndefinedBehaviorSanitizer.rst:17
 * Signed integer overflow
+* Problematic Implicit Casts (not UB, but not always intentional)
 * Conversion to, from, or between floating-point types which would
----------------
Don't use Title Caps here. "Problematic implicit conversions"


================
Comment at: docs/UndefinedBehaviorSanitizer.rst:131-133
+     overflow in signed division (``INT_MIN / -1``). Please do note that this
+     check does not diagnose lossy implicit integer conversions. Also please be
+     aware of integer conversions, as those may result in an unexpected
----------------
Remove the "Please"s here. We don't need to beg the reader to read the rest of the sentence. Just "Note that this [...]. Also note that integer conversions may result in an unexpected computation result, [...]"


================
Comment at: docs/UndefinedBehaviorSanitizer.rst:142-147
+     it is often unintentional, so UBSan offers to catch it. Please do note that
+     this check does not diagnose lossy implicit integer conversions. Also
+     please be aware of integer conversions, as those may result in an
+     unexpected computation results, even though no overflow happens (signed or
+     unsigned). Both of these two issues are handled by ``-fsanitize=implicit-cast``
+     group of checks.
----------------
Likewise here.


================
Comment at: include/clang/Basic/Sanitizers.def:134-136
 SANITIZER_GROUP("integer", Integer,
                 SignedIntegerOverflow | UnsignedIntegerOverflow | Shift |
                 IntegerDivideByZero)
----------------
`-fsanitize=integer` should include `-fsanitize=implicit-integer-truncation`.


================
Comment at: lib/CodeGen/CGExprScalar.cpp:954-955
+  // We only care about int->int casts here, and ignore casts to/from pointer.
+  if (!(isa<llvm::IntegerType>(SrcTy) && isa<llvm::IntegerType>(DstTy)))
+    return;
+  // We do not care about booleans.
----------------
Check the Clang types here, not the LLVM types. There is no guarantee that only integer types get converted to LLVM `IntegerType`s. (But if you like, you can assert that `SrcTy` and `DstTy` are `IntegerType`s after checking that the clang types are both integer types.)


================
Comment at: lib/CodeGen/CGExprScalar.cpp:956-958
+  // We do not care about booleans.
+  if (SrcType->isBooleanType() || DstType->isBooleanType())
+    return;
----------------
I believe this is redundant: we don't get here for an integer to boolean conversion, and a boolean to integer conversion would always be caught by the bit width check below.


================
Comment at: lib/CodeGen/CGExprScalar.cpp:962-964
+  // This must be truncation. Else we do not care.
+  if (SrcBits <= DstBits)
+    return;
----------------
I think you should also catch casts that change signedness in the case if the sign bit is set on the value. (Though if you want to defer this to a follow-up change and maybe give the sanitizer a different name, that's fine with me.)


Repository:
  rC Clang

https://reviews.llvm.org/D48958





More information about the cfe-commits mailing list