[PATCH] D48958: [clang][ubsan] Implicit Cast Sanitizer - integer truncation - clang part
Richard Smith - zygoloid via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Thu Jul 26 16:32:12 PDT 2018
rsmith added inline comments.
================
Comment at: docs/ReleaseNotes.rst:49-51
+- A new Implicit Cast Sanitizer (``-fsanitize=implicit-cast``) group was added.
+ Please refer to the :ref:`release-notes-ubsan` section of the release notes
+ for the details.
----------------
Regarding the name of this sanitizer: C and C++ refer to these as "implicit conversions" not "implicit casts", and "implicit cast" is a contradiction in terms -- a cast is explicit syntax for performing a conversion. We should use the external terminology here ("implicit-conversion") rather than the slightly-odd clang-specific convention of calling an implicit conversion an "implicit cast".
================
Comment at: docs/ReleaseNotes.rst:290
+ void test(unsigned long val) {
+ if (consume(val)) // the value got silently truncated.
+ store = store + 768; // before addition, 'store' was promoted to int.
----------------
got -> may have been
================
Comment at: docs/ReleaseNotes.rst:292
+ store = store + 768; // before addition, 'store' was promoted to int.
+ (void)consume((unsigned int)val); // OK, the truncation is implicit.
+ }
----------------
implicit -> explicit
================
Comment at: docs/UndefinedBehaviorSanitizer.rst:95
+ of bigger bit width to smaller bit width, if that results in data loss.
+ That is, if the demoted value, after casting back to the original width,
+ is not equal to the original value before the downcast. This issue may
----------------
erichkeane wrote:
> I think the last 2 commas in this sentence are unnecessary?
I would parenthesize the "where the demoted value [...] would have a different value from the original" clause, since it's just explaining what we mean by "data loss".
================
Comment at: docs/UndefinedBehaviorSanitizer.rst:95
+ of bigger bit width to smaller bit width, if that results in data loss.
+ That is, if the demoted value, after casting back to the original width,
+ is not equal to the original value before the downcast. This issue may
----------------
rsmith wrote:
> erichkeane wrote:
> > I think the last 2 commas in this sentence are unnecessary?
> I would parenthesize the "where the demoted value [...] would have a different value from the original" clause, since it's just explaining what we mean by "data loss".
Is this really the right rule, though? Consider:
```
unsigned int x = 0x81234567;
int y = x; // does the sanitizer catch this or not?
```
Here, the value of `x` is not the same as the value of `y` (assuming 32-bit int): `y` is negative. But this is not "data loss" according to the documented meaning of the sanitizer. I think we should produce a sanitizer trap on this case.
================
Comment at: docs/UndefinedBehaviorSanitizer.rst:17
* Signed integer overflow
+* Problematic Implicit Casts (not UB, but not always intentional)
* Conversion to, from, or between floating-point types which would
----------------
Don't use Title Caps here. "Problematic implicit conversions"
================
Comment at: docs/UndefinedBehaviorSanitizer.rst:131-133
+ overflow in signed division (``INT_MIN / -1``). Please do note that this
+ check does not diagnose lossy implicit integer conversions. Also please be
+ aware of integer conversions, as those may result in an unexpected
----------------
Remove the "Please"s here. We don't need to beg the reader to read the rest of the sentence. Just "Note that this [...]. Also note that integer conversions may result in an unexpected computation result, [...]"
================
Comment at: docs/UndefinedBehaviorSanitizer.rst:142-147
+ it is often unintentional, so UBSan offers to catch it. Please do note that
+ this check does not diagnose lossy implicit integer conversions. Also
+ please be aware of integer conversions, as those may result in an
+ unexpected computation results, even though no overflow happens (signed or
+ unsigned). Both of these two issues are handled by ``-fsanitize=implicit-cast``
+ group of checks.
----------------
Likewise here.
================
Comment at: include/clang/Basic/Sanitizers.def:134-136
SANITIZER_GROUP("integer", Integer,
SignedIntegerOverflow | UnsignedIntegerOverflow | Shift |
IntegerDivideByZero)
----------------
`-fsanitize=integer` should include `-fsanitize=implicit-integer-truncation`.
================
Comment at: lib/CodeGen/CGExprScalar.cpp:954-955
+ // We only care about int->int casts here, and ignore casts to/from pointer.
+ if (!(isa<llvm::IntegerType>(SrcTy) && isa<llvm::IntegerType>(DstTy)))
+ return;
+ // We do not care about booleans.
----------------
Check the Clang types here, not the LLVM types. There is no guarantee that only integer types get converted to LLVM `IntegerType`s. (But if you like, you can assert that `SrcTy` and `DstTy` are `IntegerType`s after checking that the clang types are both integer types.)
================
Comment at: lib/CodeGen/CGExprScalar.cpp:956-958
+ // We do not care about booleans.
+ if (SrcType->isBooleanType() || DstType->isBooleanType())
+ return;
----------------
I believe this is redundant: we don't get here for an integer to boolean conversion, and a boolean to integer conversion would always be caught by the bit width check below.
================
Comment at: lib/CodeGen/CGExprScalar.cpp:962-964
+ // This must be truncation. Else we do not care.
+ if (SrcBits <= DstBits)
+ return;
----------------
I think you should also catch casts that change signedness in the case if the sign bit is set on the value. (Though if you want to defer this to a follow-up change and maybe give the sanitizer a different name, that's fine with me.)
Repository:
rC Clang
https://reviews.llvm.org/D48958
More information about the cfe-commits
mailing list