[PATCH] D48958: [clang][ubsan] Implicit Cast Sanitizer - integer truncation - clang part

Mon Jul 30 10:50:14 PDT 2018

rsmith accepted this revision.
rsmith added a comment.

Only comments on documentation and assertions. Feel free to commit once these are addressed to your satisfaction.

================
Comment at: docs/ReleaseNotes.rst:310
+
+  Just like ``-fsanitize=integer``, these issues are **not** undefined
+  behaviour. But they are not *always* intentional, and are somewhat hard to
----------------
"Just like -fsanitize=integer" -> "Just like other -fsanitize=integer checks", now that this is part of  `-fsanitize=integer`.

================
Comment at: docs/UndefinedBehaviorSanitizer.rst:17
 * Signed integer overflow
+* Problematic Implicit Casts (not UB, but not always intentional)
 * Conversion to, from, or between floating-point types which would
----------------
rsmith wrote:
> Don't use Title Caps here. "Problematic implicit conversions"
I don't think it makes sense to list this here, as it's not undefined behavior, and this is a list of undefined behavior that UBSan catches. (And I think it makes sense from a communication perspective to consider the non-UB checks to be "not part of UBSan but handled by the same infrastructure".)

================
Comment at: docs/UndefinedBehaviorSanitizer.rst:97
+     width, is not equal to the original value before the downcast.
+     This issue may often be caused by an implicit integer conversions.
   -  ``-fsanitize=integer-divide-by-zero``: Integer division by zero.
----------------
I don't think this last sentence adds anything, and it creates the impression that the issue is sometimes caused by something other than implicit integer conversions (which it isn't, as far as I can tell). Maybe just delete the last sentence here?

And instead, something like this would be useful:

"Issues caught by this sanitizer are not undefined behavior, but are often unintentional."

================
Comment at: docs/UndefinedBehaviorSanitizer.rst:136-137
+     overflow happens (signed or unsigned).
+     Both of these two issues are handled by ``-fsanitize=implicit-conversion``
+     group of checks.
   -  ``-fsanitize=unreachable``: If control flow reaches an unreachable
----------------
I don't think that's true (not until you add a sanitizer for signed <-> unsigned conversions that change the value). `4U / -2` produces the unexpected result `0U` rather than the mathematically-correct result `-2`, and `-fsanitize=implicit-conversion` doesn't catch it. Maybe just strike this sentence for now?

In fact... I think this is too much text to be adding to this bulleted list, which is just supposed to summarize the available checks. Maybe replace the description with

    Signed integer overflow, where the result of a signed integer computation cannot be represented in its type. This includes all the checks covered by ``-ftrapv``, as well as checks for signed division overflow (``INT_MIN/-1``), but not checks for lossy implicit conversions performed before the computation (see ``-fsanitize=implicit-conversion``).

================
Comment at: docs/UndefinedBehaviorSanitizer.rst:140-148
   -  ``-fsanitize=unsigned-integer-overflow``: Unsigned integer
      overflows. Note that unlike signed integer overflow, unsigned integer
      is not undefined behavior. However, while it has well-defined semantics,
-     it is often unintentional, so UBSan offers to catch it.
+     it is often unintentional, so UBSan offers to catch it. Note that this
+     check does not diagnose lossy implicit integer conversions. Also note that
+     integer conversions may result in an unexpected computation result, even
+     though no overflow happens (signed or unsigned).
----------------
And here something like:

    Unsigned integer overflow, where the result of an unsigned integer computation cannot be represented in its type. Unlike signed integer overflow, this is not undefined behavior, but it is often unintentional. This sanitizer does not check for lossy implicit conversions performed before such a computation (see ``-fsanitize=implicit-conversion``).

================
Comment at: docs/UndefinedBehaviorSanitizer.rst:165
      behavior (e.g. unsigned integer overflow).
+     Enables ``-fsanitize=implicit-integer-truncation``.
+  -  ``-fsanitize=implicit-conversion``: Checks for suspicious behaviours of
----------------
If we're going to list which sanitizers are enabled here, we should list them all:

    Enables ``signed-integer-overflow``, ``unsigned-integer-overflow``, ``shift``, ``integer-divide-by-zero``, and ``implicit-integer-truncation``.

================
Comment at: docs/UndefinedBehaviorSanitizer.rst:95
+     of bigger bit width to smaller bit width, if that results in data loss.
+     That is, if the demoted value, after casting back to the original width,
+     is not equal to the original value before the downcast. This issue may
----------------
lebedev.ri wrote:
> rsmith wrote:
> > rsmith wrote:
> > > erichkeane wrote:
> > > > I think the last 2 commas in this sentence are unnecessary?  
> > > I would parenthesize the "where the demoted value [...] would have a different value from the original" clause, since it's just explaining what we mean by "data loss".
> > Is this really the right rule, though? Consider:
> > 
> > ```
> > unsigned int x = 0x81234567;
> > int y = x; // does the sanitizer catch this or not?
> > ```
> > 
> > Here, the value of `x` is not the same as the value of `y` (assuming 32-bit int): `y` is negative. But this is not "data loss" according to the documented meaning of the sanitizer. I think we should produce a sanitizer trap on this case.
> I've reverted this to my original text.
> It should now convey the correct idea, but i'm not sure this is correct English.
> 
> > unsigned int x = 0x81234567;
> > int y = x; // does the sanitizer catch this or not?
> No, it does not. It indeed should. I do plan on following-up with that,
> thus i've adding the group (``-fsanitize=implicit-conversion``), not just one check.
bigger -> larger

... would read a bit better.

================
Comment at: lib/CodeGen/CGExprScalar.cpp:960
+      isa<llvm::IntegerType>(SrcTy) && isa<llvm::IntegerType>(DstTy) &&
+      "if these aren't integer types, then how would trunc and friends work?");
+
----------------
I think it's generally better for the text in an assertion to describe the violated assumption directly:

"clang integer type lowered to non-integer llvm type"

================
Comment at: lib/CodeGen/CGExprScalar.cpp:968
+
+  assert(!SrcType->isBooleanType() && !DstType->isBooleanType() &&
+         "we should not get here with booleans.");
----------------
I think you should only check `DstType` here. The point of the assert is that there is no such thing as a truncation to `bool` (conversion from integer to `bool` is a comparison against `0`, and if we get here for such a case, there's a bug elsewhere). Other than that, `bool` is a perfectly-normal 1-bit unsigned integer type, and doesn't need to be treated as a special case.

Repository:
  rC Clang

https://reviews.llvm.org/D48958