[PATCH] D50250: [clang][ubsan] Implicit Conversion Sanitizer - integer sign change - clang part

Sat Aug 4 11:27:26 PDT 2018

lebedev.ri added inline comments.

================
Comment at: lib/CodeGen/CGExprScalar.cpp:1036
+    return;
+  // That's it. We can't rule out any more cases with the data we have.
+
----------------
rsmith wrote:
> rsmith wrote:
> > lebedev.ri wrote:
> > > rsmith wrote:
> > > > I don't like the overlap between the implicit truncation check and this check. I think you should aim for exactly one of those checks to fire for any given integer conversion. There are the following cases:
> > > > 
> > > >  * Dst is smaller than Src: if the value changes at all (with sign change or without), then the truncation check already catches it, and catching it here does not seem useful
> > > >  * Dst is the same size as Src or larger: sign change is the only problem, and is only possible if exactly one of Src and Dst is signed
> > > > 
> > > > So I think you should bail out of this function if either Src and Dst are both unsigned or both are signed, and also if Src is larger than Dst (because we treat that case as a lossy truncation rather than as a sign change).
> > > > 
> > > > And when you do emit a check here, the only thing you need to check is if the signed value is negative (if so, you definitely changed the sign, and if not, you definitely didn't -- except in the truncation cases that the truncation sanitizer catches).
> > > To be clear: we want to skip emitting in those cases if the other check (truncation) is enabled, right?
> > > It does seem to make sense (and i did think about that a bit), but i need to think about it more.
> > I think we want to skip emitting those checks always (regardless of whether the other sanitizer is enabled). One way to think of it: this sanitizer checks for non-truncating implicit integer conversions that change the value of the result. The other sanitizer checks for truncating implicit integer conversions that change the value of the result.
> > 
> > I don't see any point in allowing the user to ask to sanitize sign-changing truncation but not other value-changing truncations. That would lead to this:
> > ```
> > int a = 0x17fffffff; // no sanitizer warning
> > int b = 0x180000000; // sanitizer warning
> > int c = 0x1ffffffff; // sanitizer warning
> > int d = 0x200000000; // no sanitizer warning
> > ```
> > ... which I think makes no sense.
> Hmm, wait, the "truncation" sanitizer doesn't catch this:
> 
> `int a = 0x80000000u;`
> 
> ... does it? (Because it looks for cases where the value doesn't round-trip, not for cases where the value was changed by the truncation.)
> 
> 
> I've thought a bit more about the user model and use cases for these sanitizers, and I think what we want is:
> 
>  * a sanitizer that checks for implicit conversions with data loss (the existing truncation sanitizer)
>  * a sanitizer that checks for implicit conversions that change the value, where either the source or destination was signed (approximately what this sanitizer is doing)
> 
> The difference between that and what you have here is that I think the new sanitizer should catch all of these cases:
> 
> ```
> int a = 0x17fffffff;
> int b = 0x180000000;
> int c = 0x1ffffffff;
> int d = 0x200000000;
> ```
> 
> ... because while the initializations of `a` and `d` don't change the sign of the result, that's only because they wrap around *past* a sign change.
> 
> So, I think what you have here is fine for the SrcBits <= DstBits case, but for the SrcBits > DstBits case, you should also check whether the value is the same as the original (that is, perform the truncation check).
> 
> In order to avoid duplicating work when both sanitizers are enabled, it'd make sense to combine the two sanitizer functions into a single function and reuse the checks.
Yep, makes sense. I don't think i have followed the recommendations to the letter,
but i think the end result is no worse than suggested. The added tests show how it works now.
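
For reference, the distinction discussed above can be modelled in plain C++. This is only a sketch, not the code the patch emits: the names `ConversionFacts`, `classify_i64_to_i32`, `classify_u32_to_i32` and `check_thread_examples` are hypothetical, and it assumes out-of-range conversions wrap modulo 2^32 (guaranteed since C++20, and what mainstream compilers do before that).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names for illustration; none of these appear in the patch.
struct ConversionFacts {
  bool round_trips;  // converting the result back yields the source value
  bool sign_changed; // source and result disagree about being negative
};

// SrcBits > DstBits: int64_t -> int32_t.
// Assumption: the out-of-range conversion wraps modulo 2^32.
inline ConversionFacts classify_i64_to_i32(std::int64_t src) {
  const std::int32_t dst = static_cast<std::int32_t>(src);
  return {static_cast<std::int64_t>(dst) == src, (src < 0) != (dst < 0)};
}

// SrcBits == DstBits, signedness differs: uint32_t -> int32_t.
// The source is never negative, so the sign changed iff the result is negative.
inline ConversionFacts classify_u32_to_i32(std::uint32_t src) {
  const std::int32_t dst = static_cast<std::int32_t>(src);
  return {static_cast<std::uint32_t>(dst) == src, dst < 0};
}

// The constants quoted in this thread.
inline void check_thread_examples() {
  // a and d lose data but wrap past the sign change.
  assert(!classify_i64_to_i32(0x17fffffff).round_trips);
  assert(!classify_i64_to_i32(0x17fffffff).sign_changed);
  assert(!classify_i64_to_i32(0x200000000).round_trips);
  assert(!classify_i64_to_i32(0x200000000).sign_changed);
  // b and c lose data and also end up negative.
  assert(classify_i64_to_i32(0x180000000).sign_changed);
  assert(classify_i64_to_i32(0x1ffffffff).sign_changed);
  // 0x80000000u round-trips, so a round-trip check alone stays silent,
  // yet the value is now negative.
  assert(classify_u32_to_i32(0x80000000u).round_trips);
  assert(classify_u32_to_i32(0x80000000u).sign_changed);
}
```

In this model, a check that only looks at `sign_changed` would skip `a` and `d`, and a check that only looks at `round_trips` would skip `0x80000000u`, which is the gap the comments above describe.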

================
Comment at: lib/CodeGen/CGExprScalar.cpp:1050-1051
+    // NOTE: if it is unsigned, then the comparison is naturally always 'false'.
+    llvm::ICmpInst::Predicate Pred =
+        VSigned ? llvm::ICmpInst::ICMP_SLT : llvm::ICmpInst::ICMP_ULT;
+    // Get the zero of the same type with which we will be comparing.
----------------
rsmith wrote:
> lebedev.ri wrote:
> > rsmith wrote:
> > > If `!VSigned`, the result is a constant `false`; you don't need to emit an `icmp` to work that out.
> > Ok, if you insist.
> > I didn't do that in the first place because we will now have an `icmp`
> > with one operand being a constant, so we can simplify it further.
> > And i don't want to complicate this logic if the middle-end already handles it :)
> This becomes a lot simpler with the approach I described in the other comment thread, because you don't need a second `icmp eq` at all.
Hmm. So i initially did this. It is probably broken for non-scalars, but we probably don't care.

But then i thought more.

If we do not emit the truncation check, we get `icmp eq (icmp ...), false`, which is tautological.
We can't just drop the outer `icmp eq` since we'd get [[ https://rise4fun.com/Alive/4slv | the opposite value ]].
We could emit `xor %icmp, -1` to invert it.  Or simply invert the predicate, and avoid the second `icmp`.
By itself, either of these options doesn't sound that bad.

But if both are signed, we can't do that. So we have to have two different code paths...

If we do emit the `icmp ult %x, 0`, [it naturally works with vectors], we avoid complicating the front-end,
and the middle-end playfully simplifies this IR with no sweat.

So why do we want to complicate the front-end //in this case//, and not let the middle-end do its job?
I'm unconvinced, and i have kept this as is. :/
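
To make the equivalences above concrete, here is a plain C++ model of the comparisons involved. It is a sketch only: the helper names are hypothetical, the functions stand in for the IR instructions named in their comments, and nothing here is the code the patch generates.

```cpp
#include <cstdint>

// Hypothetical helpers; each one models a single IR instruction.

// Models `icmp slt i32 %x, %rhs`.
inline bool icmp_slt(std::int32_t x, std::int32_t rhs) { return x < rhs; }

// Models `icmp ult i32 %x, %rhs`. With rhs == 0 this is always false,
// because no unsigned value is below zero.
inline bool icmp_ult(std::uint32_t x, std::uint32_t rhs) { return x < rhs; }

// Models `icmp sge i32 %x, %rhs`, the inverse predicate of slt.
inline bool icmp_sge(std::int32_t x, std::int32_t rhs) { return x >= rhs; }

// Models the outer `icmp eq i1 %c, false`.
inline bool icmp_eq_false(bool c) { return c == false; }

// Models `xor i1 %c, true` (for i1, the constant -1 is `true`).
inline bool xor_true(bool c) { return c != true; }

// Three spellings of "x is not negative" that should agree for every x:
// the nested compare, the xor, and the inverted predicate.
inline bool spellings_agree(std::int32_t x) {
  const bool nested = icmp_eq_false(icmp_slt(x, 0));
  const bool inverted_by_xor = xor_true(icmp_slt(x, 0));
  const bool inverted_predicate = icmp_sge(x, 0);
  return nested == inverted_by_xor && nested == inverted_predicate;
}
```

Dropping the outer compare without inverting anything would leave `icmp_slt(x, 0)` itself, which is the opposite value. In the unsigned case, `icmp_ult(x, 0)` is false for every input, which is the folding the middle-end is being relied on to do.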

Repository:
  rC Clang

https://reviews.llvm.org/D50250