[clang] [lldb] [Clang] Introduce OverflowBehaviorType for fine-grained overflow control (PR #148914)

Wed Sep 24 13:50:25 PDT 2025

ojhunt wrote:

This comment became quite long, so I just want to add a preamble. First off, I'm sorry that I missed the RFC for this or I would have raised these objections then - If I did see it I would have blindly assumed that it was solely removing the UB nature of overflow, with no other semantic changes, and so probably not paid it any heed. This is a feature I have wanted for a long time, but the narrowing behavior is just not ok - I would have to essentially make this a banned feature in any code base I have influence over, for the reasons I'll be discussing below.

I’m also not sure I understand the semantics of narrowing when `__nowrap` is used, I’ve tried to  - given

```cpp
uint64_t a = ...
uint32_t __no_wrap b = ...
auto result = a + b
```

The documentation makes it clear that `smaller_type __no_wrap a = larger_type` is an error, but what happens in the case of `(smaller_type __no_wrap) + larger_type` - is that an error?

I could dump these issues into the RFC thread but I don’t think that’s necessarily appropriate - @aaronballman or 

Anyway, on to the wall of text comment (as people who are on WG21 can attest I have an unerring ability to do this, despite my best efforts):

The description of this PR states that this feature is fine grained control of fwrapv, etc that make wrapping have defined semantics rather than undefined. But the change in semantics here is not that, it is a significantly different behaviour, and has the effect of making previously correct code to incorrect code. I.e code that is correct, and has no UB (e.g. no overflow), is changed to be incorrect - introducing both overflow and truncation into code that previously had none.

A common refrain in the replies to my comments has been this is a new feature that is opt in, but the problem is there are many cases developers would want to opt in to these in ways that would be relatively global. It's super easy to imagine a project doing:

```cpp
typedef uint8_t __nowrap uint8_safe_t;
typedef uint16_t __nowrap uint16_safe_t;
typedef uint32_t __nowrap uint32_safe_t;
typedef uint63_t __nowrap uint63_safe_t;
typedef uint8_t __wrap uint8_wrap_t;
typedef uint16_t __wrap uint16_wrap_t;
typedef uint32_t __wrap uint32_wrap_t;
typedef uint63_t __wrap uint63_wrap_t;
```

Then adding a linter rule that requires all new code to use one these instead of the existing types. In codebases I have working in, this is _exactly_ what they would want to do. But the narrowing change makes such a change impossible - interaction between new and existing code, or system APIs, means the narrowing can happen anywhere, and similarly literals like `1ull + somevalue` can now break as well.

These changes in semantics can introduce security vulnerabilities where previously none existed. Take the format string example in the RFC, which was called out as a result of this change in behavior, which is the reason I assumed it had be removed:

```cpp
unsigned long long a;
size_t __wrap b;

printf("%llx", a + b);
```

on systems where `sizeof(int) != sizeof(unsigned long long)` this previously _correct_ code is now a security vulnerability. In principle this could happen on a 64bit system if varargs aren’t required to be at least 8 byte aligned, though I’m not sure if such systems exist.

You also get silent truncation in previously correct code:

```cpp
struct S32 {
  uint32_t __wrap a; // I want defined semantics for arithmetic on this field
                     // e.g. no UB -> compiler created security vulnerability
};
struct S64 {
  uint64_t __wrap a;
};

uint64_t doSomething();
uint64_t foo(S32 s) {
   return doSomething() + s.a; // silently truncate to uint32_t, and silently expand out to uint64_t;
}
```

You can imagine code this also impacting code like:

```cpp
auto /*uint32_t*/ base = getBase();
auto /*uint<less than 32>_t __wrap*/ offset = ...; // so many formats and use cases have bytes here
if (offset >= size)
  return;
buffer[base + offset] = ...; // you're now writing out of expected bounds
```

I also suspect these semantics mean that (assuming the above `uint64_t doSomething();` exists)

```cpp
uint64_t foo(uint32_t __wrap a, uint64_t wrap b) {
   return doSomething() + a + b;
}
```

could now produce a different result (it's already different from what existing correct code does) from

```cpp
uint64_t foo(uint32_t __wrap a, uint64_t wrap b) {
   return doSomething() + b + a;
}
```

Due to the narrowing of `doSomething() + a` where previously the developer had (presumably) ensured a correct result. Note that this is distinct from `int32 + int32 + int64` vs `int64 + int32 + int32` which is a real and existing problem, because we’re starting with correct code.

Similar to this, previously correct C++ becomes wrong, and changes substantially:

```cpp

// just some nonsense to demonstrate the viral nature of this change in narrowing
template <class T> struct S {
   S(T a) : a(a) {}
   T a;
}

template <class A, class B> auto doSomething(S<A> s1, S<B> s2) -> S<decltype(s1.a + s2.b)> {
   return S(s1.a + s2.a);
}

```

`doSomething` now loses data if one of A or B is a smaller `__wrap` type (e.g. `doSomething(S<uint64_t>(...), S<uint32_t __wrap>(...))`), and while historically this might have been caught by something like

```
S<uint64_t> s = doSomething(...)
```

Modern C++ has `auto` and `S s = doSomething(...)` which will silently accept the change from `S<uint64_t>` to `S<uint32_t __wrap>`, and the viral nature will continue to silently propagate the `uint32_t __wrap` through everything.

I think `_Auto` or whatever arbitrarily different spelling C choose for `auto` has similar hazards.

The reality is that given the change to narrowing behavior this attribute does not actually solve the real world problems I’ve seen, or that I have had developers express a desire for support with, and it makes existing correct code incorrect if adopted, to the extent that it can create _new_ vulnerabilities rather than removing them. Given the above issues, not only would this not be a solution to what developers have asked for, it would likely be necessary to _forbid_ the adoption of the `__wrap` portion of the feature entirely, and `__no_wrap` would at least be challenging due to the potential introduction of crashes in code that is error (which includes UB) free.

That said, these are just the issues that I see from my point of view - the RFC says that this is intended to support linux kernel use cases, so I’d like to know whether these issues and hazards seem like a problem to them.

https://github.com/llvm/llvm-project/pull/148914