[clang] [analyzer] Handle [[assume(cond)]] as __builtin_assume(cond) (PR #116462)

Vinay Deshmukh via cfe-commits cfe-commits at lists.llvm.org
Thu Nov 21 16:43:39 PST 2024


vinay-deshmukh wrote:

> We should have a look at what the codegen does with this attribute, or even with the builtin assume call - because we should do something very similar. We should gather the constraints and apply them, just like codegen does (I assume).
I think a suitable place to start reading would be CGStmt.cpp CodeGenFunction::EmitAttributedStmt().
> Basically, my idea now is more like: don't eval the expressions of the assume in the CFG, but only visit the AttributedStmt in the ExprEngine. When we would reach that, "do the same as codegen to get the constraints" and apply these constraints.

CodeGen seems to be doing a [NOOP operation when encountering an attribute with side-effects in `void CodeGenFunction::EmitAttributedStmt(const AttributedStmt &S) {`](https://github.com/llvm/llvm-project/blob/ab28d387fafede5d56a005b2597903fe6a2fbf9a/clang/lib/CodeGen/CGStmt.cpp#L756-L765).  This can be mirrored within this PR by "ignore"-ing/(not adding those expressions as children) the assumption attributes (in CFG.cpp) that have side effects.

I've checked the `debug.ViewCFG` output and it looks like this is similar to what `__builtin_assume` does if it's assumptions encounter expressions with  side-effects.

Furthermore, it appears to me that `clang` specifically NEVER evaluates assumptions that can have side-effects (See section at the end that compares GCC/clang behavior). We know it doesn't for codegen as per the above function, and given the `DiagnosticSemaKind.td`'s `warn_assume_side_effects` occurrence unconditionally, it would appear they aren't evaluated at all. (compliant with the standard).

Once the other comments regarding `for` vs `if` is answered, I can raise a cleaned up PR with the necessary changes, that way it can be one commit.

<details>
<summary>
Comparison of GCC vs clang for compiler for pre-C++23 ASSUME macro and `[[assume()]]`

</summary>


Noting the current "real" behavior of `clang` and `gcc` **NOT** considering static analysis/checkers:
https://godbolt.org/z/nM6PccseK

This godbolt link shows the output of with the following scenarios, where the assumption argument is an expression with side-effects (`++y == 43`):

1. with the compiler builtin-assume equivalent (as per each compiler) as:
```c++
// From: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p1774r8.pdf
#if defined(__clang__)
#define ASSUME(expr) __builtin_assume(expr)
#elif defined(__GNUC__) && !defined(__ICC)
#define ASSUME(expr) if (expr) {} else { __builtin_unreachable(); }
#endif
```

a. U.B. (as per standard) i.e. `f(20)` takes `y != 42`

* Clang: returns 20, (**DOES NOT EVALUATE the assumption as it has side-effects**) (via `DiagnosticsSemaKinds.td`)

```
If the condition is violated during execution, the behavior is undefined. The argument itself is never evaluated, so any side effects of the expression will be discarded.
```
from 
https://clang.llvm.org/docs/LanguageExtensions.html#builtin-assume

Note: `int f(int)`'s assembly output contains `20`, as clang didn't evaluate `++y`

* GCC:  returns 139, terminates with SIGSEGV

Note: `int f(int)`'s assembly output contains `43` since GCC evaluates `++y`
```
If control flow reaches the point of the __builtin_unreachable, the program is undefined.
```
from
https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#index-_005f_005fbuiltin_005funreachable

b. Defined Behavior i.e. `f()` takes 42 as input

* Clang: returns 42, asm output has `42`, since clagn does **NOT** evaluate `++y`
* GCC:  returns 43, asm output has `43`, since GCC evaluates `++y`

2. with `[[assume]]` (by setting the preprocessor directive `CXX23_ASSUME` at the start of the code)

a. U.B. (as per standard) i.e. `f(20)` takes `y != 42`
[U.B. so no behavior is guaranteed anyway.](https://eel.is/c++draft/dcl.attr.assume#sentence-5)

* Clang: `f()`'s ASM is reading from the parameter, `main()` returns `20`
* GCC:  `f()` 's ASM has `42`, but `main()` returns `20`

b. Defined Behavior i.e. `f()` takes 42 as input

Clang: `f()`'s ASM is reading from the parameter, `main()` returns `42`
GCC:  `f()`'s ASM has 42, `main()` returns 42





</details>

https://github.com/llvm/llvm-project/pull/116462


More information about the cfe-commits mailing list