[cfe-dev] Pointers as SVals

Ádám Balogh via cfe-dev cfe-dev at lists.llvm.org
Fri Jun 19 03:19:28 PDT 2020


Hello,

Thank you for your very detailed answer. The main point that was not clear to me before was that a pointer is never represented as a bare symbolic value (`$p`), but always as a loc::MemRegionVal that points to the symbolic region (`&SymRegion{$p}`).

Regards,

Ádám

From: Artem Dergachev <noqnoqneo at gmail.com>
Sent: Thursday, June 18, 2020 13:21
To: Ádám Balogh <adam.balogh at ericsson.com>; 'cfe-dev at lists.llvm.org' <cfe-dev at lists.llvm.org>
Subject: Re: [cfe-dev] Pointers as SVals

If a symbol (a SymExpr object) `$p` is the unknown numeric value of a memory address, then a symbolic region (a SymbolicRegion object) `SymRegion{$p}` represents the segment of memory that starts at address `$p` and ends at some other unknown position, and a pointer value (a loc::MemRegionVal object) `&SymRegion{$p}` represents, well, the value of a pointer to the beginning of the symbolic region `SymRegion{$p}`.

All three are basically the same thing. `SymRegion{$p}` is slightly different because it implies the existence of the other end of the segment (even if that end is unknown), but `&SymRegion{$p}` is essentially the same thing as `$p`, just represented as an object of a different type (SVal as opposed to SymExpr).

Think of SymbolicRegion and loc::MemRegionVal as adaptors; they don't change the meaning behind the object, they only represent it in a different manner, like a different point of view on the same entity. The important technical difference between `&SymRegion{$p}` and `$p` is that the former is Loc and the latter is NonLoc.

There's another such adaptor, nonloc::SymbolVal, that represents SymExprs as SVals directly. For any symbol `$p` of pointer type, nonloc::SymbolVal of `$p` is ill-formed; it is always going to be canonically represented as loc::MemRegionVal `&SymRegion{$p}` instead. So nonloc::SymbolVal can only be used on regular integers. This ensures that Loc values are always used for representing pointers (or references, or values of glvalue expressions) and NonLoc values are always used for representing integers and other prvalues of non-pointer type.
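
To make the layering concrete, here is a minimal toy model in C. It is a sketch under stated assumptions, not the analyzer's code: `ToySymbol`, `ToySymRegion`, `ToySVal`, `toy_make_sval()` and `toy_underlying_symbol()` are hypothetical stand-ins for SymExpr, SymbolicRegion and the loc::MemRegionVal / nonloc::SymbolVal adaptors. It only illustrates the canonicalization rule: a pointer-typed symbol is always wrapped as a Loc, an integer symbol as a NonLoc, and in both cases the wrapped symbol itself is unchanged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model, NOT the Clang API. The names only mimic the
   analyzer's layering SymExpr -> SymbolicRegion -> loc::MemRegionVal. */

typedef struct {
  int id;          /* the N in "$N" / conj_$N */
  bool is_pointer; /* does the symbol have pointer type? */
} ToySymbol;

/* Plays the role of SymRegion{$p}: memory starting at address $p. */
typedef struct {
  const ToySymbol *base;
} ToySymRegion;

typedef enum { TOY_LOC_MEM_REGION_VAL, TOY_NONLOC_SYMBOL_VAL } ToySValKind;

typedef struct {
  ToySValKind kind;
  ToySymRegion region;  /* used when kind == TOY_LOC_MEM_REGION_VAL */
  const ToySymbol *sym; /* used when kind == TOY_NONLOC_SYMBOL_VAL */
} ToySVal;

/* Canonical wrapping: a pointer-typed symbol $p always becomes the Loc
   &SymRegion{$p}; only non-pointer symbols become a NonLoc SymbolVal. */
static ToySVal toy_make_sval(const ToySymbol *s) {
  assert(s != NULL);
  ToySVal v;
  if (s->is_pointer) {
    v.kind = TOY_LOC_MEM_REGION_VAL;
    v.region.base = s;
    v.sym = NULL;
  } else {
    v.kind = TOY_NONLOC_SYMBOL_VAL;
    v.region.base = NULL;
    v.sym = s;
  }
  return v;
}

/* The adaptors add no meaning: the same symbol is recoverable either way. */
static const ToySymbol *toy_underlying_symbol(ToySVal v) {
  return v.kind == TOY_LOC_MEM_REGION_VAL ? v.region.base : v.sym;
}
```

For example, wrapping a pointer symbol `p` with `toy_make_sval(&p)` yields a `TOY_LOC_MEM_REGION_VAL`, and `toy_underlying_symbol()` hands back `&p` unchanged. Because there is only one constructor, a "NonLoc of a pointer symbol" simply cannot be built in this toy.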

This entire system of adaptors might seem unnecessarily complicated, and it probably is, but I can't say we suffer much from its existence, I don't have anything better in mind, and I believe it adds a bit of type safety that helps us avoid introducing bugs in the code.

See also http://lists.llvm.org/pipermail/cfe-dev/2017-June/054084.html


> `clang_analyzer_dump()` says it is an element region

It doesn't. It says "&Element", not "Element". This should be read as "address of element" and indicates that the dumped value is a loc::MemRegionVal, i.e. a pointer value. That's exactly how the explainer works as well, which is why it says "pointer to".
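
A toy printer in C may help show how to read these dumps. This is a simplified sketch, not the analyzer's dump implementation: `ToyRegion`, `toy_print_region()` and `toy_print_pointer()` are hypothetical, and the real `Element{...}` dump also prints the index type and element type (e.g. `1 S64b,int`). The only point is that a pointer value is printed as its pointee region's text with a leading `&`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical toy printer, NOT the analyzer's dump code. It only shows
   why a leading "&" in clang_analyzer_dump() output marks a pointer value
   (a loc::MemRegionVal) rather than the region itself. */

typedef enum { TOY_SYM_REGION, TOY_ELEMENT_REGION } ToyRegionKind;

typedef struct ToyRegion {
  ToyRegionKind kind;
  const char *symbol;            /* TOY_SYM_REGION: e.g. "$p" */
  const struct ToyRegion *super; /* TOY_ELEMENT_REGION: enclosing region */
  long index;                    /* TOY_ELEMENT_REGION: element index */
} ToyRegion;

/* Formats the region itself into buf, e.g. "SymRegion{$p}" or
   "Element{SymRegion{$p},1}" (simplified: no index/element types). */
static int toy_print_region(const ToyRegion *r, char *buf, size_t n) {
  if (r->kind == TOY_SYM_REGION)
    return snprintf(buf, n, "SymRegion{%s}", r->symbol);
  char inner[128];
  toy_print_region(r->super, inner, sizeof inner);
  return snprintf(buf, n, "Element{%s,%ld}", inner, r->index);
}

/* Formats a pointer value that points to the region: the same text
   prefixed with "&", i.e. "address of" that region. */
static int toy_print_pointer(const ToyRegion *r, char *buf, size_t n) {
  char inner[160];
  toy_print_region(r, inner, sizeof inner);
  return snprintf(buf, n, "&%s", inner);
}
```

With an element region at index 1 of `SymRegion{$p}`, `toy_print_region()` produces `Element{SymRegion{$p},1}` (the region), while `toy_print_pointer()` produces `&Element{SymRegion{$p},1}` (a pointer to that region), mirroring the "&Element" vs. "Element" distinction.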


On 6/18/20 12:57 PM, Ádám Balogh via cfe-dev wrote:
Hello,

I am trying to understand how to distinguish the value of the pointer itself from the region it points to. However, I am seeing some contradictions while testing. Consider the following piece of code:
```
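void clang_analyzer_dump(const int *);    // intercepted by the debug.ExprInspection checker
void clang_analyzer_explain(const int *); // also intercepted; neither needs a definition

const int* get_ptr();

void f() {
  const int *p = get_ptr();
  clang_analyzer_dump(p);
  clang_analyzer_explain(p);
}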
```

The output of this code:
```
ptr_dump_explain.c:8:3: warning: &SymRegion{conj_$2{const int *, LC1, S715, #1}} [debug.ExprInspection]
  clang_analyzer_dump(p);
  ^~~~~~~~~~~~~~~~~~~~~~
ptr_dump_explain.c:9:3: warning: symbol of type 'const int *' conjured at statement 'get_ptr()' [debug.ExprInspection]
  clang_analyzer_explain(p);
  ^~~~~~~~~~~~~~~~~~~~~~~~~
```

Is `p` a region or a symbol? `clang_analyzer_dump()` says it is a region, more specifically a symbolic region, but still a region. However, `clang_analyzer_explain()` says it is a symbol, which I think is wrong. According to `SValExplainer.h` it should print something like `object at...` or `pointee of ...` but not explain the raw symbol without mentioning the region.

I tried to change the code to the following:
```
void f() {
  const int *p = get_ptr();
  ++p;
  clang_analyzer_dump(p);
  clang_analyzer_explain(p);
}
```

The output changes:
```
ptr_dump_explain.c:9:3: warning: &Element{SymRegion{conj_$2{const int *, LC1, S715, #1}},1 S64b,int} [debug.ExprInspection]
  clang_analyzer_dump(p);
  ^~~~~~~~~~~~~~~~~~~~~~
ptr_dump_explain.c:10:3: warning: pointer to element of type 'int' with index 1 of pointee of symbol of type 'const int *' conjured at statement 'get_ptr()' [debug.ExprInspection]
  clang_analyzer_explain(p);
  ^~~~~~~~~~~~~~~~~~~~~~~~~
```

This is even stranger, because here `clang_analyzer_dump()` says it is an element region, thus a region of the array element. However, here `clang_analyzer_explain()` says it is a pointer to the element, thus not the element itself. According to `SValExplainer.h` the output for an element region should begin with `element of type...`. What is wrong here? Both functions take the same type of parameter:
```
void clang_analyzer_dump(const int*);
void clang_analyzer_explain(const int*);
```

What am I misunderstanding here?

Regards,

Ádám



