[PATCH] D158813: [analyzer] MPIChecker: MPI_Waitall should respect count arg

DonĂ¡t Nagy via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed Sep 6 07:54:04 PDT 2023


donat.nagy requested changes to this revision.
donat.nagy added a comment.
This revision now requires changes to proceed.

I reviewed this change and collected my suggestions in comments.

In general, I feel that the code added by this commit is "sloppy" in the sense that it's designed for the common case and would behave unpredictably in unusual situations. This is a fault of the toolbox provided by the analyzer: in a saner world, you could freely combine the API functions (as you did) and expect that they would "just work" together; but in reality every function has its quirks and limitations, so the developer must understand the details of the implementation.



================
Comment at: clang/lib/StaticAnalyzer/Checkers/MPI-Checker/MPIChecker.cpp:189-190
+        ASTCtx.getCharWidth();
+    const NonLoc MROffset =
+        SVB.makeArrayIndex(MR->getAsOffset().getOffset() / ElemSizeInBits);
 
----------------
danix800 wrote:
> steakhal wrote:
> > What implies that `MR->getAsOffset()` succeeds, and also what implies that the `Offset` within that is not symbolic?
> > Also, how can you use this without also using the result region?
> > Without using that you don't know what is the anchor point, from which this offset represent anything.
> > ATM I believe the code assumes that `MR->getRegion()` equals to `SuperRegion`, which might not be always the case.
> > This could materialize a problem when you construct the element region later.
> I'll restrict the checker to handle non-symbolic offset only.
> ATM I believe the code assumes that MR->getRegion() equals to SuperRegion, which might not be always the case.
This remark of @steakhal highlights a problem which is still present in your code: when you calculate `SuperRegion` you strip a single `ElementRegion` layer; while the offset that you get via `getDynamicElementCountWithOffset()` is calculated from `MemRegion::getAsOffset()` which can strip multiple `ElementRegion`, `FieldRegion` etc. layers.

Unfortunately the memory region API of Clang Static Analyzer is very inconsistent and quirky; you must dig deep into the implementation to understand the exact meaning of these function calls. 

For example, if you have a data type like
```lang=c
struct ReqStruct {
    int custom_info;
    MPI_Request reqs[8];
} rs;
```
then the `SuperRegion` will refer to the immediate parent of the `ElementRegion` (which is presumably the `FieldRegion` corresponding to `reqs`); but the offset value is measured from the start of the `VarRegion` corresponding to the variable `rs` (and includes `sizeof(int)` + potential padding due to the presence of `custom_info`).

I think the correct solution would measure the offset from the start of the array of `MPI_Request` objects; because if you measure it from the start of the struct, then you may encounter serious issues, e.g. there is no guarantee that the offset will be divisible by the size of `MPI_Request`.

Unfortunately this would mean that you cannot rely on the logic of `getDynamicElementCountWithOffset()`;  and I don't see a clear way to generalize that library function to limit the number/type of region layers that it strips.

Here I don't see a clear way forward, as the two potential solutions are a deep rewrite that complicates the library code and a creating a locally tweaked duplicate of the library logic -- and these both have serious drawbacks.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/MPI-Checker/MPIChecker.cpp:146
 
+static std::optional<std::pair<NonLoc, llvm::APSInt>>
+getRequestRegionOffsetAndCount(const MemRegion *const MR, const CallEvent &CE) {
----------------
Use `long long` (or `int64_t`, `ssize_t`, whatever you prefer) instead of `NonLoc` here because it's easier to handle them (and you don't work with symbolic offsets anyway).


================
Comment at: clang/lib/StaticAnalyzer/Checkers/MPI-Checker/MPIChecker.cpp:174
+  return std::make_pair(
+      SVB.makeArrayIndex(RequestRegionOffset.getOffset() / TypeSizeInBits),
+      RequestRegionCount->getValue());
----------------
What happens here in the case when `TypeSizeInChars.isZero()` holds?

I see that you used a ternary operator to avoid division by zero; but do you get a meaningful and valid offset value in the case when `MPI_Request` has zero size and you use `TypeSizeInBits == 8`?

I understand that the divisior does not matter in the case when `RequestRegionOffset.getOffset()` is zero; but if I understand it correctly this offset value can be nonzero even in the case when `MPI_Request` has zero size. (Under the hood it calls `MemRegion::getAsOffset()` which may include the offset of e.g. a data member of an object.)


================
Comment at: clang/lib/StaticAnalyzer/Checkers/MPI-Checker/MPIChecker.cpp:213
+    // StdCLibraryFunctions
+    for (size_t i = 0; i < RegionCount && i < RequestedCount; ++i) {
+      auto RegionIndex =
----------------
Paranoid remark: if these count values are both very large (e.g. negative numbers converted to `size_t`), then this loop could hang for a long time and/or exhaust the memory. Consider adding a hard limit on the number of iterations.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/MPI-Checker/MPIChecker.cpp:215-217
+          SVB.evalBinOp(Ctx.getState(), BO_Add, SVB.makeArrayIndex(i),
+                        RegionOffset, SVB.getArrayIndexType())
+              .getAs<NonLoc>();
----------------
If you represent `RegionOffset` as a `long long`, you can replace this complicated `evalBinOp` with a simple `SVB.makeArrayIndex(i + RegionOffset);`.

Note: @steakhal already suggested this improvement in the earlier remark
> My guess is that we only care about if MROffset is a concrete int, thus we could also just do this assidion in regular c++ and just transfer the result into the symbolic domain.
which referred to an older variant of your patch.



================
Comment at: clang/test/Analysis/mpichecker.cpp:276-280
+  typedef struct {
+    MPI_Request req[3];
+    MPI_Request req2;
+  } ReqStruct;
+
----------------
Please add a dummy field above the other fields to verify that the offsets are handled correctly. For example, `char request_name[99];` would be sufficiently large to highlight errors.

This change could be applied in other tests as well.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158813/new/

https://reviews.llvm.org/D158813



More information about the cfe-commits mailing list