[clang] [analyzer] Enforce not making overly complicated symbols (PR #144327)

Balázs Benics via cfe-commits cfe-commits at lists.llvm.org
Fri Jun 27 00:13:03 PDT 2025


balazs-benics-sonarsource wrote:

Backporting this updated version to clang-19 was not easy but it allowed to verify this PR.
Unfortunately, there is a bug in this updated version, that I'll explain.
Even after I fixed this bug, however, on a sample it ran about 19.36% slower under analyzing hashing-heavy real-world code.
<details>
<summary>Example hashing-heavy real-world code from <i>hyperscan:state_compress.c:storecompressed512_64bit</i></summary>

<pre>
typedef unsigned long long __attribute__((aligned((8)))) u64a;

u64a compress64(u64a x, u64a m) {
  if ((x & m) == 0)
      return 0;
  x &= m;
  u64a mk = ~m << 1;
  for (unsigned i = 0; i < 6; i++) {
    u64a mp = mk ^ (mk << 1);
    mp ^= mp << 2;
    mp ^= mp << 4;
    mp ^= mp << 8;
    mp ^= mp << 16;
    mp ^= mp << 32;
    u64a mv = mp & m;
    m = (m ^ mv) | (mv >> (1 << i));
    u64a t = x & mv;
    x = (x ^ t) | (t >> (1 << i));
    mk = mk & ~mp;
  }
  return x;
}

void storecompressed512_64bit(u64a *m, u64a *x) {
  u64a v[8] = {
    compress64(x[0], m[0]),
    compress64(x[1], m[1]),
    compress64(x[2], m[2]),
    compress64(x[3], m[3]),
    compress64(x[4], m[4]),
    compress64(x[5], m[5]),
    compress64(x[6], m[6]),
    compress64(x[7], m[7]),
  };
  (void)v;
}
</pre>
</details>

The bug was that we inserted the `T` into the expected `InsertPos` corresponding to the `Profile` of `T` - so far so good, but instead of returning that, we checked if this is "overly complicated", and if so, we created the new `SymbolOverlyComplex` symbol and returned that instead. However, when the next time we get to call `acquired<T>`, we would simply get a successful lookup for the `Profile` of `T` and return the symbol we actually wanted to hide! instead of returning the `SymbolOverlyComplex` that we returned in the previous `acquire<T>` call. This took a while to debug.
My solution to this is pretty complicated and ugly. In short, check the complexity of the symbol before we do insertion/lookup.
If the complexity of this symbol would be below the threshold, then find or insert just like we did before.
Otherwise, we need to look up the wrapped symbol first, then check if we have a `SymbolOverlyComplex` with that wrapped symbol in the map. If there is, just return that one, otherwise we create it.

As you can tell, this is pretty convoluted. And its still slow. On the attached test case, I observed 19.36% slowdown (went from 1.503s to 1.794s).
So I have 2 ways of resolving this:
1) Continue the investigation about why this implementation is slow.
2) Revert back to the variant I originally proposed when I opened this PR.


https://github.com/llvm/llvm-project/pull/144327


More information about the cfe-commits mailing list