[PATCH] D105167: [analyzer] Fix HTML report deduplication.

Artem Dergachev via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Jun 29 20:09:41 PDT 2021


NoQ created this revision.
NoQ added reviewers: vsavchenko, xazax.hun, Szelethus, ASDenysPetrov.
Herald added subscribers: manas, steakhal, martong, dkrupp, donat.nagy, arphaman, mikhail.ramalho, a.sidorin, rnkovacs, szepet, baloghadamsoftware.
NoQ requested review of this revision.

Folks, I deleted some code.

Some of it was... Perl code.

Not sure if anybody noticed but scan-build didn't deduplicate reports correctly. Its algorithm was extremely primitive (basically it hashed each html file and deleted files with duplicate hash). This obviously didn't work with cross-file reports (i.e., bug path starts in the main file but ends in a header) which we intended to deduplicate by the end-of-path location, because the main file name is included in the report and therefore affects md5. D42269 <https://reviews.llvm.org/D42269> doesn't help either. Long story short, it didn't honor the modern advancements in `IssueHash` at all.

I could make scan-build read the issue hash but I think my solution is even more elegant so hear me out. I put the issue hash into the report filename instead, replacing the random section. When clang tries to emit a duplicate report, it'll simply fail because the file with that name is already present. Moreover, as per tradition, I reduce the issue hash to the first 6 characters, so bug report filenames look exactly like before (`report-123abc.html`), except now they're automatically stable and deterministic! Such truncation is, of course, entirely cosmetic and absolutely unnecessary but I think it's pretty cool.

The flag `-analyzer-config stable-report-filename=true` now becomes redundant because reports are stable anyway (in fact, they weren't actually stable before the patch even with the flag, because they depended on the race between clang invocations to emit the reports; I changed it to include a snippet of the issue hash as well instead of the race-y index). That said, I'm conflicted about removing the flag entirely because it also produces more verbose/readable filenames that people seem to enjoy. I think we should enable it by default instead, as soon as we make sure it doesn't produce extremely long filenames.

`scan-build --generate-index-only` no longer does deduplication, as seen on the updated test "`rebuild_index`". If deduplication is desired, it can typically be achieved with a simple

  find -name '*.html' -exec mv "{}" . \;


Repository:
  rC Clang

https://reviews.llvm.org/D105167

Files:
  clang/lib/StaticAnalyzer/Core/HTMLDiagnostics.cpp
  clang/test/Analysis/scan-build/Inputs/deduplication/1.c
  clang/test/Analysis/scan-build/Inputs/deduplication/2.c
  clang/test/Analysis/scan-build/Inputs/deduplication/header.h
  clang/test/Analysis/scan-build/deduplication.test
  clang/test/Analysis/scan-build/rebuild_index/rebuild_index.test
  clang/test/Analysis/scan-build/rebuild_index/report-3.html
  clang/test/Analysis/scan-build/rebuild_index/subdirectory/report-3.html
  clang/test/Analysis/scan-build/rebuild_index/subdirectory/report-4.html
  clang/tools/scan-build/bin/scan-build

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D105167.355427.patch
Type: text/x-patch
Size: 13719 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20210630/98df28de/attachment-0001.bin>


More information about the cfe-commits mailing list