[llvm] [BOLT] Gadget scanner: improve handling of unreachable basic blocks (PR #136183)

Sat Jun 21 09:13:36 PDT 2025

================
@@ -1428,6 +1428,90 @@ printed_instrs_nocfg:
         br      x0
         .size   printed_instrs_nocfg, .-printed_instrs_nocfg
 
+// Test handling of unreachable basic blocks.
+//
+// Basic blocks without any predecessors were observed in real-world optimized
+// code. At least sometimes they were actually reachable via jump table, which
+// was not detected, but the function was processed as if its CFG was
+// reconstructed successfully.
+//
+// As a more predictable model example, let's use really unreachable code
+// for testing.
+
+        .globl  bad_unreachable_call
+        .type   bad_unreachable_call, at function
+bad_unreachable_call:
+// CHECK-LABEL: GS-PAUTH: Warning: the function has unreachable basic blocks (possibly incomplete CFG) in function bad_unreachable_call, basic block {{[^,]+}}, at address
----------------
kbeyls wrote:

IIUC, with this patch basic blocks that are not part of the CFG as reconstructed by BOLT are now also analyzed. These blocks are analyzed with a pessimistic initial state.
There are 2 possible cases why a basic block is not part of the CFG:
1. BOLT wasn't able to reconstruct the CFG correctly.
   In this case, the pessimistic assumptions are probably going to cause more false positives in these "not-part-of-the-CFG basic blocks" than if BOLT was able to reconstruct the CFG?
   If all my assumptions above are correct, maybe the warning messages should state that more clearly, for example, something like "Warning: function {function_name} has seemingly unreachable basic blocks, possibly due to limits in how bolt can reverse engineer the CFG. This may lead to more false positives being reported in these basic blocks"
  Having said just that, I'm also thinking that maybe this is similar to the situation where BOLT cannot create a CFG at all. If so, should we (are we already?) producing a warning then too? But maybe that would produce way too many warnings?
2. The code really contains dead code (not impossible, code generators and assembly writers are known to sometimes make mistakes, or there might be a legitimate reason for seemingly dead code to be present in the binary)
  Would it be correct to say that in this case all reports against these dead code basic blocks would be false positives?

https://github.com/llvm/llvm-project/pull/136183