[compiler-rt] [ASan] Prevent ASan/LSan deadlock by preloading modules before error reporting (PR #131756)

Wed Apr 2 02:03:39 PDT 2025

Camsyn wrote:

### Proposed Fixes for ASan-LSan Deadlock Issue  
After analyzing the ASan/LSan code and commit history, I confirm that concurrent error reporting from different threads should not be considered (it has already been handled by `ScopedErrorReportLock`). Below are the potential solutions (to my knowledge) :

---

#### **1. Maintain Current Patch (Incomplete)**  
- **Approach**: Keep the existing patch to preload modules in ASan, avoiding holding Lock A (libdl) under Lock B (ASan thread lock).  
- **Risk**: If other threads modify the module list via `dlopen`/`dlclose` during this window, the patch may regress to the original deadlock scenario. While rare (due to the low probability of overlapping LSan checks and ASan reports + module updates), this is not a complete fix.  
  - **Supplemental Idea**: Add synchronization to make `dlopen`/`dlclose` wait until error reports complete before updating module lists.  
  - **Rationale**: Error reports depend on module list, and the "buggy" module triggering the report should logically exist stably before the report.  Should the module list be frozen during error reporting?

---

#### **2. Revert LSan Lock Order (Incomplete)**  
- **Approach**: Move LSan's acquisition of Lock B (thread lock) *before* `dl_iterate_phdr` (Lock A). Historically, LSan acquired Lock B before `dl_iterate_phdr`, but commit f1b6bd4 moved it inside `dl_iterate_phdr` to handle potential `malloc` calls during iteration.  
- **Key Question**: Since ASan does not hold the allocator lock during error reporting, could reverting Lock B acquisition to pre-`dl_iterate_phdr` resolve the deadlock?  
- **Risk**: Thread creation within `dl_iterate_phdr` (now holding Lock B) might reintroduce deadlocks similar to those addressed in `f1b6bd4`.  

---

#### **3. Align ASan with LSan (Complete)**  
- **Approach**: Make ASan acquire Lock A (via `dl_iterate_phdr`) before error reporting, mirroring LSan's lock order.  
- **Challenge**: Requires modifying numerous `ReportXXXX` functions to explicitly handle Lock A, which may introduce code bloat and maintenance overhead. 
- **Tradeoff**: Ensures consistent lock hierarchy (A→B) but complicates ASan's reporting logic.  

---

#### **4. Leverage `ScopedErrorReportLock` for LSan (Complete)**  
- **Approach**: Have LSan perform `dl_iterate_phdr` while holding `ScopedErrorReportLock`, preventing concurrent ASan error reports.  
- **Consideration**: While LSan leak checks do not inherently require error reporting locks, this would enforce mutual exclusion between leak scans and ASan reports.  
- **Open Question**: Is it appropriate to tie leak-check synchronization to error reporting locks? 

---

### Key Tradeoffs  
| Solution            | Completeness | Risk                             | Code Impact |
| ------------------- | ------------ | -------------------------------- | ----------- |
| 1                   | Partial      | Regression risk                  | Low         |
| 1 (with supplement) | Full         | Code changes in `sanitizer_common` | Moderate    |
| 2                   | Partial      | New deadlock scenarios           | Moderate    |
| 3                   | Full         | High maintenance cost            | High        |
| 4                   | Full         | Logical coupling concerns        | Low         |

---

Which fix should we choose or are there any other suggestions?

https://github.com/llvm/llvm-project/pull/131756