[all-commits] [llvm/llvm-project] cb7fe9: [ASAN][AMDGPU] Make address sanitizer checks more ...

Valery Pykhtin via All-commits all-commits at lists.llvm.org
Thu Jan 4 13:58:46 PST 2024


  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: cb7fe9ad4c3103d90c20d55819a9e69ab66ab3d0
      https://github.com/llvm/llvm-project/commit/cb7fe9ad4c3103d90c20d55819a9e69ab66ab3d0
  Author: Valery Pykhtin <valery.pykhtin at gmail.com>
  Date:   2024-01-04 (Thu, 04 Jan 2024)

  Changed paths:
    M llvm/lib/Transforms/Instrumentation/AddressSanitizer.cpp
    M llvm/test/Instrumentation/AddressSanitizer/AMDGPU/asan_instrument_constant_address_space.ll
    M llvm/test/Instrumentation/AddressSanitizer/AMDGPU/asan_instrument_generic_address_space.ll
    M llvm/test/Instrumentation/AddressSanitizer/AMDGPU/asan_instrument_global_address_space.ll

  Log Message:
  -----------
  [ASAN][AMDGPU] Make address sanitizer checks more efficient for the divergent target. (#72247)

Address sanitizer checks for AMDGPU target in non-recovery mode aren't
quite efficient at the moment which can be illustrated with a program:
```
instr_before; 
load ptr1; 
instr_in_the_middle; 
load ptr2; 
instr_after; 
```
ASAN generates the following instrumentation:
```
instr_before; 
if (sanity_check_passed(ptr1)) 
  load ptr1; 
  instr_in_the_middle; 
  if (sanity_check_passed(ptr2)) 
     load ptr2; 
     instr_after; 
  else 
     // ASAN report block 2 
     __asan_report(ptr2); // wave terminates   
     unreachable; 
else 
   // ASAN report block 1 
  __asan_report(ptr1); // wave terminates 
  unreachable; 
```
Each sanitizer check is treated as a non-uniform condition (and this is
true because some lanes may pass the check and some don't). This results
in the program above: basically normal program flow is continued in
_then_ blocks. This way it allows lanes that pass all sanity checks to
complete the program and then the wave terminates at the first reporting
_else_ block. For each _else_ block it has to keep execmask and pointer
value to report error consuming tons (megatons!) of registers which are
live till the program end.

This patch changes the behavior on a failing sanity check: instead of
waiting when passing lanes reach program end report error and terminate
as soon as any lane has violated the sanity check. Sanity check
condition is treated uniform with this approach and the resulting
program looks much like ordinary CPU code:

```
instr_before; 
if (any_lane_violated(sanity_check_passed(ptr1)))
  // ASAN report block 1 
  __asan_report(ptr1); // abort the program 
  unreachable; 
load ptr1; 
instr_in_the_middle; 
if (any_lane_violated(sanity_check_passed(ptr2))) 
  // ASAN report block 2   
  __asan_report(ptr2); // abort the program 
  unreachable; 
load ptr2; 
instr_after; 
```

However it has to use a trick to pass structurizer and some later
passes: ASAN check is generated like in recovery mode but reporting
function aborts, that is standard _unreachable_ instruction isn't used:
```
...
if (any_lane_violated(sanity_check_passed(ptr1)))
  // ASAN report block 1 
  __asan_report(ptr1); // abort the program 
  // pretend we're going to continue the program
load ptr1; 
...
```
This may create some undesirable effects:
1. Register allocator generates a lot of code for save/restore registers
for asan_report call. This may potentially bloat the code since we have
a report block for every accessed pointer.
2. Loop invariant code in report blocks is hoisted into a loop
preheader. I'm not sure but probably this can be solved using block
frequency information, but most likely this isn't a problem at all.

These problems are to be addressed later.

### Flattening address sanitizer check 

In order to simplify divergent CFG this patch also changes the
instrumentation code from:

```
  uint64_t address = ptr; 
  sbyte *shadow_address = MemToShadow(address); 
  sbyte shadow_value = *shadow_address; 
  if (shadow_value) { 
    sbyte last_accessed_byte = (address & 7) + kAccessSize - 1; 
    if (last_accessed_byte >= shadow_value) { 
      ReportError(address, kAccessSize, kIsWrite); 
      abort(); 
    } 
  } 
```
to 
```
  uint64_t address = ptr; 
  sbyte *shadow_address = MemToShadow(address); 
  sbyte shadow_value = *shadow_address; 

  sbyte last_accessed_byte = (address & 7) + kAccessSize - 1; 
  if (shadow_value && last_accessed_byte >= shadow_value) { 
    ReportError(address, kAccessSize, kIsWrite); 
    abort(); 
  } 
```
It saves one _if_ which really avoids very few instructions and their
latency can be hidden by the load from shadow memory.




More information about the All-commits mailing list