[flang-commits] [flang] [Flang][OpenMP] Heap-allocate GPU dynamic private arrays in distribute parallel do (PR #200841)

Spencer Bryngelson via flang-commits flang-commits at lists.llvm.org
Sat Jun 20 18:45:47 PDT 2026


sbryngelson wrote:

Verified the fix (PR llvm/llvm-project#200841) on gfx90a (MI210).

The public Jun-12 ROCm nightly (`therock-dist-linux-gfx90a-7.14.0a20260612`, fork commit `52226beb`) still faults because it builds from a branch that doesn't have the fix yet. The fix is only on `amd-staging`, so I built flang from source:

```
flang version 23.0.0git (https://github.com/ROCm/llvm-project.git 09cac6e4d442814e2448d62fe7a57b8d135e3d09)  [amd-staging]
```

Original minimal reproducer (a `private` automatic array, i.e. a VLA, sized from an assumed-shape dummy) on gfx90a:

| compiler | result |
|---|---|
| ROCm 7.2.0 (LLVM 22) | GPU memory access fault |
| AFAR 23.2.0 (LLVM 23) | GPU memory access fault |
| Jun-12 nightly 7.14 (no fix) | GPU memory access fault |
| **flang built from amd-staging (fix)** | **`1.`, exit 0, deterministic over 3 runs** |
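
For anyone without the PR open, here is a minimal sketch of the pattern being exercised. It is a hypothetical reconstruction, not the reproducer itself: the names `repro`, `kernel`, `x`, and `s`, the array size, and the loop body are assumptions (only `tmp` comes from the compiler diagnostic), and the build line in the comment is an assumed invocation apart from `-Xoffload-linker -lc`.

```fortran
! Hypothetical sketch of the failing pattern; NOT the reproducer from the PR.
! Assumed build line (only -Xoffload-linker -lc is taken from this thread):
!   flang -fopenmp --offload-arch=gfx90a -Xoffload-linker -lc repro.f90
program repro
  implicit none
  real :: a(8)

  a = 1.0
  call kernel(a)

contains

  subroutine kernel(x)
    real, intent(in) :: x(:)      ! assumed-shape dummy: extent unknown at compile time
    real :: tmp(size(x))          ! automatic array sized from the dummy at run time
    real :: s
    integer :: i

    s = 0.0
    ! Each thread needs its own copy of tmp. Because the extent is only known
    ! at run time, the device-side private copy is a dynamically sized allocation.
    !$omp target teams distribute parallel do private(tmp) &
    !$omp&   map(to: x) map(tofrom: s) reduction(+: s)
    do i = 1, size(x)
      tmp = 2.0 * x               ! fill the private copy
      s = s + tmp(i)              ! read back one element
    end do

    print *, s
  end subroutine kernel

end program repro
```

The point of the sketch is only the combination of `private(tmp)` with a runtime-sized `tmp` inside `target teams distribute parallel do`; the arithmetic is a placeholder.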

Confirmed it's the fix doing the work:
- compiling emits the new diagnostic `OpenMP private dynamic array 'tmp' ... using device heap allocation`
- the device kernel now references `malloc`/`free` instead of an `addrspace(5)` scratch alloca

Note for anyone hitting this: the fix isn't in a shipping/nightly build yet (only `amd-staging` as of 2026-06-20), so a stock `amdflang` will keep faulting until the `amd-staging` changes are promoted. Linking the device code for the heap-allocated path requires the GPU libc (`-Xoffload-linker -lc`; `malloc` resolves via `__ockl_dm_alloc`).


https://github.com/llvm/llvm-project/pull/200841

