[all-commits] [llvm/llvm-project] e20dd2: [Comgr][hotswap] Add code-object metadata parsing
ftynse via All-commits
all-commits at lists.llvm.org
Thu Jun 11 07:19:01 PDT 2026
Branch: refs/heads/users/ftynse/msb
Home: https://github.com/llvm/llvm-project
Commit: e20dd2510875fd16a7adc29967179ce711d33268
https://github.com/llvm/llvm-project/commit/e20dd2510875fd16a7adc29967179ce711d33268
Author: Martin Paul Lücke <martin.luecke at amd.com>
Date: 2026-05-15 (Fri, 15 May 2026)
Changed paths:
M .github/workflows/multi_arch_build_portable_linux_artifacts.yml
M .github/workflows/multi_arch_build_windows_artifacts.yml
M amd/comgr/src/comgr-metadata.cpp
M amd/comgr/src/comgr-metadata.h
M amd/comgr/src/comgr-symbol.cpp
M amd/comgr/src/comgr-symbol.h
M amd/comgr/src/hotswap/CMakeLists.txt
A amd/comgr/src/hotswap/code-object-utils.cpp
A amd/comgr/src/hotswap/code-object-utils.h
R amd/comgr/src/hotswap/code_object_utils.h
A amd/comgr/src/hotswap/hotswap-error.cpp
A amd/comgr/src/hotswap/hotswap-error.h
A amd/comgr/src/hotswap/mc-state.cpp
A amd/comgr/src/hotswap/mc-state.h
A amd/comgr/src/hotswap/raise-failure.h
R amd/comgr/src/hotswap/raise_failure.h
M amd/comgr/src/hotswap/raiser.h
M amd/comgr/test-unit/CMakeLists.txt
A amd/comgr/test-unit/CodeObjectUtilsTest.cpp
A amd/comgr/test-unit/MCContextInlineSrcmgrTest.cpp
Log Message:
-----------
[Comgr][hotswap] Add code-object metadata parsing
Adds the AMDGPU code-object metadata extraction surface that the rest of
the raiser pipeline depends on:
* `extractTextSection`, `listKernelNames`, `extractKernelMeta`,
`findKernelSymbolOffset`, and `detectIsaFromElf` in
`hotswap/code_object_utils.{h,cpp}`.
* `KernelMeta` populated with both MsgPack-derived fields (kernarg/group/
private segment sizes, args) and the raw kernel-descriptor register
bytes (`compute_pgm_rsrc{1,2}`, `kernel_code_properties`,
`kernarg_preload`) read from `<name>.kd` in `.rodata`. Later layers
(`UserSgprLayout`, kernarg layout) consume these.
Reuses comgr's existing parsing infrastructure rather than duplicating
it: each helper wraps the input ELF bytes in an
`AMD_COMGR_DATA_KIND_EXECUTABLE` data object via a small
`ScopedDataObject` RAII helper and routes through
`metadata::getMetadataRoot` / `metadata::getElfIsaName` for the metadata
walk and ISA-name extraction. No new entry points are added to
`comgr-metadata.{h,cpp}`. Tests + raise_cli pick up `amd_comgr` on their
link line so the public C entry points
(`amd_comgr_create_data` / `set_data` / `release_data`) resolve.
Includes a small focused gtest covering empty-input and malformed-ELF
guards on the public hotswap API.
No MC stack, no disassembler, no opcode canonicalisation yet — those
land in subsequent commits.
Co-Authored-By: Tim Gymnich <tim at gymni.ch>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply at anthropic.com>
Co-Authored-By: Alex Zinenko <git at ozinenko.com>
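The `ScopedDataObject` wrapping described in the log message above can be sketched roughly as follows. This is a minimal illustration, not the commit's code: to stay self-contained it replaces the comgr entry points (`amd_comgr_create_data` / `set_data` / `release_data`) with stand-ins named `fakeCreate`, `fakeSetData`, and `fakeRelease`, and both the class layout and the `canParse` helper are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Stand-ins for the comgr C API so the sketch compiles on its own.
// The real code would call amd_comgr_create_data / amd_comgr_set_data /
// amd_comgr_release_data; everything named Fake*/fake* is hypothetical.
struct FakeHandle { std::uint64_t Id = 0; };
inline int LiveHandles = 0; // counts handles not yet released

inline bool fakeCreate(FakeHandle &H) {
  H.Id = 1;
  ++LiveHandles;
  return true;
}
inline bool fakeSetData(FakeHandle, std::size_t Size, const char *Bytes) {
  return Size != 0 && Bytes != nullptr; // reject empty input
}
inline void fakeRelease(FakeHandle) { --LiveHandles; }

// Owns one data handle for the duration of a parse; releases on scope exit.
class ScopedDataObject {
public:
  explicit ScopedDataObject(std::string_view Elf) {
    if (!fakeCreate(Handle))
      return;
    Created = true;
    Valid = fakeSetData(Handle, Elf.size(), Elf.data());
  }
  ~ScopedDataObject() {
    if (Created)
      fakeRelease(Handle);
  }
  ScopedDataObject(const ScopedDataObject &) = delete;
  ScopedDataObject &operator=(const ScopedDataObject &) = delete;

  bool valid() const { return Valid; }

private:
  FakeHandle Handle;
  bool Created = false;
  bool Valid = false;
};

// Hypothetical helper shape: reports failure on empty input, and the
// handle is released on every return path.
inline bool canParse(std::string_view Elf) {
  ScopedDataObject Obj(Elf);
  return Obj.valid();
}
```

The point of the RAII shape is that each helper can return early on a malformed ELF without leaking the data object.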
Commit: d2c6553024b96d73a8613e14a7e456d4c8f8d207
https://github.com/llvm/llvm-project/commit/d2c6553024b96d73a8613e14a7e456d4c8f8d207
Author: Martin Paul Lücke <martin.luecke at amd.com>
Date: 2026-05-19 (Tue, 19 May 2026)
Changed paths:
M amd/comgr/src/hotswap/CMakeLists.txt
A amd/comgr/src/hotswap/amdgpu-formats.h
A amd/comgr/src/hotswap/canonical-op.cpp
A amd/comgr/src/hotswap/canonical-op.h
A amd/comgr/src/hotswap/opcode-map.cpp
A amd/comgr/src/hotswap/opcode-map.h
M amd/comgr/test-unit/CMakeLists.txt
A amd/comgr/test-unit/OpcodeMapTest.cpp
Log Message:
-----------
[Comgr][hotswap] Add CanonicalOp + AMDGPU MC format aliases + opcode map
Adds the three foundation files the per-format handlers will branch on:
* canonical_op.{h,cpp} — `CanonicalOp` enum (one entry per
target-independent semantic op the raiser knows how to lift) and a
name table for diagnostics.
* amdgpu_formats.h — convenience aliases over `SIInstrFlags` so
handler dispatches read as `Flags & FORMAT_VALU` / `FORMAT_SOPK`
instead of hand-typing the bit layout.
* opcode_map.{h,cpp} — `OpcodeMap` builds an MC-opcode -> CanonicalOp
table once at first use by walking the MCInstrInfo registered for
a given AMDGPU subtarget. Per-handler entries are folded into the
table here; the actual `lookup` consumers come in subsequent
handler commits.
`hotswap/CMakeLists.txt` grows to compile canonical_op.cpp +
opcode_map.cpp into the OBJECT lib. No new external dependencies.
Co-Authored-By: Tim Gymnich <tim at gymni.ch>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply at anthropic.com>
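The build-once opcode table and the diagnostic name table from the log message above might look like the sketch below. The enum entries, opcode numbers, and format bit positions are all made up for illustration. The real table is derived from the MCInstrInfo of a given subtarget, which this single-instance sketch ignores.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>
#include <unordered_map>

// Hypothetical subset of a CanonicalOp enum; the real enum has one entry
// per semantic op the raiser can lift.
enum class CanonicalOp : std::uint8_t { Unknown, Add, Mul, Branch };

// Name table for diagnostics, indexed by enum value.
inline std::string_view canonicalOpName(CanonicalOp Op) {
  static constexpr std::string_view Names[] = {"unknown", "add", "mul",
                                               "branch"};
  return Names[static_cast<std::uint8_t>(Op)];
}

// Hypothetical format bits standing in for the SIInstrFlags aliases.
constexpr std::uint64_t FORMAT_VALU = 1u << 0;
constexpr std::uint64_t FORMAT_SOPK = 1u << 1;

class OpcodeMap {
public:
  // Built once on first call; later calls reuse the same table.
  static const OpcodeMap &get() {
    static const OpcodeMap Instance;
    return Instance;
  }
  CanonicalOp lookup(unsigned McOpcode) const {
    auto It = Table.find(McOpcode);
    return It == Table.end() ? CanonicalOp::Unknown : It->second;
  }

private:
  OpcodeMap() {
    // Illustrative entries only; these opcode numbers are invented.
    Table = {{100, CanonicalOp::Add},
             {101, CanonicalOp::Mul},
             {200, CanonicalOp::Branch}};
  }
  std::unordered_map<unsigned, CanonicalOp> Table;
};
```

A function-local static gives thread-safe one-time construction in C++11 and later, which is one simple way to get "built once at first use".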
Commit: d3d10dd75a6295c5670542d73f48da6068eabb79
https://github.com/llvm/llvm-project/commit/d3d10dd75a6295c5670542d73f48da6068eabb79
Author: Martin Paul Lücke <martin.luecke at amd.com>
Date: 2026-05-19 (Tue, 19 May 2026)
Changed paths:
M amd/comgr/src/hotswap/CMakeLists.txt
A amd/comgr/src/hotswap/decode.cpp
A amd/comgr/src/hotswap/decode.h
A amd/comgr/src/hotswap/decoded-inst.h
A amd/comgr/src/hotswap/parsed-reg.h
Log Message:
-----------
[Comgr][hotswap] Add DecodedInst + basic kernel decoder
Wraps the MC-layer disassembler in a hotswap-shaped API:
* decoded_inst.h — `DecodedInst` value type bundling MCInst, raw
bytes, kernel offset, TSFlags, and an explicit `IsBranch` /
`BranchTarget` decoration so Phase-3 BB layout can run without
re-querying MCInstrDesc.
* parsed_reg.h — `ParsedReg{Kind, BaseIdx, NReg}` value type so
handlers can match on register class without re-parsing
`MCOperand::getReg()` at every consumer.
* decode.{h,cpp} — `decodeKernel` walks the .text section once,
populates a `DecodeResult{Insts, BlockStarts, Offsets}` view used
by every subsequent raise phase. Block boundaries derive from the
branch target set discovered during the linear sweep so Phase 3
can pre-create LLVM BasicBlocks before any IR builder lands.
* mc_state.cpp — minor reshuffle to expose the `MCDisassembler` /
`MCSubtargetInfo` pair to the new decode entry point.
Co-Authored-By: Tim Gymnich <tim at gymni.ch>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply at anthropic.com>
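As a rough illustration of deriving block boundaries from the branch-target set during one linear sweep, consider the sketch below. `ToyInst` and `computeBlockStarts` are hypothetical names, and treating the instruction after a branch as a block start is the textbook leader rule assumed here, not something the log message states.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <set>
#include <vector>

// Toy instruction record; the real DecodedInst also carries the MCInst,
// raw bytes, and TSFlags.
struct ToyInst {
  std::uint64_t Offset;                      // byte offset within the kernel
  std::uint32_t Size;                        // encoded size in bytes
  std::optional<std::uint64_t> BranchTarget; // set only for direct branches
};

// Computes basic-block start offsets from one pass over the stream:
// the entry point, every in-range branch target, and every instruction
// that follows a branch (the fall-through successor).
inline std::set<std::uint64_t>
computeBlockStarts(const std::vector<ToyInst> &Insts) {
  std::set<std::uint64_t> Starts;
  if (Insts.empty())
    return Starts;
  Starts.insert(Insts.front().Offset);
  const std::uint64_t End = Insts.back().Offset + Insts.back().Size;
  for (const ToyInst &I : Insts) {
    if (!I.BranchTarget)
      continue;
    if (*I.BranchTarget < End)
      Starts.insert(*I.BranchTarget);
    const std::uint64_t Next = I.Offset + I.Size;
    if (Next < End)
      Starts.insert(Next);
  }
  return Starts;
}
```

With the starts known up front, a later phase can create one basic block per offset before emitting any instructions.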
Commit: 94dee4dbc969a3918ca5cd8fa3c80e28252e84c0
https://github.com/llvm/llvm-project/commit/94dee4dbc969a3918ca5cd8fa3c80e28252e84c0
Author: Martin Paul Lücke <martin.luecke at amd.com>
Date: 2026-05-19 (Tue, 19 May 2026)
Changed paths:
M amd/comgr/src/hotswap/CMakeLists.txt
A amd/comgr/src/hotswap/isa-profile.h
A amd/comgr/src/hotswap/kernarg-layout.cpp
A amd/comgr/src/hotswap/kernarg-layout.h
A amd/comgr/src/hotswap/raise-failure.cpp
M amd/comgr/src/hotswap/raise-failure.h
A amd/comgr/src/hotswap/reg-file.cpp
A amd/comgr/src/hotswap/reg-file.h
A amd/comgr/src/hotswap/user-sgpr-layout.cpp
A amd/comgr/src/hotswap/user-sgpr-layout.h
A amd/comgr/src/hotswap/wave-projection.h
M amd/comgr/test-unit/CMakeLists.txt
A amd/comgr/test-unit/KernargLayoutTest.cpp
Log Message:
-----------
[Comgr][hotswap] Add raise_failure + register file + ISA/kernarg/SGPR layouts
Building-block types consumed by RaiseContext (lands in the following
commit) and every per-format handler. Each piece is self-contained and
testable in isolation:
* raise_failure.{h,cpp} — `RaiseFailure` value type with structured
`Reason`, `Mnemonic`, `Format`, `Detail` fields plus typed
factories (`unsupportedOpcode`, `unsupportedShape`, `bailReason`,
...). Carries the diagnostic data the kerneldex coverage runner
bucketizes on.
* reg_file.{h,cpp} — `AllocaRegFile` models the AMDGPU SGPR/VGPR/AGPR
register files as an alloca-per-physical-reg promoted to SSA by
Phase 6 mem2reg. `readReg{32,64,Vec}` / `writeReg{32,64,Vec}`
handle the bitcasts handlers don't want to repeat.
* isa_profile.h — `ISAProfile` snapshots the wave-direction-relevant
capability bits (wave size, AGPRs/MFMA, VOPD, FP8 conversion,
gfx950 MAI, gfx1250 TENSOR ops, ...) for a given subtarget.
* kernarg_layout.{h,cpp} — `KernargLayout` builds the byref placeholder
type the lifted IR's first parameter takes. Mirrors the source
kernel's MsgPack-derived offset/size layout.
* user_sgpr_layout.{h,cpp} — `UserSgprLayout::tryFromKernelMeta`
derives the source SGPR ABI (workgroup id / private segment buffer
/ kernarg ptr / kernarg-preload SGPRs) from the kernel descriptor.
* wave_projection.h — forward declarations for the projection
interface reg-file already consumes by reference. The matching
.cpp lands with the wave-projection / setpc-analysis commit.
Co-Authored-By: Tim Gymnich <tim at gymni.ch>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply at anthropic.com>
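The `readReg{32,64}` / `writeReg{32,64}` split mentioned above can be pictured with a value-level model. This sketch stores plain integers where the real `AllocaRegFile` holds one LLVM alloca per register; `ToyRegFile` and the 128-slot size are assumptions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Toy register file: one 32-bit slot per physical register. The real
// AllocaRegFile holds an LLVM alloca per register instead of a value.
class ToyRegFile {
public:
  std::uint32_t readReg32(std::size_t Idx) const {
    assert(Idx < Slots.size() && "register index out of range");
    return Slots[Idx];
  }
  void writeReg32(std::size_t Idx, std::uint32_t V) {
    assert(Idx < Slots.size() && "register index out of range");
    Slots[Idx] = V;
  }
  // A 64-bit value occupies two consecutive registers, low half first.
  std::uint64_t readReg64(std::size_t Idx) const {
    return static_cast<std::uint64_t>(readReg32(Idx)) |
           (static_cast<std::uint64_t>(readReg32(Idx + 1)) << 32);
  }
  void writeReg64(std::size_t Idx, std::uint64_t V) {
    writeReg32(Idx, static_cast<std::uint32_t>(V));
    writeReg32(Idx + 1, static_cast<std::uint32_t>(V >> 32));
  }

private:
  std::array<std::uint32_t, 128> Slots{}; // size is arbitrary for the sketch
};
```

Centralising the pair handling in the register file keeps the widening and splitting out of every individual handler.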
Commit: c220b993a345a3b7ddb6fbc54bf41003b10b4877
https://github.com/llvm/llvm-project/commit/c220b993a345a3b7ddb6fbc54bf41003b10b4877
Author: Martin Paul Lücke <martin.luecke at amd.com>
Date: 2026-05-19 (Tue, 19 May 2026)
Changed paths:
M amd/comgr/src/hotswap/CMakeLists.txt
A amd/comgr/src/hotswap/handlers.h
M amd/comgr/src/hotswap/parsed-reg.h
A amd/comgr/src/hotswap/raise-context.cpp
A amd/comgr/src/hotswap/raise-context.h
A amd/comgr/src/hotswap/setpc-analysis.h
M amd/comgr/test-unit/CMakeLists.txt
Log Message:
-----------
[Comgr][hotswap] Add RaiseContext + handler/setpc forward declarations
* raise_context.{h,cpp} — `RaiseContext` is the per-kernel mutable
state every handler reads and writes: LLVMContext / Module / IR
builder, decoded inst stream, register file, kernarg layout,
user-SGPR layout, ISA profile, wave projection, lane-active mask
cache, the pending failure for read-path errors, etc. Glues the
foundation types from the previous commit (raise-failure,
reg-file, isa-profile, kernarg-layout, user-sgpr-layout) into the
single struct the dispatch loop threads through every per-format
handler.
* handlers.h — forward declarations for every `handleXXX` entry
point. Concrete handler files come in subsequent commits.
* setpc_analysis.h — forward declarations so RaiseContext's
`SetpcAnalysis *` member resolves; the matching .cpp lands with
the wave-projection / setpc-analysis commit.
Co-Authored-By: Tim Gymnich <tim at gymni.ch>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply at anthropic.com>
Commit: a084965c7bed969f325fd40288925f5eb2ccf915
https://github.com/llvm/llvm-project/commit/a084965c7bed969f325fd40288925f5eb2ccf915
Author: Martin Paul Lücke <martin.luecke at amd.com>
Date: 2026-05-19 (Tue, 19 May 2026)
Changed paths:
M amd/comgr/src/hotswap/CMakeLists.txt
A amd/comgr/src/hotswap/docs/abi-translation.md
A amd/comgr/src/hotswap/docs/sgpr-wave-mask-translation.md
A amd/comgr/src/hotswap/docs/sync-translation.md
M amd/comgr/src/hotswap/reg-file.cpp
A amd/comgr/src/hotswap/setpc-analysis.cpp
A amd/comgr/src/hotswap/wave-projection.cpp
M amd/comgr/test-unit/CMakeLists.txt
A amd/comgr/test-unit/WaveProjectionTest.cpp
Log Message:
-----------
[Comgr][hotswap] Add wave projection + setpc analysis + abi/sync docs
* wave_projection.{h,cpp} — `WaveProjection` interface + the three
concrete projections (`ThreadLoopProjection`,
`WaveNativeProjection`, `ModuloReplicationProjection`) the raiser
selects between to bridge wave32 -> wave64 cross-widening. Also
holds the `instructionWritesEXEC` predicate and the per-projection
`emitLaneActiveBit` helper handlers consume.
* setpc_analysis.{h,cpp} — `runSetPcAnalysis` walks the decoded
instruction stream and classifies every `s_set_pc_i64` /
`s_swap_pc_i64` site as DirectA, IndirectB, DispatchSet, or
Unresolvable. Output is a `SetPcAnalysis{SetpcSites, ChainTerms}`
bundle handlers consume in Phase 5 to lower indirect branches to
cmp+br cascades.
* docs/abi-translation.md — wave32 <-> wave64 ABI translation
contract (kernarg layout, user-SGPR seeding, scratch / private
segment, hidden args).
* docs/sync-translation.md — barrier / s_waitcnt / TENSOR_CNT
translation matrix.
* docs/sgpr-wave-mask-translation.md — SGPR-as-wave-mask handling
rules referenced by handle_valu_vcmp.cpp (lands later).
Co-Authored-By: Tim Gymnich <tim at gymni.ch>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply at anthropic.com>
Commit: 0fe12c3c35eb31e94e1e63cbb0b002c51d08d07e
https://github.com/llvm/llvm-project/commit/0fe12c3c35eb31e94e1e63cbb0b002c51d08d07e
Author: Martin Paul Lücke <martin.luecke at amd.com>
Date: 2026-05-19 (Tue, 19 May 2026)
Changed paths:
M amd/comgr/test-lit/CMakeLists.txt
A amd/comgr/test-lit/comgr-sources/raise_cli.cpp
M amd/comgr/test-lit/lit.cfg.py
M amd/comgr/test-lit/lit.site.cfg.py.in
A amd/comgr/test-shared/standalone-init.cpp
M amd/comgr/test-unit/CMakeLists.txt
Log Message:
-----------
[Comgr][hotswap] Add raise_cli + lit harness for hotswap-raise fixtures
Wires up the per-kernel raiser CLI and the lit-harness scaffolding the
subsequent handler commits' fixtures consume:
* test-lit/comgr-sources/raise_cli.cpp — minimal --emit-ir driver
that calls `raiseToIR` directly and dumps the resulting Module to
stdout via `Module::print`. Subsequent commits add the default
fork mode (kerneldex coverage) and `--write-hsaco` mode (full
pipeline) once `hotswap/pipeline.h` lands.
* test-lit/CMakeLists.txt — gates a new `raise_cli` target on
COMGR_ENABLE_HOTSWAP_TRANSPILE; links `hotswap::transpiler` plus
`comgr-metadata.cpp` and the just-extracted `comgr-target-init.cpp`
directly so the binary doesn't pull in the rest of the amd_comgr
public surface.
* test-lit/lit.site.cfg.py.in — exposes the
COMGR_ENABLE_HOTSWAP_TRANSPILE configure-time value.
* test-lit/lit.cfg.py — adds `comgr-hotswap-transpile` REQUIRES
feature plus the `%llvm_mc`, `%ld_lld`, `%not`, and `%raise_cli`
substitutions every hotswap-raise/ RUN line uses.
* test-unit/CMakeLists.txt — `comgr_link_hotswap_transpiler` helper
grows to compile `comgr-target-init.cpp` into every unit-test
binary so `ensureLLVMInitialized` resolves at link time.
* src/hotswap/code_object_utils.{h,cpp} — adds the `readFile`
helper raise_cli and (later) the pipeline driver consume to slurp
HSACO / IR / asm artefacts off disk; matches the convention of
centralising file I/O in the metadata layer.
The raiser stays at its decode-and-bail / IR-scaffold form from the
preceding commits. With these in place the lit harness is reachable
but every hotswap-raise/ fixture FAILs honestly until the dispatch
loop lands at the `Wire raiser dispatcher` commit.
Co-Authored-By: Tim Gymnich <tim at gymni.ch>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply at anthropic.com>
Commit: 1e0d0bc0a841aba18044e18dc723a11b49e0492b
https://github.com/llvm/llvm-project/commit/1e0d0bc0a841aba18044e18dc723a11b49e0492b
Author: Martin Paul Lücke <martin.luecke at amd.com>
Date: 2026-05-19 (Tue, 19 May 2026)
Changed paths:
M amd/comgr/src/hotswap/CMakeLists.txt
M amd/comgr/src/hotswap/canonical-op.h
M amd/comgr/src/hotswap/docs/sync-translation.md
A amd/comgr/src/hotswap/handle-sopc.cpp
A amd/comgr/src/hotswap/handle-sopp.cpp
M amd/comgr/src/hotswap/opcode-map.cpp
A amd/comgr/test-lit/hotswap-raise/abort_gate.s
A amd/comgr/test-lit/hotswap-raise/group_segment_fixed_size_attr.s
A amd/comgr/test-lit/hotswap-raise/saveexec.s
Log Message:
-----------
[Comgr][hotswap] Add SOPP + SOPC handlers
* handle_sopp.cpp — SOP_NoOperand / SOP_Subroutine entries:
s_endpgm, s_barrier, s_branch, s_cbranch_*, s_setprio, s_sleep,
s_waitcnt family, s_waitcnt_depctr/lgkmcnt/expcnt/vmcnt, plus
the gfx1250 s_wait_tensorcnt the VIMAGE handler later defers to.
* handle_sopc.cpp — scalar comparisons (s_cmp_eq_i32 / s_cmp_lt_u64
/ s_bitcmp1_b32 / ...) writing to SCC. Mirrors LLVM's
AMDGPUInstructions.td:1107 categorisation.
Both handlers are reachable through the in-progress dispatch loop
(landing later in `Wire raiser dispatcher`); the current
decode-and-bail raiser does not call them yet, but their TUs land in
the OBJECT lib so the dispatcher hook-up in the later commit needs
no build-system changes.
Lit fixtures for SOPP/SOPC follow with the dispatcher commit since
they FileCheck the lifted IR shape, which depends on the dispatcher
actually calling these handlers.
Co-Authored-By: Tim Gymnich <tim at gymni.ch>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply at anthropic.com>
Commit: 8c80b6ef58f4b228819dcaa09c917ec8fbc130bc
https://github.com/llvm/llvm-project/commit/8c80b6ef58f4b228819dcaa09c917ec8fbc130bc
Author: Martin Paul Lücke <martin.luecke at amd.com>
Date: 2026-05-19 (Tue, 19 May 2026)
Changed paths:
M amd/comgr/src/hotswap/CMakeLists.txt
A amd/comgr/src/hotswap/canonical-op-attrs.cpp
A amd/comgr/src/hotswap/canonical-op-attrs.h
M amd/comgr/src/hotswap/canonical-op.h
A amd/comgr/src/hotswap/handle-sop1.cpp
A amd/comgr/src/hotswap/handle-sop2.cpp
M amd/comgr/src/hotswap/opcode-map.cpp
M amd/comgr/src/hotswap/raise-failure.h
M amd/comgr/src/hotswap/setpc-analysis.cpp
M amd/comgr/src/hotswap/setpc-analysis.h
A amd/comgr/test-lit/hotswap-raise/s_and_imm_high_bit_mask.s
A amd/comgr/test-lit/hotswap-raise/s_atomic_dec.s
A amd/comgr/test-lit/hotswap-raise/s_bitcmp.s
A amd/comgr/test-lit/hotswap-raise/s_bitset0_b32.s
A amd/comgr/test-lit/hotswap-raise/s_cmov_b32.s
A amd/comgr/test-lit/hotswap-raise/s_cmp_eq_u64.s
A amd/comgr/test-lit/hotswap-raise/s_fmac_f32.s
A amd/comgr/test-lit/hotswap-raise/s_load_b32_scale_offset.s
A amd/comgr/test-lit/hotswap-raise/s_load_b96_kernarg.s
A amd/comgr/test-lit/hotswap-raise/s_load_u16.s
A amd/comgr/test-lit/hotswap-raise/s_lshr_b64_imm.s
A amd/comgr/test-lit/hotswap-raise/s_minmax_num_f32.s
A amd/comgr/test-lit/hotswap-raise/s_mov_exec_lo_shadow.s
A amd/comgr/test-lit/hotswap-raise/s_mul_hi_i32.s
A amd/comgr/test-lit/hotswap-raise/s_set_pc_i64_dispatch_set.s
A amd/comgr/test-lit/hotswap-raise/s_set_pc_i64_pattern_a.s
A amd/comgr/test-lit/hotswap-raise/s_set_pc_i64_pattern_b.s
A amd/comgr/test-lit/hotswap-raise/s_set_pc_i64_unresolvable.s
A amd/comgr/test-lit/hotswap-raise/s_sub_f32.s
A amd/comgr/test-lit/hotswap-raise/s_sub_nc_u64.s
A amd/comgr/test-lit/hotswap-raise/s_swap_pc_i64_dispatch_set.s
A amd/comgr/test-lit/hotswap-raise/s_swap_pc_i64_pattern_a.s
A amd/comgr/test-lit/hotswap-raise/s_swap_pc_i64_unresolvable.s
A amd/comgr/test-lit/hotswap-raise/s_xor_imm_mask_shadow.s
Log Message:
-----------
[Comgr][hotswap] Add SOP1 + SOP2 handlers + canonical_op_attrs
* handle_sop1.cpp -- scalar unary handlers (s_mov, s_not, s_brev,
s_get_pc_i64, s_set_pc_i64 / s_swap_pc_i64 enumerated dispatch over
the resolved targets from setpc-analysis, s_cmov, s_bitset0, ...).
The s_set_pc_i64 / s_swap_pc_i64 cascade is the integrating point
of the setpc-analysis Pattern A / Pattern B / DispatchSet
classifications -- the analysis result drives an `emitEnumeratedDispatch`
`cmp eq + br` cascade rather than an indirect branch.
* handle_sop2.cpp -- scalar binary handlers (s_add, s_sub, s_and,
s_or, s_xor, s_lshr_b64, s_minmax_num_f32, s_mul_hi_i32, ...).
* canonical_op_attrs.{h,cpp} -- per-CanonicalOp attribute table the
raiser dispatch loop consults (e.g. `routesExecThroughStoreExec`
for the Phase 1.5 SPE safety gate).
Co-Authored-By: Tim Gymnich <tim at gymni.ch>
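What a `cmp eq + br` cascade computes, as opposed to an indirect branch, can be modelled in a few lines. This is a value-level sketch under assumptions: `selectDispatchArm` is a hypothetical name, and returning `nullopt` for an unmatched target stands in for whatever default arm the real `emitEnumeratedDispatch` emits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Compares the runtime target against each statically resolved candidate
// in order and takes the first match. Returns the index of the matching
// candidate, or nullopt when no candidate matches.
inline std::optional<std::size_t>
selectDispatchArm(std::uint64_t RuntimeTarget,
                  const std::vector<std::uint64_t> &ResolvedTargets) {
  for (std::size_t I = 0; I < ResolvedTargets.size(); ++I)
    if (RuntimeTarget == ResolvedTargets[I]) // one compare + conditional branch
      return I;
  return std::nullopt;
}
```

Each loop iteration corresponds to one compare-and-branch pair in the lifted IR, so every successor is a statically known block.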
Commit: eb5dcc3ce7df45ed5b90f0fd91c5eadcd655d934
https://github.com/llvm/llvm-project/commit/eb5dcc3ce7df45ed5b90f0fd91c5eadcd655d934
Author: Martin Paul Lücke <martin.luecke at amd.com>
Date: 2026-05-19 (Tue, 19 May 2026)
Changed paths:
M amd/comgr/src/hotswap/CMakeLists.txt
A amd/comgr/src/hotswap/amdgpu-mode-hwreg.h
A amd/comgr/src/hotswap/handle-sopk.cpp
A amd/comgr/test-lit/hotswap-raise/s_addk_i32.s
A amd/comgr/test-lit/hotswap-raise/s_cmpk_eq_i32.s
A amd/comgr/test-lit/hotswap-raise/s_mulk_i32.s
A amd/comgr/test-lit/hotswap-raise/s_wait_dscnt_barrier.s
Log Message:
-----------
[Comgr][hotswap] Add SOPK handler + HWREG/MODE policy
* handle_sopk.cpp -- scalar 16-bit-immediate handlers (s_addk_i32,
s_mulk_i32, s_cmpk_*, s_getreg_b32 / s_setreg_b32 /
s_setreg_imm32_b32). The HWREG paths are driven by a
direction-aware policy classifier (`HwregPolicy` -- read vs write
per HWREG id) so loud refusals fire on load-bearing or unknown
registers (FLAT aperture, trap handler, ...) rather than silently
dropping the side effect. MODE-register writes go through the
multi-group replay helper in `amdgpu-mode-hwreg.h`.
* amdgpu_mode_hwreg.h -- MODE-register replay helper
(`isModeReplayMultiGroupWrite`) the SOPK MODE-write path consults
to decide between a single SETREG and a multi-group replay.
Co-Authored-By: Tim Gymnich <tim at gymni.ch>
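A direction-aware classifier of the kind described above could be shaped like the sketch below. Only the name `HwregPolicy` comes from the log message. The register ids, the policy values (`Lift`, `Replay`, `Refuse`), and the specific mapping are invented for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical register ids and policy values; the real classifier keys
// on hardware HWREG ids, which this sketch does not reproduce.
enum class Hwreg : std::uint8_t { Mode, FlatAperture, TrapHandler, Unknown };
enum class Access : std::uint8_t { Read, Write };
enum class HwregPolicy : std::uint8_t { Lift, Replay, Refuse };

// Direction-aware classification: the same register can get a different
// policy for reads and writes, and anything load-bearing or unrecognised
// is refused rather than silently dropped.
constexpr HwregPolicy classifyHwreg(Hwreg Reg, Access Dir) {
  switch (Reg) {
  case Hwreg::Mode:
    return Dir == Access::Write ? HwregPolicy::Replay : HwregPolicy::Lift;
  case Hwreg::FlatAperture:
  case Hwreg::TrapHandler:
  case Hwreg::Unknown:
    return HwregPolicy::Refuse;
  }
  return HwregPolicy::Refuse; // unreachable for valid enum values
}
```

Defaulting to `Refuse` is what makes the failure loud: a register the table does not know about cannot fall through to a silent no-op.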
Commit: 0a4fab910be5e3fb62d7741c84b7d7450ea19918
https://github.com/llvm/llvm-project/commit/0a4fab910be5e3fb62d7741c84b7d7450ea19918
Author: Martin Paul Lücke <martin.luecke at amd.com>
Date: 2026-05-19 (Tue, 19 May 2026)
Changed paths:
M amd/comgr/src/hotswap/CMakeLists.txt
M amd/comgr/src/hotswap/canonical-op.h
M amd/comgr/src/hotswap/docs/sgpr-wave-mask-translation.md
A amd/comgr/src/hotswap/handle-smem.cpp
M amd/comgr/src/hotswap/kernarg-layout.h
M amd/comgr/src/hotswap/opcode-map.cpp
M amd/comgr/src/hotswap/raise-context.h
M amd/comgr/src/hotswap/raise-failure.h
M amd/comgr/src/hotswap/reg-file.h
A amd/comgr/src/hotswap/source-hidden-args.cpp
A amd/comgr/src/hotswap/source-hidden-args.h
M amd/comgr/src/hotswap/user-sgpr-layout.h
A amd/comgr/test-lit/hotswap-raise/smem_kernarg_const_delta.s
A amd/comgr/test-lit/hotswap-raise/smem_modified_kernarg_pair_alias.s
A amd/comgr/test-lit/hotswap-raise/smem_modified_kernarg_pair_base.s
M amd/comgr/test-unit/CMakeLists.txt
A amd/comgr/test-unit/SourceHiddenArgsTest.cpp
Log Message:
-----------
[Comgr][hotswap] Add SMEM handler + source_hidden_args
* handle_smem.cpp — scalar-memory loads keyed off the SMRD format
bit. Covers s_load_dword{,x2,x4,x8,x16}, s_buffer_load_*, and the
gfx1250 s_buffer_atomic_* paths. The
`llvm.amdgcn.implicitarg.ptr` lift on the hidden-arg block is
routed through `source_hidden_args.h` so the gfx1250 -> gfx942
cross-target ABI translation can rewrite hidden-arg byte offsets.
Strict-mode refusals on the hidden-arg lift are gated on
`isStrictMode()` (stubbed `false` until pipeline.h lands).
* source_hidden_args.{h,cpp} — table of hidden_* arg names + their
AMDGPU per-target byte offsets so the cross-target rewriter can
map a source-ISA `gep + load` against the implicit-arg block to
a target-ISA-correct offset.
SMEM lit fixtures land here too; they FileCheck the lifted IR shape
once the dispatcher actually dispatches to handleSMEM (later commit).
Co-Authored-By: Tim Gymnich <tim at gymni.ch>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply at anthropic.com>
Commit: 052a10cac59256afca7d98ff6e71a28922535795
https://github.com/llvm/llvm-project/commit/052a10cac59256afca7d98ff6e71a28922535795
Author: Martin Paul Lücke <martin.luecke at amd.com>
Date: 2026-05-19 (Tue, 19 May 2026)
Changed paths:
M amd/comgr/src/hotswap/CMakeLists.txt
M amd/comgr/src/hotswap/docs/sync-translation.md
A amd/comgr/src/hotswap/flat-addr.cpp
A amd/comgr/src/hotswap/flat-addr.h
A amd/comgr/src/hotswap/handle-flat.cpp
M amd/comgr/src/hotswap/handle-smem.cpp
M amd/comgr/src/hotswap/handle-sopp.cpp
M amd/comgr/src/hotswap/raise-context.h
A amd/comgr/test-lit/hotswap-raise/flat_store_short_d16_hi.s
A amd/comgr/test-lit/hotswap-raise/global_load_async_to_lds.s
A amd/comgr/test-lit/hotswap-raise/global_load_async_to_lds_offset.s
A amd/comgr/test-lit/hotswap-raise/global_load_dword_kernarg_saddr.s
A amd/comgr/test-lit/hotswap-raise/global_load_saddr_nonkernarg_after_merge.s
A amd/comgr/test-lit/hotswap-raise/global_load_ushort_saddr.s
A amd/comgr/test-lit/hotswap-raise/global_prefetch_b8.s
A amd/comgr/test-lit/hotswap-raise/global_store_short_d16_hi.s
A amd/comgr/test-lit/hotswap-raise/scratch_private_segment.s
A amd/comgr/test-lit/hotswap-raise/scratch_refusal.s
Log Message:
-----------
[Comgr][hotswap] Add FLAT handler + flat_addr
* handle_flat.cpp — global_load_*, global_store_*, scratch_load_*,
scratch_store_*, flat_load_*, flat_store_*, plus the gfx1250
global_load_async_to_lds family. Routes both the LDS-async
target-intrinsic emit path (gfx1250) and the cross-target
refusal (gfx942) through a single shape classifier.
* flat_addr.{h,cpp} — `FlatAddrComputed` packages the
`vaddr_pair + saddr_pair + offset_imm + addrspace + scope` tuple
every flat handler reads, so the SADDR-vs-VADDR mux logic
doesn't get duplicated per opcode.
global_*/flat_*/scratch_* lit fixtures land here too.
Co-Authored-By: Tim Gymnich <tim at gymni.ch>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply at anthropic.com>
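The SADDR-vs-VADDR mux that `FlatAddrComputed` centralises can be illustrated at the value level. The sketch assumes the global SADDR form (scalar 64-bit base plus a zero-extended 32-bit vector offset plus a signed immediate). `ToyFlatAddr` and `effectiveAddress` are hypothetical, and the address-space and scope fields are omitted.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Toy address tuple; the real type also carries addrspace and scope.
struct ToyFlatAddr {
  std::optional<std::uint64_t> SAddr; // scalar base, absent in the VADDR-only form
  std::uint64_t VAddr;     // full 64-bit address, or a 32-bit offset with SAddr
  std::int32_t OffsetImm;  // signed immediate offset
};

// Picks the base once so per-opcode handlers do not repeat the mux.
inline std::uint64_t effectiveAddress(const ToyFlatAddr &A) {
  const std::uint64_t Base =
      A.SAddr ? *A.SAddr + (A.VAddr & 0xFFFFFFFFull) : A.VAddr;
  // Sign-extend the immediate before the wrapping 64-bit add.
  return Base + static_cast<std::uint64_t>(static_cast<std::int64_t>(A.OffsetImm));
}
```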
Commit: 84a09e9e947ae6719cf89c18e5416a48cec859fa
https://github.com/llvm/llvm-project/commit/84a09e9e947ae6719cf89c18e5416a48cec859fa
Author: Martin Paul Lücke <martin.luecke at amd.com>
Date: 2026-05-19 (Tue, 19 May 2026)
Changed paths:
M amd/comgr/src/hotswap/CMakeLists.txt
M amd/comgr/src/hotswap/canonical-op.h
A amd/comgr/src/hotswap/docs/buffer-store-lowering.md
M amd/comgr/src/hotswap/docs/sync-translation.md
M amd/comgr/src/hotswap/handle-flat.cpp
A amd/comgr/src/hotswap/handle-mubuf.cpp
A amd/comgr/src/hotswap/mubuf-addr.cpp
A amd/comgr/src/hotswap/mubuf-addr.h
M amd/comgr/src/hotswap/opcode-map.cpp
A amd/comgr/test-lit/hotswap-raise/buffer_atomic_add_u32.s
A amd/comgr/test-lit/hotswap-raise/buffer_atomic_cmpswap_b32.s
A amd/comgr/test-lit/hotswap-raise/buffer_atomic_cmpswap_b32_nortn.s
A amd/comgr/test-lit/hotswap-raise/buffer_atomic_swap_b32.s
A amd/comgr/test-lit/hotswap-raise/buffer_atomic_swap_b32_nortn.s
A amd/comgr/test-lit/hotswap-raise/buffer_load_d16_u8.s
A amd/comgr/test-lit/hotswap-raise/buffer_store_no_scratch_alloca.s
A amd/comgr/test-lit/hotswap-raise/buffer_store_short_srd_words.s
A amd/comgr/test-lit/hotswap-raise/buffer_store_wave_native_oob_mask.s
Log Message:
-----------
[Comgr][hotswap] Add MUBUF handler + mubuf_addr + buffer-store-lowering doc
* handle_mubuf.cpp — untyped buffer ops (buffer_load_*,
buffer_store_*, buffer_atomic_*) routed through the
`addrspace(8)` raw-pointer intrinsic family on cross-widening
paths and through the legacy <4 x i32> resource path on
same-wave / same-target paths.
* mubuf_addr.{h,cpp} — descriptor-tuple packaging analogous to
`flat_addr.{h,cpp}` but with the v_buffer_descriptor /
soffset / scope pieces specific to MUBUF.
* docs/buffer-store-lowering.md — companion writeup capturing the
wave32->wave64 address-rebroadcast contract that
handle_mubuf.cpp's storeWaveOOBMask helper relies on.
buffer_* lit fixtures land here.
Co-Authored-By: Tim Gymnich <tim at gymni.ch>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply at anthropic.com>
Commit: ec03ce726c949a67ce401eccf9418bc5595c243b
https://github.com/llvm/llvm-project/commit/ec03ce726c949a67ce401eccf9418bc5595c243b
Author: Martin Paul Lücke <martin.luecke at amd.com>
Date: 2026-05-19 (Tue, 19 May 2026)
Changed paths:
M amd/comgr/src/hotswap/CMakeLists.txt
M amd/comgr/src/hotswap/canonical-op.h
M amd/comgr/src/hotswap/decoded-inst.h
A amd/comgr/src/hotswap/handle-ds.cpp
M amd/comgr/src/hotswap/handle-flat.cpp
M amd/comgr/src/hotswap/opcode-map.cpp
A amd/comgr/test-lit/hotswap-raise/ds_bpermute_b32.s
A amd/comgr/test-lit/hotswap-raise/ds_bpermute_b32_wave32_rebase.s
A amd/comgr/test-lit/hotswap-raise/ds_load_2addr_b32.s
A amd/comgr/test-lit/hotswap-raise/ds_load_b96.s
A amd/comgr/test-lit/hotswap-raise/ds_load_tr8_b64.s
A amd/comgr/test-lit/hotswap-raise/ds_store_b16_d16_hi.s
Log Message:
-----------
[Comgr][hotswap] Add DS handler
* handle_ds.cpp — LDS load/store/atomic family. Covers ds_read_b32 /
ds_write_b32 (and b64/b96/b128 variants), ds_add_*/ds_min_*/
ds_max_*/ds_inc_u32/ds_dec_u32 atomics, ds_bpermute_b32 (the
workhorse cross-widening broadcast), ds_swizzle_b32 family.
LDS addresses lower to addrspace(3) GEPs against the
`[N x i8]` block address space the AMDGPU backend expects.
ds_* lit fixtures land here.
Co-Authored-By: Tim Gymnich <tim at gymni.ch>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply at anthropic.com>
Commit: 2c8b166329720398ffb4712c7a74ac9a4fbc6a27
https://github.com/llvm/llvm-project/commit/2c8b166329720398ffb4712c7a74ac9a4fbc6a27
Author: Martin Paul Lücke <martin.luecke at amd.com>
Date: 2026-05-19 (Tue, 19 May 2026)
Changed paths:
M amd/comgr/src/hotswap/CMakeLists.txt
M amd/comgr/src/hotswap/canonical-op.h
M amd/comgr/src/hotswap/decode.cpp
M amd/comgr/src/hotswap/docs/sgpr-wave-mask-translation.md
M amd/comgr/src/hotswap/handle-sop1.cpp
A amd/comgr/src/hotswap/handle-valu-cross-lane.cpp
A amd/comgr/src/hotswap/handle-valu-internal.h
A amd/comgr/src/hotswap/handle-valu-output-mods.cpp
A amd/comgr/src/hotswap/handle-valu-output-mods.h
A amd/comgr/src/hotswap/handle-valu-small-ops.cpp
A amd/comgr/src/hotswap/handle-valu-vcmp.cpp
A amd/comgr/src/hotswap/handle-valu.cpp
M amd/comgr/src/hotswap/isa-profile.h
M amd/comgr/src/hotswap/opcode-map.cpp
M amd/comgr/src/hotswap/opcode-map.h
M amd/comgr/src/hotswap/raise-context.h
M amd/comgr/src/hotswap/reg-file.h
M amd/comgr/src/hotswap/wave-projection.cpp
M amd/comgr/src/hotswap/wave-projection.h
A amd/comgr/test-lit/hotswap-raise/v_add_co_u32_sgpr_carry.s
A amd/comgr/test-lit/hotswap-raise/v_add_min_u32.s
A amd/comgr/test-lit/hotswap-raise/v_add_nc_u16.s
A amd/comgr/test-lit/hotswap-raise/v_alignbit_b32.s
A amd/comgr/test-lit/hotswap-raise/v_bfi_b32.s
A amd/comgr/test-lit/hotswap-raise/v_cmp_class_f32.s
A amd/comgr/test-lit/hotswap-raise/v_cmp_cndmask_sgpr.s
A amd/comgr/test-lit/hotswap-raise/v_cmp_cndmask_sgpr_class.s
A amd/comgr/test-lit/hotswap-raise/v_cmp_cndmask_sgpr_scalar_clobber.s
A amd/comgr/test-lit/hotswap-raise/v_cmp_i16_trunc.s
A amd/comgr/test-lit/hotswap-raise/v_cmp_u16_trunc.s
A amd/comgr/test-lit/hotswap-raise/v_cmpx_ballot.s
A amd/comgr/test-lit/hotswap-raise/v_div_scale_f32_literal_numer.s
A amd/comgr/test-lit/hotswap-raise/v_dual_cndmask_b32_sgpr_cond.s
A amd/comgr/test-lit/hotswap-raise/v_exp_log_f32.s
A amd/comgr/test-lit/hotswap-raise/v_mad_i32_i24.s
A amd/comgr/test-lit/hotswap-raise/v_mad_nc_i64_i32.s
A amd/comgr/test-lit/hotswap-raise/v_mad_nc_u64_u32.s
A amd/comgr/test-lit/hotswap-raise/v_max3_u32.s
A amd/comgr/test-lit/hotswap-raise/v_maximum3_f32.s
A amd/comgr/test-lit/hotswap-raise/v_med3_i32.s
A amd/comgr/test-lit/hotswap-raise/v_min3_u32.s
A amd/comgr/test-lit/hotswap-raise/v_minimum3_f32.s
A amd/comgr/test-lit/hotswap-raise/v_minmax_num_f32.s
A amd/comgr/test-lit/hotswap-raise/v_mul_u64.s
A amd/comgr/test-lit/hotswap-raise/v_permlane32_swap_b32.s
A amd/comgr/test-lit/hotswap-raise/v_rcp_f64.s
A amd/comgr/test-lit/hotswap-raise/v_s_exp_f32.s
A amd/comgr/test-lit/hotswap-raise/v_s_trans_f32.s
A amd/comgr/test-lit/hotswap-raise/v_sub_nc_u64.s
A amd/comgr/test-lit/hotswap-raise/v_xad_u32.s
A amd/comgr/test-lit/hotswap-raise/v_xor3_b32.s
Log Message:
-----------
[Comgr][hotswap] Add VALU handlers (base + cross-lane + small-ops + VCMP)
Base VALU instruction lifting on top of the existing scalar/memory
handlers. The matrix-adjacent specialisations (VOP3P packed, WMMA
lowering, MXFP4 dequant) land in the following commit.
* handle_valu.cpp -- main VALU dispatcher (VOP1/2/3, v_cndmask,
v_madmk/v_madak, the larger VOP3 forms, etc.).
* handle_valu_internal.h -- private declarations shared across the
handle_valu_* TUs.
* handle_valu_output_mods.{h,cpp} -- clamp / omod output-modifier
helpers consumed by the base and small-ops handlers.
* handle_valu_small_ops.cpp -- small VOP3 forms: v_max3, v_med3,
v_minimum3, v_maximum3, v_minmax_num, v_xor3 etc.
* handle_valu_cross_lane.cpp -- cross-lane primitives (readlane,
writelane, permlane, permlane*_swap) -- their classifier rewrite
pass lands later.
* handle_valu_vcmp.cpp -- VOPC compares (v_cmp_*, v_cmpx_*) with
the SGPR-wave-mask shadow contract documented in
`docs/sgpr-wave-mask-translation.md`.
Lit fixtures cover the base VALU forms (v_add_*, v_mad_*, v_min/max3,
v_med3, v_cmp_*, v_cmpx_*, v_alignbit, v_bfi, v_xor3, v_xad,
v_permlane32_swap, ...). The dispatcher arm wiring lands later in the
chain so most fixtures only pass once the production raiser dispatcher
commit lands; they're staged here next to the handler code so future
re-bisects keep handlers and expectations together.
Co-Authored-By: Tim Gymnich <tim at gymni.ch>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply at anthropic.com>
Commit: fd7627ba497da8c7cfca5ed29dfe9565b7f7ae2a
https://github.com/llvm/llvm-project/commit/fd7627ba497da8c7cfca5ed29dfe9565b7f7ae2a
Author: Martin Paul Lücke <martin.luecke at amd.com>
Date: 2026-05-19 (Tue, 19 May 2026)
Changed paths:
M amd/comgr/src/hotswap/CMakeLists.txt
A amd/comgr/src/hotswap/handle-valu-vop3p.cpp
A amd/comgr/src/hotswap/mxfp4-dequant.cpp
A amd/comgr/src/hotswap/mxfp4-dequant.h
A amd/comgr/src/hotswap/wmma-lowering.cpp
A amd/comgr/src/hotswap/wmma-lowering.h
A amd/comgr/test-lit/hotswap-raise/v_cvt_scale_pk8_bf16_fp4.s
A amd/comgr/test-lit/hotswap-raise/v_cvt_scalef32_pk8_fp8_f32.s
A amd/comgr/test-lit/hotswap-raise/v_dot4_i32_iu8.s
A amd/comgr/test-lit/hotswap-raise/v_fma_mix_f32_bf16.s
A amd/comgr/test-lit/hotswap-raise/v_fma_mixlo_bf16.s
A amd/comgr/test-lit/hotswap-raise/v_pk_add_f32_lit.s
A amd/comgr/test-lit/hotswap-raise/v_pk_fma_f16.s
A amd/comgr/test-lit/hotswap-raise/v_pk_int_b16.s
M amd/comgr/test-unit/CMakeLists.txt
A amd/comgr/test-unit/MXFP4DequantTest.cpp
Log Message:
-----------
[Comgr][hotswap] Add VOP3P handler + WMMA lowering + MXFP4 dequant
Matrix-adjacent VALU specialisations layered on top of the base VALU
dispatcher from the previous commit.
* handle_valu_vop3p.cpp -- packed VOP3P forms (v_pk_*, v_dot4_*,
v_fma_mix_*, v_cvt_scalef32_pk8_*). Uses wmma_lowering for the
cross-target WMMA -> MFMA / scaled-WMMA -> MFMA emission paths.
* wmma_lowering.{h,cpp} -- WMMA -> MFMA and scaled-WMMA -> MFMA
cross-target lowering helpers. Consulted by both the VOP3P
handler and the main VALU dispatcher's cndmask/MIX shape paths.
* mxfp4_dequant.{h,cpp} -- standalone MXFP4 (FP4 with shared
exponent) dequantisation helpers. Not directly consumed by the
raiser today (the inline algebra in `handle_valu.cpp` reproduces
the algorithm); kept as a separate TU so a gtest unit suite can
pin the bit-exact contract independently.
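For readers unfamiliar with the format, the dequantisation can be sketched as below. This follows the OCP Microscaling layout (E2M1 elements sharing one E8M0 scale, bias 127, 0xFF meaning NaN) and is not taken from `mxfp4_dequant.cpp`; the function names and the low-nibble-first packing order are assumptions made for the example.

```python
import math

# E2M1 magnitudes indexed by the low three bits (2 exponent bits, 1 mantissa bit).
_E2M1 = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def decode_fp4(nibble: int) -> float:
    """Decode one 4-bit E2M1 value; bit 3 is the sign."""
    magnitude = _E2M1[nibble & 0x7]
    return -magnitude if nibble & 0x8 else magnitude


def dequant_mxfp4(packed: bytes, scale_e8m0: int) -> list[float]:
    """Dequantise packed FP4 pairs that share one E8M0 scale."""
    if scale_e8m0 == 0xFF:
        # An all-ones E8M0 scale encodes NaN for the whole block.
        return [math.nan] * (2 * len(packed))
    scale = math.ldexp(1.0, scale_e8m0 - 127)  # 2 ** (e - 127)
    out = []
    for byte in packed:
        # Assumption: the low nibble holds the earlier element.
        out.append(decode_fp4(byte & 0xF) * scale)
        out.append(decode_fp4(byte >> 4) * scale)
    return out
```

Because every step is a table lookup followed by a power-of-two scale, the result is exact in binary floating point, which is what makes a bit-exact unit test practical.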
Lit fixtures cover the VOP3P + scaled-WMMA paths (v_dot4_i32_iu8,
v_fma_mix_*, v_pk_*, v_cvt_scale*).
Co-Authored-By: Tim Gymnich <tim at gymni.ch>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply at anthropic.com>
Commit: 96e5c520b9935ad9f2019b49280f822f112ddae5
https://github.com/llvm/llvm-project/commit/96e5c520b9935ad9f2019b49280f822f112ddae5
Author: Martin Paul Lücke <martin.luecke at amd.com>
Date: 2026-05-19 (Tue, 19 May 2026)
Changed paths:
M amd/comgr/src/hotswap/CMakeLists.txt
A amd/comgr/src/hotswap/docs/matrix-translation.md
A amd/comgr/src/hotswap/handle-mfma.cpp
M amd/comgr/src/hotswap/handle-valu-vop3p.cpp
M amd/comgr/src/hotswap/handlers.h
M amd/comgr/src/hotswap/wmma-lowering.cpp
Log Message:
-----------
[Comgr][hotswap] Add MFMA handler + matrix-translation doc
Brings handle_mfma.cpp (and the small amdgpu_formats.h shim it
needs for SIInstrFlags) so MFMA opcodes can be lifted into
appropriate intrinsics. Pairs with the docs/matrix-translation.md
note describing the gfx950 MFMA translation rules used by both
this handler and the wmma_lowering.cpp helpers from the previous
commit.
Co-Authored-By: Tim Gymnich <tim at gymni.ch>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply at anthropic.com>
Commit: cfb996dde85932d1ac14a7920fcb7364dda316b8
https://github.com/llvm/llvm-project/commit/cfb996dde85932d1ac14a7920fcb7364dda316b8
Author: Martin Paul Lücke <martin.luecke at amd.com>
Date: 2026-05-19 (Tue, 19 May 2026)
Changed paths:
M amd/comgr/src/hotswap/CMakeLists.txt
M amd/comgr/src/hotswap/canonical-op.h
M amd/comgr/src/hotswap/decoded-inst.h
M amd/comgr/src/hotswap/docs/matrix-translation.md
A amd/comgr/src/hotswap/handle-vopd.cpp
A amd/comgr/test-lit/hotswap-raise/vopd_bitop2_bitop3.s
A amd/comgr/test-lit/hotswap-raise/vopd_extra_subops.s
A amd/comgr/test-lit/hotswap-raise/vopd_f64.s
A amd/comgr/test-lit/hotswap-raise/vopd_vgpr_msb.s
Log Message:
-----------
[Comgr][hotswap] Add VOPD handler
Adds the VOPD (dual-issue) handler that splits a fused VOPD encoding
back into its two component VALU operations and re-dispatches each
through the existing single-op lifting path.
Lit fixtures cover bitop2/3 dual subops, extra-subop forms, the
f64 dual encodings, and the vgpr_msb wide-register split.
Co-Authored-By: Tim Gymnich <tim at gymni.ch>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply at anthropic.com>
Commit: 34ea05e7297ef59e4c82cd8d9ed732b39c2f4f1f
https://github.com/llvm/llvm-project/commit/34ea05e7297ef59e4c82cd8d9ed732b39c2f4f1f
Author: Martin Paul Lücke <martin.luecke at amd.com>
Date: 2026-05-19 (Tue, 19 May 2026)
Changed paths:
M amd/comgr/src/hotswap/CMakeLists.txt
A amd/comgr/src/hotswap/cmake/EmbedBinary.cmake
M amd/comgr/src/hotswap/handle-sopp.cpp
A amd/comgr/src/hotswap/handle-vimage.cpp
A amd/comgr/src/hotswap/runtime/tdm.hip
A amd/comgr/src/hotswap/tdm-runtime.cpp
A amd/comgr/src/hotswap/tdm-runtime.h
A amd/comgr/test-lit/hotswap-raise/tensor_load_to_lds.s
A amd/comgr/test-lit/hotswap-raise/tensor_store_from_lds.s
M amd/comgr/test-lit/lit.cfg.py
M amd/comgr/test-lit/lit.site.cfg.py.in
Log Message:
-----------
[Comgr][hotswap] Add VIMAGE handler + TDM emulation runtime
Adds the VIMAGE TENSOR_LOAD_TO_LDS / TENSOR_STORE_FROM_LDS lifting
plus the cross-target TDM emulation runtime: a small HIP source
(runtime/tdm.hip) compiled to gfx942 bitcode by hipcc and embedded
into the library via cmake/EmbedBinary.cmake. The transpiler links
the embedded bitcode into raised IR modules that need the
gfx1250->gfx942 TENSOR shim.
When hipcc isn't available the embedded blob is empty and
tdmRuntimeAvailable() returns false, so the cross-target VIMAGE
path falls back to its existing loud refusal — preserving the
no-hipcc build behaviour exactly.
Lit: tensor_load_to_lds.s, tensor_store_from_lds.s.
Co-Authored-By: Tim Gymnich <tim at gymni.ch>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply at anthropic.com>
Commit: 14350c6c9bdb8e65f3bd527f2273f3a33a988884
https://github.com/llvm/llvm-project/commit/14350c6c9bdb8e65f3bd527f2273f3a33a988884
Author: Martin Paul Lücke <martin.luecke at amd.com>
Date: 2026-05-19 (Tue, 19 May 2026)
Changed paths:
M amd/comgr/src/hotswap/CMakeLists.txt
M amd/comgr/src/hotswap/canonical-op.h
M amd/comgr/src/hotswap/docs/sgpr-wave-mask-translation.md
A amd/comgr/src/hotswap/docs/wave-size-translation.md
M amd/comgr/src/hotswap/handle-valu-vcmp.cpp
M amd/comgr/src/hotswap/raise-failure.h
M amd/comgr/src/hotswap/wave-projection.cpp
A amd/comgr/src/hotswap/wave-size-obstruction.cpp
A amd/comgr/src/hotswap/wave-size-obstruction.h
Log Message:
-----------
[Comgr][hotswap] Add wave-size obstruction classifier
Pre-raise pass that scans the disassembled stream for instructions
that semantically depend on the source kernel's wave size (32 vs 64)
and would be miscompiled if naively re-emitted at a different wave
size. The production raiser dispatcher in a follow-up commit will
consult this classifier before deciding whether a given kernel can
be re-targeted.
Lands as a standalone module here (no callers yet); pairs with the
docs/wave-size-translation.md and docs/sgpr-wave-mask-translation.md
notes that specify the obstruction taxonomy and the
SGPR→wave-mask ABI translation rules.
Co-Authored-By: Tim Gymnich <tim at gymni.ch>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply at anthropic.com>
Commit: a22e46260d7b00f53d269ff8a26f3bee70de0498
https://github.com/llvm/llvm-project/commit/a22e46260d7b00f53d269ff8a26f3bee70de0498
Author: Martin Paul Lücke <martin.luecke at amd.com>
Date: 2026-05-19 (Tue, 19 May 2026)
Changed paths:
M amd/comgr/src/hotswap/CMakeLists.txt
M amd/comgr/src/hotswap/docs/wave-size-translation.md
A amd/comgr/src/hotswap/rewrite-cross-lane-divergent.cpp
A amd/comgr/src/hotswap/rewrite-cross-lane-divergent.h
Log Message:
-----------
[Comgr][hotswap] Add cross-lane divergent rewrite pass
Post-raise pass that rewrites cross-lane DPP / permlane / readlane
operations whose lane-mask is divergent into the equivalent ds_swizzle
or wave-mask form for the target architecture. The production raiser
dispatcher in a follow-up commit will run this pass after the IR is
lifted but before the codegen handoff.
Lands as a standalone module here (no callers yet) so the diff stays
small and focused; the wiring will land alongside the dispatcher.
Co-Authored-By: Tim Gymnich <tim at gymni.ch>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply at anthropic.com>
Commit: 8c19f408d8bdaba1d710d05709d7dd3ec7f27a81
https://github.com/llvm/llvm-project/commit/8c19f408d8bdaba1d710d05709d7dd3ec7f27a81
Author: Martin Paul Lücke <martin.luecke at amd.com>
Date: 2026-05-19 (Tue, 19 May 2026)
Changed paths:
M amd/comgr/src/hotswap/CMakeLists.txt
A amd/comgr/src/hotswap/c5-predicate-chain-classifier.cpp
A amd/comgr/src/hotswap/c5-predicate-chain-classifier.h
A amd/comgr/src/hotswap/docs/modrep-predicate-chain.md
M amd/comgr/src/hotswap/wave-size-obstruction.cpp
A amd/comgr/test-unit/C5PredicateChainTest.cpp
M amd/comgr/test-unit/CMakeLists.txt
Log Message:
-----------
[Comgr][hotswap] Add C5 predicate-chain classifier
Adds the C5 ("modrep") predicate-chain classifier: a static analysis
over the lifted IR that recognises chains of cross-lane predicates
where the modulating-rep pattern can be proven equivalent to a
simpler form on the target ISA. Companion docs/modrep-predicate-chain.md
spells out the recognition rules and the suppression criteria.
Lands as a standalone module here; the production raiser dispatcher
in a follow-up commit consumes the classifier output to suppress
predicate-chain code that the target wave size handles natively.
Co-Authored-By: Tim Gymnich <tim at gymni.ch>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply at anthropic.com>
Commit: c89072cebf073de3972b211e36c62e4062dad1ef
https://github.com/llvm/llvm-project/commit/c89072cebf073de3972b211e36c62e4062dad1ef
Author: Martin Paul Lücke <martin.luecke at amd.com>
Date: 2026-05-19 (Tue, 19 May 2026)
Changed paths:
M amd/comgr/src/hotswap/CMakeLists.txt
M amd/comgr/src/hotswap/README.md
M amd/comgr/src/hotswap/canonical-op-attrs.cpp
M amd/comgr/src/hotswap/canonical-op-attrs.h
A amd/comgr/src/hotswap/docs/gfx1250-on-gfx950-analysis.md
A amd/comgr/src/hotswap/docs/learnings.md
A amd/comgr/src/hotswap/docs/timing.md
A amd/comgr/src/hotswap/pipeline.cpp
A amd/comgr/src/hotswap/pipeline.h
M amd/comgr/src/hotswap/raiser.cpp
M amd/comgr/src/hotswap/raiser.h
A amd/comgr/src/hotswap/translation-cache.cpp
A amd/comgr/src/hotswap/translation-cache.h
M amd/comgr/test-lit/comgr-sources/raise_cli.cpp
M amd/comgr/test-unit/CMakeLists.txt
A amd/comgr/test-unit/TranslationCacheTest.cpp
Log Message:
-----------
[Comgr][hotswap] Wire production raiser dispatcher + pipeline + raise_cli HSACO mode
Replaces the bare-bones raiser scaffolding with the production
dispatcher: takes the .text bytes + decoded kernel descriptor and
walks each MC instruction through the per-family handlers
(SOPP/SOPC/SOP1/SOP2/SOPK/SMEM/FLAT/MUBUF/DS/VALU/MFMA/VOPD/VIMAGE),
running the wave-size obstruction classifier, the cross-lane
divergent rewrite, and the C5 predicate-chain classifier as
post-raise passes.
Adds the higher-level `pipeline.cpp` driver that ties raise -> llc ->
ld.lld together, plus `translation_cache.cpp` for keyed reuse of
already-translated kernels. Extends `raise_cli.cpp` to support the
full --write-hsaco mode in addition to --emit-ir, so the lit
harness can exercise both halves of the pipeline.
Re-registers `getHandlerValuVcmpAttrs()` in the canonical_op_attrs
table so the V_CMPX → routesExecThroughStoreExec audit at startup
sees the V_CMP/V_CMPX entries.
Brings the matching docs/ notes (`learnings.md`, `timing.md`,
`gfx1250-on-gfx950-analysis.md`) and the README so reviewers can
follow the dispatch + post-raise design in one commit.
After this commit `check-comgr` lit passes 102/105 hotswap-raise
fixtures (2 unsupported, 1 needs `llc` on PATH for the full
HSACO-write pipeline).
Co-Authored-By: Tim Gymnich <tim at gymni.ch>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply at anthropic.com>
Commit: 79abce7409a27a840b05321aae8ffbd3f7f6570a
https://github.com/llvm/llvm-project/commit/79abce7409a27a840b05321aae8ffbd3f7f6570a
Author: Martin Paul Lücke <martin.luecke at amd.com>
Date: 2026-05-19 (Tue, 19 May 2026)
Changed paths:
M amd/comgr/CMakeLists.txt
M amd/comgr/include/amd_comgr.h.in
A amd/comgr/src/comgr-hotswap-transpile.cpp
M amd/comgr/src/exportmap.in
M amd/comgr/src/hotswap/CMakeLists.txt
Log Message:
-----------
[Comgr][hotswap] Add amd_comgr_hotswap_transpile public API
Adds the public C entry points that expose the hotswap transpiler
to amd_comgr clients:
amd_comgr_hotswap_transpile (lift -> codegen -> link)
amd_comgr_hotswap_transpile_with_options (cache + structured result)
Implementation lives in src/comgr-hotswap-transpile.cpp and is
gated on COMGR_ENABLE_HOTSWAP_TRANSPILE; when disabled the entry
points are simply not provided. The hotswap subdirectory is now
added unconditionally (so the OBJECT lib is built for the
test-unit suite); only the linkage into amd_comgr and the
public-API TU stay behind the option.
Updates the export map and amd_comgr.h.in with the new symbols
and pulls the small set of include-path additions in the hotswap
CMake (parent build/include for amd_comgr.h, plus the
LLVM_TOOLS_DIR compile def used by the pipeline driver).
Co-Authored-By: Tim Gymnich <tim at gymni.ch>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply at anthropic.com>
Commit: 80efe9b9c89ed3777e63cb74f48b724bc5bfcc29
https://github.com/llvm/llvm-project/commit/80efe9b9c89ed3777e63cb74f48b724bc5bfcc29
Author: Martin Paul Lücke <martin.luecke at amd.com>
Date: 2026-05-19 (Tue, 19 May 2026)
Changed paths:
A amd/comgr/test-lit/hotswap-raise/README.md
A amd/comgr/test-lit/hotswap-raise/c1_lane_id_leak.s
A amd/comgr/test-lit/hotswap-raise/c1_ttmp_wave_id_lift.s
A amd/comgr/test-lit/hotswap-raise/c1_wave_id_lift_scalarized.s
A amd/comgr/test-lit/hotswap-raise/c2_dpp_fi_refuse.s
A amd/comgr/test-lit/hotswap-raise/c2_dpp_quad_perm.s
A amd/comgr/test-lit/hotswap-raise/c2_dpp_row_ror_refuse.s
A amd/comgr/test-lit/hotswap-raise/c2_dpp_row_shr_bound.s
A amd/comgr/test-lit/hotswap-raise/c2_dpp_row_xmask.s
A amd/comgr/test-lit/hotswap-raise/c2_dpp_row_xmask_boundary.s
A amd/comgr/test-lit/hotswap-raise/c2_dpp_row_xmask_partial_mask.s
A amd/comgr/test-lit/hotswap-raise/c2_ds_swizzle.s
A amd/comgr/test-lit/hotswap-raise/c2_ds_swizzle_fft_reserved.s
A amd/comgr/test-lit/hotswap-raise/c2_ds_swizzle_unsafe.s
A amd/comgr/test-lit/hotswap-raise/c2_permlane16.s
A amd/comgr/test-lit/hotswap-raise/c2_permlane_swap.s
A amd/comgr/test-lit/hotswap-raise/c3_atomic_cas.s
A amd/comgr/test-lit/hotswap-raise/c4_lane_dep_cmpx.s
A amd/comgr/test-lit/hotswap-raise/c4_mbcnt_unrelated_exec.s
A amd/comgr/test-lit/hotswap-raise/c5_predicate_chain_masked.s
A amd/comgr/test-lit/hotswap-raise/c5_predicate_chain_phantom_lane.s
A amd/comgr/test-lit/hotswap-raise/c5_predicate_chain_tid.s
A amd/comgr/test-lit/hotswap-raise/cross_wave_warn.s
A amd/comgr/test-lit/hotswap-raise/decoder_madmk_v_fmamk_f32.s
A amd/comgr/test-lit/hotswap-raise/default_mode_kernel_offsets.s
A amd/comgr/test-lit/hotswap-raise/divergent_vgpr_ir.s
A amd/comgr/test-lit/hotswap-raise/gfx1200_hidden_group_size_dispatch.s
A amd/comgr/test-lit/hotswap-raise/mbcnt_lo_source_wave.s
A amd/comgr/test-lit/hotswap-raise/phantom_lane_modrep_fallback.s
A amd/comgr/test-lit/hotswap-raise/readlane_divergent_rewrite.s
A amd/comgr/test-lit/hotswap-raise/s_cvt_f16_f32.s
A amd/comgr/test-lit/hotswap-raise/s_cvt_hi_f32_f16.s
A amd/comgr/test-lit/hotswap-raise/s_scalar_f32_rounding.s
A amd/comgr/test-lit/hotswap-raise/scalar_exec_writers.s
A amd/comgr/test-lit/hotswap-raise/source_hidden_arg_unsupported.s
A amd/comgr/test-lit/hotswap-raise/ttmp7_workgroup_id_yz_init.s
A amd/comgr/test-lit/hotswap-raise/unreachable_fallthrough_after_branch.s
A amd/comgr/test-lit/hotswap-raise/user_sgpr_count_32_gfx125.s
A amd/comgr/test-lit/hotswap-raise/v_add_sub_nc_i16.s
A amd/comgr/test-lit/hotswap-raise/v_add_sub_nc_u16_clamp.s
A amd/comgr/test-lit/hotswap-raise/v_cvt_f32_f64.s
A amd/comgr/test-lit/hotswap-raise/v_cvt_f64_f32.s
A amd/comgr/test-lit/hotswap-raise/v_ieee_minimummaximum_f16.s
A amd/comgr/test-lit/hotswap-raise/v_maximumminimum_f32.s
A amd/comgr/test-lit/hotswap-raise/v_maximumminimum_modifier_refuse.s
A amd/comgr/test-lit/hotswap-raise/v_maxmin_num_f32.s
A amd/comgr/test-lit/hotswap-raise/v_minimummaximum_f32.s
A amd/comgr/test-lit/hotswap-raise/v_sub_nc_u16.s
A amd/comgr/test-lit/hotswap-raise/wmma_f32_16x16x32_bf16.s
A amd/comgr/test-lit/hotswap-raise/wmma_f32_16x16x4_f32.s
A amd/comgr/test-lit/hotswap-raise/wmma_f32_16x16x64_fp8_fp8.s
A amd/comgr/test-lit/hotswap-raise/wmma_i32_16x16x64_iu8.s
A amd/comgr/test-lit/hotswap-raise/wmma_phantom_lane_f16_chain.s
A amd/comgr/test-lit/hotswap-raise/wmma_phantom_lane_refuse.s
A amd/comgr/test-lit/hotswap-raise/wmma_scale_f32_16x16x128_f8f6f4.s
A amd/comgr/test-lit/hotswap-raise/writelane_divergent_rewrite.s
A amd/comgr/test-lit/hotswap-raise/writelane_sgpr_forced_use.s
A amd/comgr/test-lit/hotswap-raise/writelane_threadloop_barrier_refuse.s
A amd/comgr/test-lit/hotswap-raise/writelane_uniform_noop.s
M amd/comgr/test-unit/RaiserScaffoldingTest.cpp
Log Message:
-----------
[Comgr][hotswap] Add scenario lit fixtures (C1-C5, WMMA, writelane, ...)
Brings the rest of the hotswap-raise/ lit corpus that pins each
classification scenario the production raiser handles:
- C1 (lane-id leaks, ttmp wave-id lift) — 3 fixtures
- C2 (DPP, ds_swizzle, permlane*) — 7 fixtures
- C3 (atomic compare-and-swap) — 1 fixture
- C4 (lane-dependent CMPX, mbcnt EXEC interactions) — 2 fixtures
- C5 (predicate-chain TID / phantom-lane / masked) — 3 fixtures
- WMMA (f32/f16/iu8/scaled f8f6f4/phantom-lane refuse) — 7 fixtures
- writelane / readlane (divergent rewrite, uniform-noop, ...) — 5 fixtures
- decoder edge cases (madmk -> v_fmamk_f32, ttmp7 init, etc.)
- scenario fixtures for hidden-arg unsupported, cross-wave warnings,
user_sgpr_count_32 on gfx125x, default-mode kernel offsets,
mbcnt source-wave, scalar EXEC writers, divergent_vgpr_ir,
unreachable fallthrough, gfx1200 hidden group-size dispatch.
Plus the hotswap-raise/README.md describing the fixture taxonomy.
With these fixtures landed, `check-comgr` reports
142/145 passing (2 unsupported, 1 environmental — needs llc on PATH).
Co-Authored-By: Tim Gymnich <tim at gymni.ch>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply at anthropic.com>
Commit: 84a3eb85485c8c6c984625e1d39781b943a4260d
https://github.com/llvm/llvm-project/commit/84a3eb85485c8c6c984625e1d39781b943a4260d
Author: Martin Paul Lücke <martin.luecke at amd.com>
Date: 2026-05-19 (Tue, 19 May 2026)
Changed paths:
M amd/comgr/src/hotswap/CMakeLists.txt
M amd/comgr/src/hotswap/handle-smem.cpp
M amd/comgr/src/hotswap/handle-sopk.cpp
M amd/comgr/test-lit/CMakeLists.txt
A amd/comgr/test-lit/comgr-sources/hotswap-transpile.c
A amd/comgr/test-lit/hotswap-transpile.c
A amd/comgr/test-lit/vecadd_gfx950.co
Log Message:
-----------
[Comgr][hotswap] Add black-box hotswap-transpile lit + drop pipeline.h stubs
Adds the end-to-end black-box lit fixture for the public
`amd_comgr_hotswap_transpile` entry point:
- test-lit/hotswap-transpile.c — the lit harness
- test-lit/comgr-sources/hotswap-transpile.c — the test driver
- test-lit/vecadd_gfx950.co — pre-built input code object
- test-lit/CMakeLists.txt — wires the new comgr-source binary
Drops the temporary `isStrictMode()` stubs in handle_smem.cpp and
handle_sopk.cpp now that the real `pipeline.h` declaration is in
the tree (landed earlier in the production raiser commit).
Final state: lit reports 179/192 passing, 12 unsupported, 1 needs
llc on PATH; ninja test-unit is fully green.
Co-Authored-By: Tim Gymnich <tim at gymni.ch>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply at anthropic.com>
Commit: 202c9b8121a012f9791a3b9843b14e33ce482aa9
https://github.com/llvm/llvm-project/commit/202c9b8121a012f9791a3b9843b14e33ce482aa9
Author: Martin Lücke <mluecke at amd.com>
Date: 2026-05-19 (Tue, 19 May 2026)
Changed paths:
M amd/comgr/src/hotswap/handle-sop2.cpp
A amd/comgr/test-lit/hotswap-raise/s_andn2_mask_shadow.s
Log Message:
-----------
[Comgr][hotswap] Preserve mask shadows through negated SOP2 ops
Commit: 24876ac189ebf0a22475a517b5b0be14d66e7c7d
https://github.com/llvm/llvm-project/commit/24876ac189ebf0a22475a517b5b0be14d66e7c7d
Author: Martin Lücke <mluecke at amd.com>
Date: 2026-05-19 (Tue, 19 May 2026)
Changed paths:
M amd/comgr/src/hotswap/canonical-op.cpp
M amd/comgr/src/hotswap/canonical-op.h
M amd/comgr/src/hotswap/handle-valu-vop3p.cpp
M amd/comgr/src/hotswap/opcode-map.cpp
A amd/comgr/test-lit/hotswap-raise/v_pk_add_bf16.s
A amd/comgr/test-lit/hotswap-raise/v_pk_bf16_arith_clamp_refuse.s
A amd/comgr/test-lit/hotswap-raise/v_pk_bf16_siblings.s
A amd/comgr/test-lit/hotswap-raise/v_pk_fma_bf16.s
Log Message:
-----------
[Comgr][hotswap] Lift packed BF16 VOP3P arithmetic
Commit: 42abe4511842c6ec7e029c99febc2ff5ca5e4515
https://github.com/llvm/llvm-project/commit/42abe4511842c6ec7e029c99febc2ff5ca5e4515
Author: Martin Lücke <mluecke at amd.com>
Date: 2026-05-19 (Tue, 19 May 2026)
Changed paths:
M amd/comgr/src/hotswap/canonical-op.cpp
M amd/comgr/src/hotswap/canonical-op.h
M amd/comgr/src/hotswap/handle-valu-small-ops.cpp
M amd/comgr/src/hotswap/opcode-map.cpp
M amd/comgr/src/hotswap/rewrite-cross-lane-divergent.cpp
A amd/comgr/test-lit/hotswap-raise/f32_small_ops_output_mod_refuse.s
A amd/comgr/test-lit/hotswap-raise/v_rndne_f32.s
A amd/comgr/test-lit/hotswap-raise/v_rndne_f32_dpp_propagator.s
Log Message:
-----------
[Comgr][hotswap] Support v_rndne_f32 and guard F32 output modifiers
Commit: 69aceba33f69710fe9741677141161567ca222fb
https://github.com/llvm/llvm-project/commit/69aceba33f69710fe9741677141161567ca222fb
Author: Martin Lücke <mluecke at amd.com>
Date: 2026-05-19 (Tue, 19 May 2026)
Changed paths:
M amd/comgr/src/hotswap/canonical-op.cpp
M amd/comgr/src/hotswap/canonical-op.h
M amd/comgr/src/hotswap/decode.cpp
M amd/comgr/src/hotswap/handle-sop2.cpp
M amd/comgr/src/hotswap/opcode-map.cpp
A amd/comgr/test-lit/hotswap-raise/s_fmaak_fmamk_f32.s
Log Message:
-----------
[Comgr][hotswap] Support scalar fmaak/fmamk f32
Commit: 94d5b99598d30410ee2131a3d2f8d81b31d1ee91
https://github.com/llvm/llvm-project/commit/94d5b99598d30410ee2131a3d2f8d81b31d1ee91
Author: Martin Lücke <mluecke at amd.com>
Date: 2026-05-19 (Tue, 19 May 2026)
Changed paths:
M amd/comgr/src/hotswap/canonical-op.cpp
M amd/comgr/src/hotswap/canonical-op.h
M amd/comgr/src/hotswap/handle-valu-vop3p.cpp
M amd/comgr/src/hotswap/opcode-map.cpp
A amd/comgr/test-lit/hotswap-raise/v_fma_mix_half_result_clamp.s
A amd/comgr/test-lit/hotswap-raise/v_fma_mix_half_result_dpp.s
A amd/comgr/test-lit/hotswap-raise/v_fma_mixhi_f16_bf16.s
A amd/comgr/test-lit/hotswap-raise/v_fma_mixlo_f16.s
Log Message:
-----------
[Comgr][hotswap] Support FMA MIX half-result forms
Commit: 1570306ec33c907aabb770901c81a341e83b9665
https://github.com/llvm/llvm-project/commit/1570306ec33c907aabb770901c81a341e83b9665
Author: Martin Lücke <mluecke at amd.com>
Date: 2026-05-19 (Tue, 19 May 2026)
Changed paths:
M amd/comgr/src/hotswap/CMakeLists.txt
M amd/comgr/src/hotswap/translation-cache.cpp
M amd/comgr/test-unit/CMakeLists.txt
Log Message:
-----------
[Comgr][hotswap] Fix hotswap build include and cache errors
Commit: 8aa6dfc5e308828bddebdaa585a33f8e0dbfbda0
https://github.com/llvm/llvm-project/commit/8aa6dfc5e308828bddebdaa585a33f8e0dbfbda0
Author: Martin Lücke <mluecke at amd.com>
Date: 2026-05-19 (Tue, 19 May 2026)
Changed paths:
M amd/comgr/src/hotswap/flat-addr.cpp
A amd/comgr/test-lit/hotswap-raise/global_flat_negative_offset.s
Log Message:
-----------
[Comgr][hotswap] Sign-extend GLOBAL/FLAT offsets
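The subject line refers to treating the immediate offset field as a signed quantity. A minimal sketch of the usual two's-complement sign extension follows; `sign_extend` is a hypothetical helper, and the field width is a parameter because it differs between target generations (the widths used in the examples are illustrative only).

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1      # keep only the encoded field
    sign = 1 << (bits - 1)        # position of the sign bit
    return (value ^ sign) - sign  # flip the sign bit, then subtract its weight
```

With a 13-bit field, the raw value 0x1FFF decodes to a negative offset instead of 8191, which is the case a fixture like global_flat_negative_offset.s would need to cover.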
Commit: a3b128818a9a9d707bba771094988f1abbb9ca93
https://github.com/llvm/llvm-project/commit/a3b128818a9a9d707bba771094988f1abbb9ca93
Author: Tim Gymnich <tim at gymni.ch>
Date: 2026-05-20 (Wed, 20 May 2026)
Changed paths:
M amd/comgr/src/hotswap/canonical-op.cpp
M amd/comgr/src/hotswap/canonical-op.h
M amd/comgr/src/hotswap/handle-valu.cpp
M amd/comgr/src/hotswap/opcode-map.cpp
A amd/comgr/test-lit/hotswap-raise/v_fmamk_fmaak_f64.s
Log Message:
-----------
fixup! [Comgr][hotswap] Support scalar fmaak/fmamk f32 (#26)
- Support v_fmamk_f64
Commit: 72f34968711842f0c0601464a2e3c80cadec0c45
https://github.com/llvm/llvm-project/commit/72f34968711842f0c0601464a2e3c80cadec0c45
Author: Tim Gymnich <tim at gymni.ch>
Date: 2026-05-20 (Wed, 20 May 2026)
Changed paths:
M amd/comgr/src/hotswap/canonical-op.cpp
M amd/comgr/src/hotswap/canonical-op.h
M amd/comgr/src/hotswap/handle-valu.cpp
M amd/comgr/src/hotswap/opcode-map.cpp
A amd/comgr/test-lit/hotswap-raise/v_fma_f16.s
Log Message:
-----------
fixup! [Comgr][hotswap] Add VALU handlers (base + cross-lane + small-ops + VCMP) (#27)
- Add v_fma_f16 support
Commit: 0481648ec54fb84500cc55a769c2cf3296cf11dd
https://github.com/llvm/llvm-project/commit/0481648ec54fb84500cc55a769c2cf3296cf11dd
Author: Tim Gymnich <tim at gymni.ch>
Date: 2026-05-20 (Wed, 20 May 2026)
Changed paths:
M amd/comgr/src/hotswap/canonical-op.cpp
M amd/comgr/src/hotswap/canonical-op.h
M amd/comgr/src/hotswap/handle-valu-vop3p.cpp
M amd/comgr/src/hotswap/opcode-map.cpp
A amd/comgr/test-lit/hotswap-raise/v_pk_mul_f16.s
Log Message:
-----------
[Comgr][hotswap] Support v_pk_mul_f16 (#33)
Raises VOP3P v_pk_mul_f16 to a lane-wise `fmul <2 x half>` with the same
packed srcN_modifiers contract (OP_SEL_0 / OP_SEL_1 lane selection,
NEG / NEG_HI per-lane fneg) used by v_pk_fma_f16. Clamp is lowered to
maxnum/minnum saturation matching the existing packed-f16 path.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply at anthropic.com>
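The packed-modifier contract described above can be modelled in a few lines. This is a reference sketch, not the handler's code: it assumes the modifier bits follow LLVM's SISrcMods layout (NEG = 1, NEG_HI = 2, OP_SEL_0 = 4, OP_SEL_1 = 8), that OP_SEL_0 / OP_SEL_1 choose which half of a source feeds the low / high result lane, and that clamp saturates to [0, 1]. NaN handling under clamp is omitted, and all function names are hypothetical.

```python
import struct

# Assumed source-modifier bit layout (LLVM SISrcMods).
NEG, NEG_HI, OP_SEL_0, OP_SEL_1 = 0x1, 0x2, 0x4, 0x8


def half_to_float(bits: int) -> float:
    return struct.unpack("<e", struct.pack("<H", bits & 0xFFFF))[0]


def float_to_half(value: float) -> int:
    try:
        return struct.unpack("<H", struct.pack("<e", value))[0]
    except OverflowError:
        # Finite values beyond the half range round to infinity.
        return 0xFC00 if value < 0 else 0x7C00


def read_lane(src: int, mods: int, hi_lane: bool) -> float:
    """Pick the half selected for this result lane and apply its negate bit."""
    sel_bit, neg_bit = (OP_SEL_1, NEG_HI) if hi_lane else (OP_SEL_0, NEG)
    half = src >> 16 if mods & sel_bit else src
    value = half_to_float(half)
    return -value if mods & neg_bit else value


def v_pk_mul_f16(src0: int, mods0: int, src1: int, mods1: int,
                 clamp: bool = False) -> int:
    """Lane-wise multiply of two packed <2 x half> operands."""
    lanes = []
    for hi_lane in (False, True):
        product = read_lane(src0, mods0, hi_lane) * read_lane(src1, mods1, hi_lane)
        if clamp:
            product = min(max(product, 0.0), 1.0)
        lanes.append(float_to_half(product))
    return lanes[0] | (lanes[1] << 16)
```

Passing `OP_SEL_1` alone as the modifier word gives the default behaviour in this model: the low lane reads the low half and the high lane reads the high half of each source.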
Commit: 4c61955b0a1367ceff5fdde697bb13c8a5edc377
https://github.com/llvm/llvm-project/commit/4c61955b0a1367ceff5fdde697bb13c8a5edc377
Author: Tim Gymnich <tim at gymni.ch>
Date: 2026-05-20 (Wed, 20 May 2026)
Changed paths:
M amd/comgr/src/hotswap/canonical-op.cpp
M amd/comgr/src/hotswap/canonical-op.h
M amd/comgr/src/hotswap/handle-sop1.cpp
M amd/comgr/src/hotswap/opcode-map.cpp
A amd/comgr/test-lit/hotswap-raise/s_bcnt1_i32_b32.s
A amd/comgr/test-lit/hotswap-raise/s_bcnt1_i32_b64.s
Log Message:
-----------
[Comgr][hotswap] Support s_bcnt1_i32_b32 and s_bcnt1_i32_b64 (#34)
Lower the scalar population-count SOP1 ops to llvm.ctpop on the
appropriately sized source. The B64 form truncates the i64 ctpop result
to i32 for the destination SGPR. Both write SCC = (D.u != 0), derived
automatically by the raiser from Hr.SccResult.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply at anthropic.com>
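The semantics described above amount to a population count plus an SCC update. The sketch below models that behaviour in plain Python; `s_bcnt1_i32` is a hypothetical reference function, not the handler itself, and the `width` parameter stands in for the B32 / B64 variants.

```python
def s_bcnt1_i32(src: int, width: int) -> tuple[int, int]:
    """Return (D, SCC) for a population count over a `width`-bit source."""
    if width not in (32, 64):
        raise ValueError("width must be 32 or 64")
    mask = (1 << width) - 1
    count = bin(src & mask).count("1")  # models llvm.ctpop on the sized source
    d = count & 0xFFFFFFFF              # B64 form: result truncated to i32
    scc = 1 if d != 0 else 0            # SCC = (D.u != 0)
    return d, scc
```

The truncation is a no-op for any real input, since the count never exceeds 64, but it mirrors the i64-to-i32 step the message describes.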
Commit: 4aa04e8000ca2d8b629cea0d383e1c2c2bf7c19a
https://github.com/llvm/llvm-project/commit/4aa04e8000ca2d8b629cea0d383e1c2c2bf7c19a
Author: Tim Gymnich <tim at gymni.ch>
Date: 2026-05-20 (Wed, 20 May 2026)
Changed paths:
M amd/comgr/src/hotswap/canonical-op.cpp
M amd/comgr/src/hotswap/canonical-op.h
M amd/comgr/src/hotswap/handle-valu.cpp
M amd/comgr/src/hotswap/opcode-map.cpp
A amd/comgr/test-lit/hotswap-raise/v_ldexp_f64.s
Log Message:
-----------
[Comgr][hotswap] Add v_ldexp_f64 support (#25)
* fixup! [Comgr][hotswap] Add VALU handlers (base + cross-lane + small-ops + VCMP)
Commit: 7a497957092ab92214a449f28a3b3315dfb3b6a5
https://github.com/llvm/llvm-project/commit/7a497957092ab92214a449f28a3b3315dfb3b6a5
Author: Tim Gymnich <tim at gymni.ch>
Date: 2026-05-20 (Wed, 20 May 2026)
Changed paths:
M amd/comgr/src/hotswap/canonical-op.cpp
M amd/comgr/src/hotswap/canonical-op.h
M amd/comgr/src/hotswap/handle-valu-small-ops.cpp
M amd/comgr/src/hotswap/opcode-map.cpp
A amd/comgr/test-lit/hotswap-raise/v_cvt_u32_u16.s
Log Message:
-----------
fixup! [Comgr][hotswap] Add VALU handlers (base + cross-lane + small-ops + VCMP) (#28)
- Add v_cvt_u32_u16 support
Commit: 0274cc5f30a492c6a28eb6f03bfb577aeeee250e
https://github.com/llvm/llvm-project/commit/0274cc5f30a492c6a28eb6f03bfb577aeeee250e
Author: Tim Gymnich <tim at gymni.ch>
Date: 2026-05-20 (Wed, 20 May 2026)
Changed paths:
M amd/comgr/src/hotswap/canonical-op.cpp
M amd/comgr/src/hotswap/canonical-op.h
M amd/comgr/src/hotswap/handle-valu.cpp
M amd/comgr/src/hotswap/opcode-map.cpp
A amd/comgr/test-lit/hotswap-raise/v_mov_b16.s
Log Message:
-----------
fixup! [Comgr][hotswap] Add VALU handlers (base + cross-lane + small-ops + VCMP) (#29)
- Add v_mov_b16 support
Commit: 6701d2b608439cb99b9b5d56d64ba6aa74e6c155
https://github.com/llvm/llvm-project/commit/6701d2b608439cb99b9b5d56d64ba6aa74e6c155
Author: Aurore De Spirlet <61005249+adedespirlet at users.noreply.github.com>
Date: 2026-05-20 (Wed, 20 May 2026)
Changed paths:
M amd/comgr/src/hotswap/raiser.cpp
M amd/comgr/src/hotswap/raiser.h
M amd/comgr/test-lit/comgr-sources/raise_cli.cpp
Log Message:
-----------
extend raise_cli tool to report all unsupported instr (#30)
Previously raise_cli stopped at the first unsupported instruction and reported only that one. Now it traverses the entire kernel and reports all unique unsupported instruction types.
Landing intent: new incremental commit on top of 93916ffe [Comgr][hotswap] Wire raiser dispatcher + post-raise analyses + raise_cli
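The behavioural change can be pictured as replacing an early exit with a full scan that de-duplicates what it finds. A minimal sketch, assuming instructions are identified by mnemonic strings and that some predicate decides support; `collect_unsupported` and `is_supported` are hypothetical names, not part of raise_cli.

```python
def collect_unsupported(mnemonics, is_supported):
    """Scan the whole stream and return each unsupported mnemonic once,
    in first-seen order, instead of stopping at the first failure."""
    seen = set()
    unsupported = []
    for mnemonic in mnemonics:
        if is_supported(mnemonic) or mnemonic in seen:
            continue
        seen.add(mnemonic)
        unsupported.append(mnemonic)
    return unsupported
```

Keeping first-seen order makes the report stable across runs on the same kernel, which is convenient when diffing output between raiser revisions.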
Commit: 9ac445de9c97bf0548c4769f75035301996ba533
https://github.com/llvm/llvm-project/commit/9ac445de9c97bf0548c4769f75035301996ba533
Author: Martin Paul Lücke <martin.luecke at amd.com>
Date: 2026-05-21 (Thu, 21 May 2026)
Changed paths:
M amd/comgr/src/hotswap/canonical-op.cpp
M amd/comgr/src/hotswap/canonical-op.h
M amd/comgr/src/hotswap/handle-valu-vop3p.cpp
M amd/comgr/src/hotswap/opcode-map.cpp
A amd/comgr/test-lit/hotswap-raise/v_pk_add_f16.s
M amd/comgr/test-unit/OpcodeMapTest.cpp
Log Message:
-----------
[Comgr][hotswap] Support v_pk_add_f16 (#35)
Model v_pk_add_f16 as lane-wise packed half addition on top of the v_pk_mul_f16 packed-F16 path, sharing modifier, clamp, and packing handling while adding focused opcode-map and lit coverage.
Commit: 18c8a556edd4434ae59776ed76eccba2f0be7269
https://github.com/llvm/llvm-project/commit/18c8a556edd4434ae59776ed76eccba2f0be7269
Author: Tim Gymnich <tim at gymni.ch>
Date: 2026-05-21 (Thu, 21 May 2026)
Changed paths:
M amd/comgr/src/hotswap/canonical-op.cpp
M amd/comgr/src/hotswap/canonical-op.h
M amd/comgr/src/hotswap/handle-sop2.cpp
M amd/comgr/src/hotswap/opcode-map.cpp
A amd/comgr/test-lit/hotswap-raise/s_maximum_f16.s
Log Message:
-----------
[Comgr][hotswap] Support s_maximum_f16 (#37)
Commit: e2e535cc7f1a6038dfef3265ad54b288c09024a7
https://github.com/llvm/llvm-project/commit/e2e535cc7f1a6038dfef3265ad54b288c09024a7
Author: Tim Gymnich <tim at gymni.ch>
Date: 2026-05-21 (Thu, 21 May 2026)
Changed paths:
M amd/comgr/src/hotswap/canonical-op.cpp
M amd/comgr/src/hotswap/canonical-op.h
M amd/comgr/src/hotswap/handle-flat.cpp
M amd/comgr/src/hotswap/opcode-map.cpp
A amd/comgr/test-lit/hotswap-raise/global_store_byte_d16_hi.s
Log Message:
-----------
[Comgr][hotswap] Support global_store_d16_hi_b8 (#41)
Commit: c1745634d1bbb0e312db1a49345eb0e2aec223ae
https://github.com/llvm/llvm-project/commit/c1745634d1bbb0e312db1a49345eb0e2aec223ae
Author: Tim Gymnich <tim at gymni.ch>
Date: 2026-05-21 (Thu, 21 May 2026)
Changed paths:
M amd/comgr/src/hotswap/canonical-op.cpp
M amd/comgr/src/hotswap/canonical-op.h
M amd/comgr/src/hotswap/handle-sop2.cpp
M amd/comgr/src/hotswap/opcode-map.cpp
A amd/comgr/test-lit/hotswap-raise/s_maximum_f32.s
Log Message:
-----------
[Comgr][hotswap] Support s_maximum_f32 (#43)
Commit: 198562630e50f71fd48896ce6eab1eacd04d5e50
https://github.com/llvm/llvm-project/commit/198562630e50f71fd48896ce6eab1eacd04d5e50
Author: Tim Gymnich <tim at gymni.ch>
Date: 2026-05-21 (Thu, 21 May 2026)
Changed paths:
M amd/comgr/src/hotswap/canonical-op.cpp
M amd/comgr/src/hotswap/canonical-op.h
M amd/comgr/src/hotswap/handle-valu-vop3p.cpp
M amd/comgr/src/hotswap/opcode-map.cpp
A amd/comgr/test-lit/hotswap-raise/v_pk_mul_lo_u16.s
Log Message:
-----------
[Comgr][hotswap] Support v_pk_mul_lo_u16 (#40)
Commit: 202ff48025fab36904fec9f0371897b73daf734f
https://github.com/llvm/llvm-project/commit/202ff48025fab36904fec9f0371897b73daf734f
Author: Tim Gymnich <tim at gymni.ch>
Date: 2026-05-21 (Thu, 21 May 2026)
Changed paths:
M amd/comgr/src/hotswap/canonical-op.cpp
M amd/comgr/src/hotswap/canonical-op.h
M amd/comgr/src/hotswap/handle-valu-small-ops.cpp
M amd/comgr/src/hotswap/opcode-map.cpp
A amd/comgr/test-lit/hotswap-raise/v_ceil_f64.s
Log Message:
-----------
[Comgr][hotswap] Support v_ceil_f64 (#42)
Commit: f5829b21093ae64c10a3e80422a804782bf81015
https://github.com/llvm/llvm-project/commit/f5829b21093ae64c10a3e80422a804782bf81015
Author: Tim Gymnich <tim at gymni.ch>
Date: 2026-05-26 (Tue, 26 May 2026)
Changed paths:
M amd/comgr/src/hotswap/canonical-op.cpp
M amd/comgr/src/hotswap/canonical-op.h
M amd/comgr/src/hotswap/handle-sop2.cpp
M amd/comgr/src/hotswap/opcode-map.cpp
A amd/comgr/test-lit/hotswap-raise/s_bfe_i64.s
A amd/comgr/test-lit/hotswap-raise/s_bfe_i64_exec.s
Log Message:
-----------
[Comgr][hotswap] Support s_bfe_i64 (#36)
Commit: b11891d15ae6a2df1ee2917b86e9c59096cbbcaf
https://github.com/llvm/llvm-project/commit/b11891d15ae6a2df1ee2917b86e9c59096cbbcaf
Author: Juan Manuel Martinez Caamaño <jmartinezcaamao at gmail.com>
Date: 2026-05-27 (Wed, 27 May 2026)
Changed paths:
M amd/comgr/src/hotswap/CMakeLists.txt
Log Message:
-----------
[Comgr][hotswap] Fix undefined reference to llvm::Triple(std::string...) (#77)
Add missing library: TargetParser
```
/usr/bin/ld:
tools/comgr/src/hotswap/CMakeFiles/hotswap-transpiler.dir/raiser.cpp.o:
undefined reference to symbol
'_ZN4llvm6TripleC1EONSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE'
```
Commit: 5e42ccd1e6311a937bae32ee4c95fb76d7b23006
https://github.com/llvm/llvm-project/commit/5e42ccd1e6311a937bae32ee4c95fb76d7b23006
Author: Juan Manuel Martinez Caamaño <jmartinezcaamao at gmail.com>
Date: 2026-06-01 (Mon, 01 Jun 2026)
Changed paths:
M amd/comgr/src/hotswap/canonical-op.cpp
M amd/comgr/src/hotswap/canonical-op.h
M amd/comgr/src/hotswap/handle-flat.cpp
M amd/comgr/src/hotswap/opcode-map.cpp
M amd/comgr/src/hotswap/rewrite-cross-lane-divergent.cpp
A amd/comgr/test-lit/hotswap-raise/flat_prefetch_b8.s
Log Message:
-----------
[Comgr][hotswap] Support for flat_prefetch_b8 (#78)
This closely mirrors the support for `global_prefetch_b8`, changing only
the pointer's address space.
Commit: d6b3d3f0b15dc6c5a54f48e953c48bcecfae4dcf
https://github.com/llvm/llvm-project/commit/d6b3d3f0b15dc6c5a54f48e953c48bcecfae4dcf
Author: Juan Manuel Martinez Caamaño <jmartinezcaamao at gmail.com>
Date: 2026-06-01 (Mon, 01 Jun 2026)
Changed paths:
M amd/comgr/src/hotswap/canonical-op.cpp
M amd/comgr/src/hotswap/canonical-op.h
M amd/comgr/src/hotswap/handle-valu.cpp
M amd/comgr/src/hotswap/opcode-map.cpp
A amd/comgr/test-lit/hotswap-raise/v_add_max_i32.s
A amd/comgr/test-lit/hotswap-raise/v_add_max_u32.s
A amd/comgr/test-lit/hotswap-raise/v_add_min_i32.s
M amd/comgr/test-lit/hotswap-raise/v_add_min_u32.s
Log Message:
-----------
[Comgr][hotswap] Support for v_add_max_i32/v_add_max_u32/v_add_min_i32 (#85)
* [Comgr][hotswap] Support for v_add_max_i32/v_add_max_u32/v_add_min_i32
Implemented by generalizing the support for v_add_min_u32.
Commit: 9ccfb74821d6fbd63937e954da431b2ee9c2468d
https://github.com/llvm/llvm-project/commit/9ccfb74821d6fbd63937e954da431b2ee9c2468d
Author: Martin Paul Lücke <martin.luecke at amd.com>
Date: 2026-06-01 (Mon, 01 Jun 2026)
Changed paths:
M amd/comgr/src/hotswap/handle-flat.cpp
M amd/comgr/src/hotswap/isa-profile.h
M amd/comgr/test-lit/hotswap-raise/global_load_async_to_lds.s
Log Message:
-----------
[Comgr][hotswap] Gate async-to-LDS emulation on LDS-dest bounds (#88)
The gfx12 `global_load_async_to_lds` instruction drops a lane whose
LDS-destination offset is out of range: its write to LDS does not happen
(gfx12 programming manual, "Async LDS Load/Store" -- out of range means
past the LDS allocated to the workgroup, and unconditionally past the
physical LDS size). Triton predicates the masked (padding / K-edge) rows
of a GEMM tile load this way, parking their LDS destination at the
0x7fffffff (INT_MAX) sentinel; those rows' global tile addresses are
intentionally out of bounds and their LDS slots already hold the `other`
value, so the async copy is a no-op for them.
The cross-target (gfx1250 -> gfx942) emulation in handle-flat.cpp lowered
the async copy to a synchronous global load issued unconditionally for
every active lane. For a masked lane that load is doubly wrong: its result
is discarded (the LDS write is dropped anyway), and dereferencing the
lane's intentionally-OOB global address faults (HIP error 700) whenever
that address is unmapped. The fault is allocation-dependent -- it hides
under a loose allocator and surfaces under a tight one -- so it reproduces
in a full SGLang runtime (Qwen1.5-MoE expert GEMMs on the eager /
HIP-graphs-disabled decode path) but not in isolated single-kernel replay.
Skip the emulated load+store for lanes whose LDS destination is outside
the target's physical LDS capacity, replicating the hardware drop. The
capacity is a conservative, allocation-independent out-of-range bound,
queried per-target from IsaInfo (getAddressableLocalMemorySize) and held
in a new ISAProfile field so it tracks the target architecture.
Verified end-to-end: SGLang Qwen1.5-MoE-A2.7B (gfx1250 -> gfx942, HIP
graphs disabled) no longer faults and is numerically equivalent to the
local run (0 token divergence). global_load_async_to_lds.s is extended to
pin the LDS-dest gate.
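The per-lane gate described above can be sketched as a small host-side model. This is an illustration only, not the code in handle-flat.cpp: `Lane`, `emulateAsyncToLds`, and the 64 KiB capacity in the usage note are hypothetical, and the exact comparison the handler emits may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-lane view of one async copy (names are illustrative).
struct Lane {
  bool active;        // lane enabled in EXEC
  uint32_t ldsOffset; // LDS destination byte offset
  uint32_t value;     // stand-in for the data a global load would return
};

// Performs the emulated load+store only for lanes whose LDS destination is
// inside the capacity bound; returns how many lanes were performed.
// Skipped lanes never touch their (possibly unmapped) global address.
inline unsigned emulateAsyncToLds(const std::vector<Lane> &lanes,
                                  std::vector<uint32_t> &lds,
                                  uint32_t ldsCapacityBytes) {
  // The model's LDS buffer must cover the whole capacity.
  assert(lds.size() * sizeof(uint32_t) >= ldsCapacityBytes);
  unsigned performed = 0;
  for (const Lane &lane : lanes) {
    if (!lane.active)
      continue;
    // Gate on the LDS destination, replicating the hardware drop.
    // 64-bit arithmetic avoids wrap-around for offsets near UINT32_MAX.
    if (uint64_t(lane.ldsOffset) + sizeof(uint32_t) > ldsCapacityBytes)
      continue;
    lds[lane.ldsOffset / sizeof(uint32_t)] = lane.value;
    ++performed;
  }
  return performed;
}
```

With an assumed 64 KiB capacity, a lane parked at the 0x7fffffff sentinel is skipped while an in-range lane is copied as usual.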
Commit: a48e8a9cc3c2a7131ffdd7d9d3a8371890d3a68b
https://github.com/llvm/llvm-project/commit/a48e8a9cc3c2a7131ffdd7d9d3a8371890d3a68b
Author: Aurore De Spirlet <61005249+adedespirlet at users.noreply.github.com>
Date: 2026-06-02 (Tue, 02 Jun 2026)
Changed paths:
M amd/comgr/src/hotswap/canonical-op.cpp
M amd/comgr/src/hotswap/canonical-op.h
M amd/comgr/src/hotswap/handle-valu-small-ops.cpp
M amd/comgr/src/hotswap/handle-valu-vop3p.cpp
M amd/comgr/src/hotswap/handle-valu.cpp
M amd/comgr/src/hotswap/handle-vopd.cpp
M amd/comgr/src/hotswap/opcode-map.cpp
M amd/comgr/src/hotswap/rewrite-cross-lane-divergent.cpp
M amd/comgr/test-lit/hotswap-raise/v_minmax_num_f32.s
Log Message:
-----------
[Comgr][hotswap] Raise max/min (_num variant) ops as maximumnum/minimumnum (#86)
This commit lifts the max/min ops (the "num" variants) to llvm.maximumnum / llvm.minimumnum, which match IEEE 754-2019 maximumNumber / minimumNumber (a numeric operand is preferred over NaN), as the gfx1250 ISA docs require. These operations were previously mapped to llvm.maxnum / llvm.minnum, which follow the older IEEE 754-2008 contract.
Changes:
Raise the following to llvm.maximumnum / llvm.minimumnum:
V_MAX/MIN_NUM_F64
V_MAX/MIN_NUM_F32
V_MAX/MIN_F16
V_MAX/MIN3_F32
V_MAX/MIN3_NUM_F32
V_MINMAX_NUM_F32/F16
V_MAXMIN_NUM_F32/F16
V_PK_MAX/MIN_NUM_BF16
V_MIN/MAX3_NUM_F32
V_MED3_NUM_F32
V_MAX/MIN_NUM_F64 (VOPD)
V_MAX/MIN_NUM_F32 (VOPD)
Whitelist the maximumnum and minimumnum intrinsics in the cross-lane rewrite safety net so that readlane use chains are not refused as unaudited.
Remove the redundant handler for v_max_f32 and keep a single v_max_num_f32 handler (matching how the f64 variant is handled), so opcode mapping now maps these instructions to v_max_num_f32. Also fold the handlers for v_max_num_f32 and v_min_num_f32 into one if statement.
Make the canonical op names consistent across ops, adding "num" where needed.
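As a reference for the semantics being targeted, here is a minimal scalar sketch of IEEE 754-2019 maximumNumber for f32. The function name is hypothetical, and this models the standard's contract rather than the raiser's IR output.

```cpp
#include <cmath>

// Sketch of IEEE 754-2019 maximumNumber: a numeric operand is preferred
// over NaN, and +0 is treated as greater than -0.
inline float maximumNumber(float a, float b) {
  if (std::isnan(a))
    return b; // if b is NaN as well, the result stays NaN
  if (std::isnan(b))
    return a;
  if (a == b) // only +0 / -0 compare equal yet differ in sign
    return std::signbit(a) ? b : a;
  return a > b ? a : b;
}
```

minimumNumber is the mirror image: prefer the numeric operand, and treat -0 as smaller than +0.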
Commit: 26ed0af44565623db4bee36a4ad6b95542464fab
https://github.com/llvm/llvm-project/commit/26ed0af44565623db4bee36a4ad6b95542464fab
Author: Tim Gymnich <tim at gymni.ch>
Date: 2026-06-03 (Wed, 03 Jun 2026)
Changed paths:
M amd/comgr/src/hotswap/canonical-op.cpp
M amd/comgr/src/hotswap/canonical-op.h
M amd/comgr/src/hotswap/handle-sopc.cpp
M amd/comgr/src/hotswap/opcode-map.cpp
A amd/comgr/test-lit/hotswap-raise/s_cmp_f.s
Log Message:
-----------
[Comgr][hotswap] Support s_cmp_o/u_f32 and s_cmp_o/u_f16 (#81)
Add the ordered/unordered scalar FP compares to the hotswap raiser for
both F32 and F16. s_cmp_o_* lifts to FCmpORD (SCC set when neither
operand is NaN) and s_cmp_u_* to FCmpUNO (SCC set when either is NaN),
following the existing GFX12 scalar FP compare handling in handle-sopc.
A single lit fixture exercises all four compares end-to-end through the
SCC writeback into the following s_cselect_b32.
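The predicate semantics above can be modeled on the host as follows. This is a sketch with hypothetical function names; `cselect` models the `SCC ? S0 : S1` behaviour of s_cselect_b32 that the lit fixture relies on.

```cpp
#include <cmath>
#include <cstdint>

// SCC result of s_cmp_o_*: set when neither operand is NaN (FCmpORD).
inline bool cmpOrdered(float a, float b) {
  return !std::isnan(a) && !std::isnan(b);
}

// SCC result of s_cmp_u_*: set when either operand is NaN (FCmpUNO).
inline bool cmpUnordered(float a, float b) {
  return std::isnan(a) || std::isnan(b);
}

// s_cselect_b32: picks the first source when SCC is set, else the second.
inline uint32_t cselect(bool scc, uint32_t s0, uint32_t s1) {
  return scc ? s0 : s1;
}
```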
Co-authored-by: Claude Opus 4.8 (1M context) <noreply at anthropic.com>
Commit: 26a6f105673e08fd8d2b962a1b2ccabc67662a51
https://github.com/llvm/llvm-project/commit/26a6f105673e08fd8d2b962a1b2ccabc67662a51
Author: Tim Gymnich <tim at gymni.ch>
Date: 2026-06-03 (Wed, 03 Jun 2026)
Changed paths:
M amd/comgr/src/hotswap/canonical-op.cpp
M amd/comgr/src/hotswap/canonical-op.h
M amd/comgr/src/hotswap/handle-sop2.cpp
M amd/comgr/src/hotswap/opcode-map.cpp
R amd/comgr/test-lit/hotswap-raise/s_maximum_f16.s
R amd/comgr/test-lit/hotswap-raise/s_maximum_f32.s
A amd/comgr/test-lit/hotswap-raise/s_minmax_f16.s
A amd/comgr/test-lit/hotswap-raise/s_minmax_f32.s
Log Message:
-----------
[Comgr][hotswap] Support s_minimum_f32 / s_minimum_f16 (#62)
Lift the gfx12 scalar IEEE-754-2019 NaN-propagating f32 and f16 minimum
to gfx942, alongside the existing s_maximum_f{16,32}. SOP2 with no
source/output modifiers, so reinterpret the SGPRs and route through
llvm.minimum.f{16,32} (NaN-propagating, sign-aware on zeros so -0 < +0).
The non-propagating NUM siblings stay under S_MIN_NUM_F32.
Consolidate the IEEE max/min lift tests into s_minmax_f32.s and
s_minmax_f16.s, matching the s_minmax_num_f32.s convention.
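For contrast with the NUM variants, a minimal sketch of the NaN-propagating minimum described above (the semantics of llvm.minimum). The function name is hypothetical and the model is scalar f32 only.

```cpp
#include <cmath>
#include <limits>

// Sketch of IEEE 754-2019 minimum: any NaN input yields NaN, and -0 is
// ordered below +0.
inline float ieeeMinimum(float a, float b) {
  if (std::isnan(a) || std::isnan(b))
    return std::numeric_limits<float>::quiet_NaN();
  if (a == b) // equal values: only the sign of zero can differ
    return std::signbit(a) ? a : b;
  return a < b ? a : b;
}
```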
Co-authored-by: Claude Opus 4.8 (1M context) <noreply at anthropic.com>
Commit: 6e2f2b15964fda79e7c7976ebbd0d02b8cdf6750
https://github.com/llvm/llvm-project/commit/6e2f2b15964fda79e7c7976ebbd0d02b8cdf6750
Author: Tim Gymnich <tim at gymni.ch>
Date: 2026-06-03 (Wed, 03 Jun 2026)
Changed paths:
M amd/comgr/src/hotswap/canonical-op.cpp
M amd/comgr/src/hotswap/canonical-op.h
M amd/comgr/src/hotswap/handle-valu-small-ops.cpp
M amd/comgr/src/hotswap/opcode-map.cpp
A amd/comgr/test-lit/hotswap-raise/v_frexp_exp_i32_f64.s
A amd/comgr/test-lit/hotswap-raise/v_frexp_exp_i32_f64_dpp.s
Log Message:
-----------
[Comgr][hotswap] Support v_frexp_exp_i32_f64 (#38)
Commit: 731e9463841d674a17bc73c9440ef761ca190cee
https://github.com/llvm/llvm-project/commit/731e9463841d674a17bc73c9440ef761ca190cee
Author: Tim Gymnich <tim at gymni.ch>
Date: 2026-06-03 (Wed, 03 Jun 2026)
Changed paths:
M amd/comgr/src/hotswap/handle-valu.cpp
M amd/comgr/src/hotswap/handle-vopd.cpp
A amd/comgr/test-lit/hotswap-raise/v_fmac_f32_fused.s
Log Message:
-----------
[Comgr][hotswap] Use llvm.fma for guaranteed-fused FMAC opcodes (#58)
The ISA manual defines v_fmac_f32 as a hardware-guaranteed fused
multiply-accumulate:
    D.f32 = fma(S0.f32, S1.f32, D.f32)
The lift must therefore use llvm.fma, NOT llvm.fmuladd. llvm.fmuladd is
a relaxed-precision hint that the middle-end is permitted to split into
a separate fmul + fadd, which would change the rounding for inputs where
(a*b)+c rounds differently from fma(a, b, c).
Rounding-sensitive sample inputs: a = 1.0 + 2^-23, b = 1.0 + 2^-23,
c = -1.0. The exact product a*b is 1 + 2^-22 + 2^-46; rounded to f32 it
becomes 1 + 2^-22, so (a*b)+c yields 2^-22 with the 2^-46 term already
lost, while fma(a, b, c) carries the exact 2^-22 + 2^-46 into its single
final rounding.
The e64 form in the test pins the output modifiers to their defaults so
the raiser's VOP3 mod guard is satisfied without literal modifier
tweaking.
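The difference between one rounding and two can be reproduced on the host. The sketch below uses hypothetical helper names and illustrative inputs that differ from those in the message: a = b = 1 + 2^-13 and c = -(1 + 2^-12), chosen so that the discarded term (2^-26) is well below half an ulp of the product.

```cpp
#include <cmath>

// Two roundings: the product is rounded to f32 before the add.
// The volatile store keeps the compiler from contracting this into an fma.
inline float mulThenAdd(float a, float b, float c) {
  volatile float product = a * b;
  return product + c;
}

// One rounding: the exact a*b + c is rounded once, as a fused
// multiply-accumulate does.
inline float fused(float a, float b, float c) {
  return std::fma(a, b, c);
}
```

For these inputs the exact product is 1 + 2^-12 + 2^-26. The separate multiply rounds it to 1 + 2^-12, so the add returns 0, whereas the fused form returns 2^-26.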
Co-authored-by: Claude Opus 4.8 (1M context) <noreply at anthropic.com>
Commit: 842befc58c2d3bc617456ba09865de005aa40ea0
https://github.com/llvm/llvm-project/commit/842befc58c2d3bc617456ba09865de005aa40ea0
Author: Tim Gymnich <tim at gymni.ch>
Date: 2026-06-03 (Wed, 03 Jun 2026)
Changed paths:
M amd/comgr/test-lit/comgr-sources/raise_cli.cpp
Log Message:
-----------
[Comgr][hotswap] Rewrite raise_cli to use llvm::cl::opt (#93)
Replace the hand-rolled argv parsing loop and manual usage() text with
llvm::cl options.
Co-authored-by: Claude Opus 4.8 (1M context) <noreply at anthropic.com>
Commit: 0936f833893d1c82dc5de554400c0da471099402
https://github.com/llvm/llvm-project/commit/0936f833893d1c82dc5de554400c0da471099402
Author: Tim Gymnich <tim at gymni.ch>
Date: 2026-06-03 (Wed, 03 Jun 2026)
Changed paths:
M amd/comgr/src/hotswap/canonical-op.cpp
M amd/comgr/src/hotswap/canonical-op.h
M amd/comgr/src/hotswap/handle-valu-small-ops.cpp
M amd/comgr/src/hotswap/opcode-map.cpp
A amd/comgr/test-lit/hotswap-raise/v_frexp_mant_f32.s
Log Message:
-----------
[Comgr][hotswap] Support v_frexp_mant_f32 (#39)
Commit: 167e0d713ce3f5404c3e0c91702c95fcd4506b6a
https://github.com/llvm/llvm-project/commit/167e0d713ce3f5404c3e0c91702c95fcd4506b6a
Author: Tim Gymnich <tim at gymni.ch>
Date: 2026-06-03 (Wed, 03 Jun 2026)
Changed paths:
M amd/comgr/src/hotswap/canonical-op.h
M amd/comgr/src/hotswap/handle-sopp.cpp
M amd/comgr/src/hotswap/isa-profile.h
M amd/comgr/src/hotswap/opcode-map.cpp
A amd/comgr/test-lit/hotswap-raise/s_sendmsg.s
Log Message:
-----------
[Comgr][hotswap] Support s_sendmsg / s_sendmsghalt (#60)
Adds a SOPP handler for s_sendmsg and s_sendmsghalt with a narrow
allowed-message-ID policy:
* INTERRUPT (SIMM16==1): portable across every AMDGPU generation;
lowered to llvm.amdgcn.s.sendmsg / s.sendmsghalt with the current
M0 value as the payload argument.
* DEALLOC_VGPRS (SIMM16==3): gfx11+ only. Passed through when the
target supports the encoding; dropped to a no-op on cross-target
lifts to gfx942 (where ID=3 is reserved and VGPRs are freed
implicitly at s_endpgm so the early hint has no observable effect).
* Any other SIMM16: refuses via RaiseFailure::unsupportedShape -- the
same numeric ID means different things across generations
(e.g. gfx12 reserves ID=4 but gfx942 uses it for SAVEWAVE), so a
blind pass-through would silently misencode on cross-target lifts.
Introduces ISAProfile::SupportsDeallocVgprs (derived from
llvm::AMDGPU::isGFX11Plus) so the DEALLOC_VGPRS drop-vs-passthrough
decision is target-policy rather than baked into the handler.
s_sendmsg_rtn_b{32,64} is intentionally deferred to a follow-up: it
has no gfx942 equivalent (cross-target must refuse entirely) and
needs SDST write-back modeling that the void s_sendmsg forms don't.
Tests cover the four arms: INTERRUPT same-target lift,
DEALLOC_VGPRS same-target pass-through, DEALLOC_VGPRS cross-target
drop, and an unsupported-ID refusal exercising raise_cli's stderr
diagnostic.
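The message-ID policy above amounts to a three-way decision. A minimal sketch follows, with hypothetical names (`SendMsgAction`, `classifySendMsg`) and with the boolean parameter standing in for the target's ISAProfile::SupportsDeallocVgprs flag; the real handler's structure may differ.

```cpp
#include <cstdint>

// Possible outcomes for an s_sendmsg / s_sendmsghalt (illustrative names).
enum class SendMsgAction { Lower, DropToNop, Refuse };

inline SendMsgAction classifySendMsg(uint16_t simm16,
                                     bool targetSupportsDeallocVgprs) {
  switch (simm16) {
  case 1: // INTERRUPT: portable across generations
    return SendMsgAction::Lower;
  case 3: // DEALLOC_VGPRS: pass through only where the encoding exists
    return targetSupportsDeallocVgprs ? SendMsgAction::Lower
                                      : SendMsgAction::DropToNop;
  default: // same numeric ID may mean different things per generation
    return SendMsgAction::Refuse;
  }
}
```

Taking the flag as a parameter keeps the drop-versus-pass-through choice a property of the target rather than of the handler.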
Co-authored-by: Claude Opus 4.7 (1M context) <noreply at anthropic.com>
Commit: e9b3bb1bb1edc59a0f04973aa667cb033547d34e
https://github.com/llvm/llvm-project/commit/e9b3bb1bb1edc59a0f04973aa667cb033547d34e
Author: Tim Gymnich <tim at gymni.ch>
Date: 2026-06-03 (Wed, 03 Jun 2026)
Changed paths:
M amd/comgr/src/hotswap/canonical-op.cpp
M amd/comgr/src/hotswap/canonical-op.h
M amd/comgr/src/hotswap/handle-valu-small-ops.cpp
M amd/comgr/src/hotswap/isa-profile.h
M amd/comgr/src/hotswap/opcode-map.cpp
A amd/comgr/test-lit/hotswap-raise/v_prng_b32.s
Log Message:
-----------
[Comgr][hotswap] Support v_prng_b32 (#79)
Lift v_prng_b32 as the branch-free LFSR
expansion `(in << 1) ^ ((in ashr 31) & 197)`. The hardware op exists
only on subtargets with FeaturePrngInst (gfx950, gfx1250, gfx13+);
emitting the expansion in IR keeps every other target arm working.
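The expansion can be checked on the host with a direct transcription. The function name is hypothetical, and the arithmetic shift is written as an explicit sign mask so the sketch does not depend on how the compiler shifts negative signed values.

```cpp
#include <cstdint>

// One step of the branch-free LFSR: (in << 1) ^ ((in ashr 31) & 197).
inline uint32_t prngStep(uint32_t in) {
  // (in ashr 31) is all-ones when the top bit is set, zero otherwise.
  uint32_t signMask = (in & 0x80000000u) ? 0xFFFFFFFFu : 0u;
  return (in << 1) ^ (signMask & 197u);
}
```

When the top bit is clear the step is a plain left shift; when it is set, the constant 197 is folded into the low bits.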
Co-authored-by: Claude Opus 4.7 (1M context) <noreply at anthropic.com>
Commit: 40a94edecb83e26d7a7d7dd7c29996a742cb0a00
https://github.com/llvm/llvm-project/commit/40a94edecb83e26d7a7d7dd7c29996a742cb0a00
Author: Tim Gymnich <tim at gymni.ch>
Date: 2026-06-05 (Fri, 05 Jun 2026)
Changed paths:
M amd/comgr/src/hotswap/canonical-op.cpp
M amd/comgr/src/hotswap/canonical-op.h
M amd/comgr/src/hotswap/handle-valu-small-ops.cpp
M amd/comgr/src/hotswap/handle-valu.cpp
M amd/comgr/src/hotswap/opcode-map.cpp
A amd/comgr/test-lit/hotswap-raise/v_cvt_i32_f64.s
Log Message:
-----------
[Comgr][hotswap] Support v_cvt_i32_f64 (#63)
Add support for v_cvt_i32_f64, which converts an f64 source to a signed 32-bit integer.
The hardware does a saturating conversion: values outside the i32 range clamp to INT_MIN/INT_MAX, and NaN maps to 0. So this lowers to llvm.fptosi.sat rather than a plain fptosi, whose result is poison when the value does not fit and so would not match the hardware result.
While here, the existing v_cvt_u32_f64 handler had the same problem — it used a plain fptoui. This changes it to llvm.fptoui.sat for the same reason (out-of-range clamps to 0/UINT_MAX, NaN to 0).
The e64 source modifiers (neg/abs on src0) are supported; output modifiers (clamp/omod) are rejected.
Includes a lit test covering the plain form and both source modifiers.
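A host-side sketch of the saturating semantics described above, for both the signed and unsigned conversions. The function names are hypothetical; the behaviour modeled is clamp on overflow, NaN to 0, and truncation toward zero otherwise.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Saturating f64 -> i32: NaN gives 0, out-of-range values clamp.
inline int32_t fptosiSat(double x) {
  if (std::isnan(x))
    return 0;
  if (x <= static_cast<double>(std::numeric_limits<int32_t>::min()))
    return std::numeric_limits<int32_t>::min();
  if (x >= static_cast<double>(std::numeric_limits<int32_t>::max()))
    return std::numeric_limits<int32_t>::max();
  return static_cast<int32_t>(x); // in range here; truncates toward zero
}

// Saturating f64 -> u32: NaN and negatives give 0, large values clamp.
inline uint32_t fptouiSat(double x) {
  if (std::isnan(x) || x <= 0.0)
    return 0;
  if (x >= static_cast<double>(std::numeric_limits<uint32_t>::max()))
    return std::numeric_limits<uint32_t>::max();
  return static_cast<uint32_t>(x); // in range here; truncates toward zero
}
```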
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply at anthropic.com>
Commit: c5190b3b91ad848740f60f3287c0c9b412e74f1b
https://github.com/llvm/llvm-project/commit/c5190b3b91ad848740f60f3287c0c9b412e74f1b
Author: Tim Gymnich <tim at gymni.ch>
Date: 2026-06-05 (Fri, 05 Jun 2026)
Changed paths:
M amd/comgr/src/hotswap/handle-sop2.cpp
M amd/comgr/src/hotswap/handle-valu.cpp
M amd/comgr/test-lit/hotswap-raise/s_lshr_b64_imm.s
Log Message:
-----------
[Comgr][hotswap] Mask 64-bit shift counts to 6 bits (#59)
AMDGPU masks 64-bit shift counts to the low 6 bits, but LLVM treats
shifts where the count is >= bitwidth as poison. The scalar and
vector 64-bit shift handlers zext'd the count without masking, so an
asm-level shift count of 64+ lifted to poison-producing IR instead of
the hardware-equivalent in-range shift.
Mask the zext'd count with a named `ShiftCountMask = (1 << 6) - 1`
before the shl/lshr/ashr in:
* S_LSHL_B64, S_LSHR_B64, S_ASHR_I64 (handle-sop2.cpp)
* V_LSHLREV_B64, V_LSHRREV_B64, V_ASHRREV_I64,
V_LSHL_ADD_U64 (handle-valu.cpp)
The mask constant-folds when src1 is an immediate, so in-range
literals (e.g. the existing s_lshr_b64_imm.s `16` corpus shape)
still lift to `lshr i64 .., 16` with no extra IR.
Tests: s_lshr_b64_imm.s shifts the same i64 by 16/64/65/127 in one
kernel so a single raise pass covers all endpoints; the CHECK lines
sit next to the asm they pin (64 -> 0, 65 -> 1, 127 -> 63).
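The masking can be modeled on the host as below. `ShiftCountMask` follows the definition quoted above; the function names are hypothetical. Unsigned shifts are used so the C++ behaviour is fully defined once the count is masked.

```cpp
#include <cstdint>

// Low 6 bits of the count, matching the 64-bit shift behaviour described.
constexpr uint64_t ShiftCountMask = (1u << 6) - 1; // 63

// Logical shift right with the count masked before shifting.
inline uint64_t lshr64(uint64_t value, uint32_t count) {
  return value >> (static_cast<uint64_t>(count) & ShiftCountMask);
}

// Shift left with the same masking.
inline uint64_t shl64(uint64_t value, uint32_t count) {
  return value << (static_cast<uint64_t>(count) & ShiftCountMask);
}
```

A count of 64 therefore behaves like 0, 65 like 1, and 127 like 63, which are the endpoints the lit test pins.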
Fixes martin-luecke/llvm-project#51
Co-authored-by: Claude Opus 4.8 (1M context) <noreply at anthropic.com>
Commit: 56beab43b6dabc9d21370a92a2fb799f262e5c80
https://github.com/llvm/llvm-project/commit/56beab43b6dabc9d21370a92a2fb799f262e5c80
Author: Tim Gymnich <tim at gymni.ch>
Date: 2026-06-05 (Fri, 05 Jun 2026)
Changed paths:
M amd/comgr/src/hotswap/canonical-op.cpp
M amd/comgr/src/hotswap/canonical-op.h
M amd/comgr/src/hotswap/handle-sopp.cpp
M amd/comgr/src/hotswap/opcode-map.cpp
A amd/comgr/test-lit/hotswap-raise/s_wait_storecnt.s
Log Message:
-----------
[Comgr][hotswap] Support s_wait_storecnt (#61)
gfx1250 splits the legacy `s_waitcnt` into per-resource counters; on
gfx942 only vmcnt/lgkmcnt/expcnt exist, with store ordering folded
into vmcnt. Canonicalise gfx1250's `s_wait_storecnt` and lower it to
the conservative wait-all form (`s_waitcnt 0`) on the cross-target
arm, matching the existing treatment of `s_wait_loadcnt` /
`s_wait_dscnt` / `s_wait_kmcnt` in `handle-sopp.cpp`. Without this,
the raiser hits the SOPP no-op catch-all and drops the source wait,
letting gfx942 schedule subsequent VMEM ops ahead of the still-in-
flight store.
Add `S_WAIT_STORECNT` to the `CanonicalOp` enum, its name-switch in
`canonical-op.cpp`, its `S_WAIT_STORECNT -> S_WAIT_STORECNT` entry in
`opcode-map.cpp`, and extend the wait-counter arm in `handle-sopp.cpp`
to recognise it. New lit fixture `s_wait_storecnt.s` covers N=0, N=1,
and N=3 between back-to-back `global_store_b32` sites.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply at anthropic.com>
Commit: e0a1e82f106d2bec500474bf653871249f8c8a25
https://github.com/llvm/llvm-project/commit/e0a1e82f106d2bec500474bf653871249f8c8a25
Author: Tim Gymnich <tim at gymni.ch>
Date: 2026-06-08 (Mon, 08 Jun 2026)
Changed paths:
M amd/comgr/CMakeLists.txt
M amd/comgr/src/comgr-device-libs.cpp
M amd/comgr/src/comgr-device-libs.h
M amd/comgr/src/hotswap/CMakeLists.txt
M amd/comgr/src/hotswap/canonical-op.cpp
M amd/comgr/src/hotswap/canonical-op.h
A amd/comgr/src/hotswap/handle-valu-f16-utils.cpp
A amd/comgr/src/hotswap/handle-valu-f16-utils.h
M amd/comgr/src/hotswap/handle-valu-small-ops.cpp
M amd/comgr/src/hotswap/handle-valu.cpp
M amd/comgr/src/hotswap/isa-profile.h
A amd/comgr/src/hotswap/ocml-runtime.cpp
A amd/comgr/src/hotswap/ocml-runtime.h
M amd/comgr/src/hotswap/opcode-map.cpp
M amd/comgr/src/hotswap/raise-failure.cpp
M amd/comgr/src/hotswap/raise-failure.h
M amd/comgr/src/hotswap/raiser.cpp
M amd/comgr/src/hotswap/translation-cache.cpp
M amd/comgr/test-lit/CMakeLists.txt
A amd/comgr/test-lit/hotswap-raise/v_tanh_f16.s
A amd/comgr/test-lit/hotswap-raise/v_tanh_f32.s
M amd/comgr/test-unit/CMakeLists.txt
M amd/comgr/test-unit/OpcodeMapTest.cpp
Log Message:
-----------
[Comgr][hotswap] Support OCML-backed v_tanh_f32 transpilation (#96)
Add HotSwap support for v_tanh_f32 and v_tanh_f16 when translating gfx1250
code objects to targets without native tanh, lowering through __ocml_tanh_f32
/ __ocml_tanh_f16. Targets that can select llvm.amdgcn.tanh.* keep the native
path; other targets link COMGR's embedded OCML device libraries into the raised
module, inline the helper call chain, and reject unresolved device-library
references. OCML is linked before cross-lane rewrites so DPP use-chain proof
sees inlined arithmetic. Cache identity now includes the embedded
device-library identifier. Shared true16 F16 source-modifier and
destination-half merge helpers are factored into handle-valu-f16-utils.
Squashed from martin-luecke/llvm-project#24.
Co-authored-by: Claude Opus 4.8 (1M context) <noreply at anthropic.com>
Commit: e3e309b6384eae595e9fc15bbc6159817cd4c553
https://github.com/llvm/llvm-project/commit/e3e309b6384eae595e9fc15bbc6159817cd4c553
Author: Tim Gymnich <tim at gymni.ch>
Date: 2026-06-08 (Mon, 08 Jun 2026)
Changed paths:
M amd/comgr/src/hotswap/canonical-op.cpp
M amd/comgr/src/hotswap/canonical-op.h
M amd/comgr/src/hotswap/decode.cpp
M amd/comgr/src/hotswap/handle-sop1.cpp
M amd/comgr/src/hotswap/opcode-map.cpp
A amd/comgr/test-lit/hotswap-raise/s_add_pc_i64.s
Log Message:
-----------
[Comgr][hotswap] Support s_add_pc_i64 (#76)
* [Comgr][hotswap] Support s_add_pc_i64
Add raiser support for the gfx1250 SOP1 PC-relative direct branch
s_add_pc_i64. The instruction takes a signed i64 byte offset relative
to PC_after_instruction and is emitted by the AMDGPU backend as a
far-branch lowering when s_branch's 16-bit offset cannot reach.
Lowers the immediate-literal form to `br label %bb_<target>` where
target = Di.Offset + Di.Size + imm. The SGPR-pair form is refused as
unsupportedShape (codegen never emits it).
Block-leader discovery in collectBranchTargets is special-cased away
from the SOPP 16-bit *4 short-branch decode that would otherwise
mis-truncate the i64 operand.
* [Comgr][hotswap] Add s_add_pc_i64 long-encoding test
Cover both s_add_pc_i64 forms in one lit fixture: short (4-byte inline
constant) and long (8-byte 32-bit literal). The long form pins that the
handler reads Di.Size from the actual instruction encoding rather than
assuming 4 -- a regression that hard-coded the size would miscompute the
branch target and miss the leader.
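The target computation, and why the encoded size matters, can be sketched as follows. `DecodedInst` and `addPcTarget` are hypothetical stand-ins for the raiser's decoded-instruction record ("Di" above) and its handler logic.

```cpp
#include <cstdint>

// Hypothetical stand-in for the decoded-instruction record.
struct DecodedInst {
  uint64_t Offset; // byte offset of the instruction within the text section
  uint64_t Size;   // encoded size in bytes, taken from the actual encoding
};

// target = Di.Offset + Di.Size + imm, with imm a signed byte offset relative
// to the PC after the instruction. Unsigned wrap-around makes a negative
// imm subtract as expected.
inline uint64_t addPcTarget(const DecodedInst &Di, int64_t imm) {
  return Di.Offset + Di.Size + static_cast<uint64_t>(imm);
}
```

The same immediate at the same offset yields targets 4 bytes apart for the 4-byte and 8-byte encodings, which is the discrepancy a hard-coded size would introduce.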
Commit: e32baf3b0cbdbdd9a47e22982a3aecb1e672fbb6
https://github.com/llvm/llvm-project/commit/e32baf3b0cbdbdd9a47e22982a3aecb1e672fbb6
Author: Tim Gymnich <tim at gymni.ch>
Date: 2026-06-08 (Mon, 08 Jun 2026)
Changed paths:
A amd/comgr/test-lit/hotswap-raise/vopd_same_opcode_dual_issue.s
Log Message:
-----------
[Comgr][hotswap] Add VOPD same-opcode dual-issue lift test (#74)
Adds one lift-pin fixture covering VOPD packets that pair a component
with itself: v_dual_{add,mul,sub,fma,fmaak}_f32, v_dual_sub_nc_u32, and
v_dual_lshlrev_b32. Each opcode gets its own packet in a single kernel,
with both VOPD-half destinations stored through separate b32 stores so
mem2reg keeps both writes live and a regression in either half's
CanonicalOp branch surfaces as a missing CHECK.
Consolidated into a single file (following vopd_extra_subops.s) instead
of one file per opcode to cut test boilerplate.
Co-authored-by: Claude Opus 4.8 (1M context) <noreply at anthropic.com>
Commit: 4103a24c5e6f12802aac01f081179940ba829de8
https://github.com/llvm/llvm-project/commit/4103a24c5e6f12802aac01f081179940ba829de8
Author: Alex Zinenko <git at ozinenko.com>
Date: 2026-06-10 (Wed, 10 Jun 2026)
Changed paths:
M amd/comgr/src/hotswap/amdgpu-mode-hwreg.h
M amd/comgr/src/hotswap/handle-sopk.cpp
A amd/comgr/test-lit/hotswap-raise/setreg_mode_vgpr_msb.s
Log Message:
-----------
[Comgr][hotswap] Mirror VGPR_MSB capture on MODE-targeted s_setreg_imm32_b32
On gfx1250, each wave's MODE register packs the four 2-bit high-bit
(most significant bit) selectors for VGPR operands -- destination,
source 0, source 1, and source 2 -- into bits 12 through 19. Any
s_setreg_imm32_b32 that targets the MODE register captures those bits
from the immediate's bits 12 through 19 and stores them unconditionally.
The instruction's field selector (offset and size) only controls the
named-field write; it does not gate this high-bit capture.
LLVM's AMDGPULowerVGPREncoding pass (the updateSetregModeImm function in
llvm/lib/Target/AMDGPU/AMDGPULowerVGPREncoding.cpp) takes advantage of
this. It folds the current high-bit state into the MODE setreg that the
compiler already emits in the kernel prologue to program REPLAY_MODE,
producing immediate values such as 0x1001 (bit 0 is the REPLAY_MODE
payload; bits 12 through 19 are the high-bit selectors). The pass's own
comment, "Note that Offset is ignored for mode bits here," documents
that this capture happens regardless of the offset field.
The hotswap lifter previously updated its tracked high-bit state only
on a dedicated s_set_vgpr_msb instruction (handle-sopp.cpp), so it
missed this fold path. For any source kernel that reaches high VGPRs
through the prologue MODE setreg, the lifter resolved later VGPR
encodings against a high-bit value of zero and silently miscompiled them.
A concrete failure was observed on the gemma3-4b flash-attention
prefill kernel (_fwd_kernel, with .amdhsa_next_free_vgpr=513). The
prologue setreg
    s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE, 25, 1), 0x1001
sets the destination high-bit selector to 0b01, and the next instruction
    v_writelane_b32 v0, s0, 0
therefore encodes physical register v256 (a VGPR used for SGPR spills).
Without this fix the lifter rewrote the writelane against physical v0,
which on this kernel holds the workitem id (via
.amdhsa_system_vgpr_workitem_id=0). The corrupted lane id then
propagated into the LDS Q-tile base address and the K-by-Q matrix
multiply produced NaN.
The fix mirrors the hardware behavior in the s_setreg_imm32_b32 preserve
path in handle-sopk.cpp: when the target is the MODE register, it
extracts bits 12 through 19 as a MODE-format high-bit byte and rotates
it right by two to convert it into the s_set_vgpr_msb layout the lifter
already tracks. This is the inverse of LLVM's convertModeToSetregFormat,
which rotates left by two.
Verified with a new lit test
(test-lit/hotswap-raise/setreg_mode_vgpr_msb.s) that walks the canonical
prologue: a MODE setreg with bits 12 through 19 set to 01 makes
v_writelane v0 lift as register v256; a later read of v0 with the
source-0 high-bit set to 1 then sources v256; and a store propagates
that value out.
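The bit manipulation described above can be sketched on the host. All names here are hypothetical. The sketch assumes the four selectors are packed low to high in the order listed (destination, source 0, source 1, source 2), so that after the rotation the destination selector sits in the top two bits, and that a selector value n maps an encoded register r to physical register 256*n + r, as in the v0 to v256 example.

```cpp
#include <cstdint>

// Extract immediate bits 12..19 (MODE-format selector byte) and rotate the
// byte right by two to obtain the tracked s_set_vgpr_msb-style layout.
inline uint8_t modeImmToVgprMsb(uint32_t imm) {
  uint8_t modeByte = static_cast<uint8_t>((imm >> 12) & 0xFFu);
  return static_cast<uint8_t>((modeByte >> 2) | (modeByte << 6));
}

// Destination selector in the rotated layout (assumed to be bits 7:6).
inline unsigned dstSelector(uint8_t vgprMsb) {
  return (vgprMsb >> 6) & 0x3u;
}

// Physical VGPR index for an encoded register under a given selector.
inline unsigned physicalVgpr(unsigned encodedReg, unsigned selector) {
  return selector * 256u + encodedReg;
}
```

For the prologue immediate 0x1001, the selector byte is 0x01, the rotated value is 0x40, and an encoded v0 destination resolves to v256.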
Co-Authored-By: Claude Opus 4.7 <noreply at anthropic.com>
Commit: 4859f164c3d1953744fd90d0c23f6cabeb5d7976
https://github.com/llvm/llvm-project/commit/4859f164c3d1953744fd90d0c23f6cabeb5d7976
Author: Alex Zinenko <git at ozinenko.com>
Date: 2026-06-11 (Thu, 11 Jun 2026)
Changed paths:
M amd/comgr/src/hotswap/amdgpu-mode-hwreg.h
M amd/comgr/src/hotswap/handle-sopk.cpp
M amd/comgr/src/hotswap/isa-profile.h
A amd/comgr/test-lit/hotswap-raise/setreg_b32_mode_vgpr_msb_refuse.s
M amd/comgr/test-lit/hotswap-raise/setreg_mode_vgpr_msb.s
Log Message:
-----------
[Comgr][hotswap] Address review
- Gate MODE VGPR_MSB capture on Has1024AddressableVGPRs source ISA flag
(bits [12:19] are ordinary FP-mode fields on older targets).
- Add Has1024AddressableVGPRs to ISAProfile (Feature1024AddressableVGPRs).
- Use llvm::rotr<uint8_t> instead of manual byte rotation.
- Refuse S_SETREG_B32 targeting MODE whose field overlaps bits [12:19]
on gfx1250: the SGPR value is dynamic, so VgprMsBs cannot be updated.
- Scope VGPR_MSB doc comment to gfx1250+ targets.
- Strengthen setreg_mode_vgpr_msb lit test: capture Vgpr256 phi name
and assert it flows through Vgpr1 into the final store.
- Add setreg_b32_mode_vgpr_msb_refuse lit test: verify the raiser
refuses S_SETREG_B32 MODE writes overlapping VGPR_MSB bits [12:19].
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply at anthropic.com>
Compare: https://github.com/llvm/llvm-project/compare/e20dd2510875%5E...4859f164c3d1