[llvm] [AMDGPU] Add backward compatibility layer for kernarg preloading (PR #119167)
Janek van Oirschot via llvm-commits
llvm-commits at lists.llvm.org
Wed Dec 11 12:32:56 PST 2024
================
@@ -0,0 +1,229 @@
+//===- AMDGPUPreloadKernargHeader.cpp - Preload Kernarg Header ------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+/// \file This pass handles the creation of the backwards compatability layer
+/// for kernarg prealoding. Code may be compiled with the feature enabled, while
+/// the kernel is executed on hardware without firmware support.
+///
+/// To avoid the need for recompilation, we insert a block at the beginning of
+/// the kernel that is responsible for loading the kernel arguments into SGPRs
+/// using s_load instructions which setup the registers exactly as they would be
+/// by firmware if the code were executed on a system that supported kernarg
+/// preladoing.
+///
+/// This essentially allows for two entry points for the kernel. Firmware that
+/// supports the feature will automatically jump past the first 256 bytes of the
+/// program, skipping the backwards compatibility layer and directly beginning
+/// execution on the fast code path.
+///
+/// This pass should be run as late as possible, to avoid any optimization that
+/// may assume that padding is dead code or that the prologue added here is a
+/// true predecessor of the kernel entry block.
+//===----------------------------------------------------------------------===//
+
+#include "AMDGPUPreloadKernargHeader.h"
+#include "AMDGPU.h"
+#include "GCNSubtarget.h"
+#include "SIMachineFunctionInfo.h"
+#include "llvm/CodeGen/MachineFunctionPass.h"
+#include "llvm/TargetParser/TargetParser.h"
+
+using namespace llvm;
+
+#define DEBUG_TYPE "amdgpu-preload-kernarg-header"
+
+namespace {
+
+struct LoadConfig {
+ unsigned Size;
+ const TargetRegisterClass *RegClass;
+ unsigned Opcode;
+ Register LoadReg;
+
+ // Constructor for the static config array
----------------
JanekvO wrote:
I may be missing an obvious reason but, can we use the default constructor behavior instead of these explicitly constructors?
(May need to change `Register LoadReg;` for that as well, though: `Register LoadReg = AMDGPU::NoRegister;`)
https://github.com/llvm/llvm-project/pull/119167
More information about the llvm-commits
mailing list