[llvm] [AMDGPU] Prefer `s_memtime` for `readcyclecounter` on GFX10 (PR #80211)

Joseph Huber via llvm-commits llvm-commits at lists.llvm.org
Wed Jan 31 14:40:27 PST 2024


https://github.com/jhuber6 created https://github.com/llvm/llvm-project/pull/80211

Summary:
The old `s_memtime` instruction was supported until the GFX10
architecture. Although this instruction has a higher latency than the
new shader counter, it's much more usable as a processor clock as it is
a full 64-bit counter. The new shader counter is only a 20-bit counter,
which makes it difficult to use as a standard cycle counter as it will
overflow in a few milliseconds. This patch suggests preferring
`s_memtime` for this instrinsic if it is still available.


>From a21409ccebada1d8ae770fdab7e699da060f09a5 Mon Sep 17 00:00:00 2001
From: Joseph Huber <huberjn at outlook.com>
Date: Wed, 31 Jan 2024 16:36:34 -0600
Subject: [PATCH] [AMDGPU] Prefer `s_memtime` for `readcyclecounter` on GFX10

Summary:
The old `s_memtime` instruction was supported until the GFX10
architecture. Although this instruction has a higher latency than the
new shader counter, it's much more usable as a processor clock as it is
a full 64-bit counter. The new shader counter is only a 20-bit counter,
which makes it difficult to use as a standard cycle counter as it will
overflow in a few milliseconds. This patch suggests preferring
`s_memtime` for this instrinsic if it is still available.
---
 llvm/lib/Target/AMDGPU/SMInstructions.td     | 2 --
 llvm/test/CodeGen/AMDGPU/readcyclecounter.ll | 4 ++--
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/SMInstructions.td b/llvm/lib/Target/AMDGPU/SMInstructions.td
index f082de35b6ae9..f3096962e2f3e 100644
--- a/llvm/lib/Target/AMDGPU/SMInstructions.td
+++ b/llvm/lib/Target/AMDGPU/SMInstructions.td
@@ -1065,8 +1065,6 @@ def : GCNPat <
   (REG_SEQUENCE SReg_64,
     (S_GETREG_B32 getHwRegImm<HWREG.SHADER_CYCLES, 0, -12>.ret), sub0,
     (S_MOV_B32 (i32 0)), sub1)> {
-  // Prefer this to s_memtime because it has lower and more predictable latency.
-  let AddedComplexity = 1;
 }
 } // let OtherPredicates = [HasShaderCyclesRegister]
 
diff --git a/llvm/test/CodeGen/AMDGPU/readcyclecounter.ll b/llvm/test/CodeGen/AMDGPU/readcyclecounter.ll
index 046a6d0a8cb7e..fd422b344d834 100644
--- a/llvm/test/CodeGen/AMDGPU/readcyclecounter.ll
+++ b/llvm/test/CodeGen/AMDGPU/readcyclecounter.ll
@@ -4,8 +4,8 @@
 ; RUN: llc -global-isel=1 -mtriple=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefix=MEMTIME -check-prefix=SIVI -check-prefix=GCN %s
 ; RUN: llc -global-isel=0 -mtriple=amdgcn -mcpu=gfx1010 -verify-machineinstrs < %s | FileCheck -check-prefix=MEMTIME -check-prefix=GCN %s
 ; RUN: llc -global-isel=1 -mtriple=amdgcn -mcpu=gfx1010 -verify-machineinstrs < %s | FileCheck -check-prefix=MEMTIME -check-prefix=GCN %s
-; RUN: llc -global-isel=0 -mtriple=amdgcn -mcpu=gfx1030 -verify-machineinstrs < %s | FileCheck -check-prefixes=GETREG,GETREG-SDAG -check-prefix=GCN %s
-; RUN: llc -global-isel=1 -mtriple=amdgcn -mcpu=gfx1030 -verify-machineinstrs < %s | FileCheck -check-prefixes=GETREG,GETREG-GISEL -check-prefix=GCN %s
+; RUN: llc -global-isel=0 -mtriple=amdgcn -mcpu=gfx1030 -verify-machineinstrs < %s | FileCheck -check-prefixes=MEMTIME -check-prefix=GCN %s
+; RUN: llc -global-isel=1 -mtriple=amdgcn -mcpu=gfx1030 -verify-machineinstrs < %s | FileCheck -check-prefixes=MEMTIME -check-prefix=GCN %s
 ; RUN: llc -global-isel=0 -mtriple=amdgcn -mcpu=gfx1100 -verify-machineinstrs -amdgpu-enable-vopd=0 < %s | FileCheck -check-prefixes=GETREG,GETREG-SDAG -check-prefix=GCN %s
 ; RUN: llc -global-isel=1 -mtriple=amdgcn -mcpu=gfx1100 -verify-machineinstrs -amdgpu-enable-vopd=0 < %s | FileCheck -check-prefixes=GETREG,GETREG-GISEL -check-prefix=GCN %s
 ; RUN: llc -global-isel=0 -mtriple=amdgcn -mcpu=gfx1200 -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,GFX12 %s



More information about the llvm-commits mailing list