[PATCH] D76861: [AMDGPU] Fix getEUsPerCU for gfx10 in CU mode

Matt Arsenault via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Sep 15 13:11:18 PDT 2022


arsenm added inline comments.


================
Comment at: llvm/test/CodeGen/AMDGPU/attr-amdgpu-flat-work-group-size-vgpr-limit.ll:3
+; -regalloc=fast just makes the test run faster
+; RUN: llc -march=amdgcn -mcpu=gfx900 -amdgpu-function-calls=false -enable-misched=false -regalloc=fast < %s | FileCheck %s --check-prefixes=GCN,GFX9
+; RUN: llc -march=amdgcn -mcpu=gfx1010 -amdgpu-function-calls=false -enable-misched=false -regalloc=fast < %s | FileCheck %s --check-prefixes=GCN,GFX10WGP-WAVE32
----------------
arsenm wrote:
> arsenm wrote:
> > foad wrote:
> > > LuoYuanke wrote:
> > > > Specifying `-regalloc=fast` is not reliable. With fast register allocation, `LIS = getAnalysisIfAvailable<LiveIntervals>();` returns nullptr in the "si-lower-sgpr-spills" pass, so the pass does not create slot indexes for newly inserted instructions. Machine instruction verification then fails when it checks the slot indexes. It can be reproduced with the test case below. Is it possible to use greedy-ra and reduce the compile time for this test case?
> > > > 
> > > > 
> > > > ```
> > > > define internal void @use256vgprs() {
> > > >   %v0 = call i32 asm sideeffect "; def $0", "=v"()
> > > >   %v1 = call i32 asm sideeffect "; def $0", "=v"()
> > > >   call void asm sideeffect "; use $0", "v"(i32 %v0)
> > > >   call void asm sideeffect "; use $0", "v"(i32 %v1)
> > > >   ret void
> > > > }
> > > > 
> > > > define amdgpu_kernel void @f256() #256 {
> > > >   call void @use256vgprs()
> > > >   ret void
> > > > }
> > > > attributes #256 = { nounwind "amdgpu-flat-work-group-size"="256,256" }
> > > > 
> > > > define amdgpu_kernel void @f512() #512 {
> > > >   call void @foo()
> > > >   call void @use256vgprs()
> > > >   ret void
> > > > }
> > > > attributes #512 = { nounwind "amdgpu-flat-work-group-size"="512,512" }
> > > > 
> > > > define amdgpu_kernel void @f1024() #1024 {
> > > >   call void @foo()
> > > >   call void @use256vgprs()
> > > >   ret void
> > > > }
> > > > 
> > > > attributes #1024 = { nounwind "amdgpu-flat-work-group-size"="1024,1024" }
> > > > 
> > > > declare void @foo()
> > > > 
> > > > ```
> > > That sounds like a bug in SILowerSGPRSpills. It should not claim to preserve SlotIndexes if it does not preserve them.
> > The pass is only updating SlotIndexes through LiveIntervals, assuming they come as a pair. With -verify-machineinstrs, somehow we end up in a situation with SlotIndexes and without LiveIntervals.
> Apparently StackColoring runs at -O0 and only uses SlotIndexes.
D133970
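
To illustrate the failure mode discussed above, here is a minimal self-contained sketch. It does not use the LLVM classes: `MockSlotIndexes`, `MockLiveIntervals`, and both `recordNewInstr*` helpers are hypothetical stand-ins, and instructions are modeled as plain integer IDs. It only shows why updating slot indexes exclusively through LiveIntervals leaves new instructions unindexed when SlotIndexes is available on its own.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for SlotIndexes: remembers which instruction IDs
// have been given a slot index.
struct MockSlotIndexes {
  std::vector<int> Indexed;

  void insertMachineInstrInMaps(int InstrId) { Indexed.push_back(InstrId); }

  bool hasIndex(int InstrId) const {
    return std::find(Indexed.begin(), Indexed.end(), InstrId) != Indexed.end();
  }
};

// Hypothetical stand-in for LiveIntervals: forwards index updates to the
// SlotIndexes it wraps.
struct MockLiveIntervals {
  MockSlotIndexes *Indexes;

  void insertMachineInstrInMaps(int InstrId) {
    assert(Indexes && "mock LiveIntervals always wraps a SlotIndexes");
    Indexes->insertMachineInstrInMaps(InstrId);
  }
};

// Coupled update: assumes SlotIndexes never exists without LiveIntervals,
// so a null LIS means nothing is recorded at all.
void recordNewInstrCoupled(int InstrId, MockLiveIntervals *LIS,
                           MockSlotIndexes * /*Indexes*/) {
  if (LIS)
    LIS->insertMachineInstrInMaps(InstrId);
}

// Independent update: falls back to SlotIndexes when LiveIntervals is absent.
void recordNewInstrIndependent(int InstrId, MockLiveIntervals *LIS,
                               MockSlotIndexes *Indexes) {
  if (LIS)
    LIS->insertMachineInstrInMaps(InstrId);
  else if (Indexes)
    Indexes->insertMachineInstrInMaps(InstrId);
}
```

With a null `LIS` and a live `MockSlotIndexes`, the coupled helper leaves the new instruction without an index, which is the state a verifier would reject; the independent helper records it. Whether D133970 takes this shape is not stated in this thread.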


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D76861/new/

https://reviews.llvm.org/D76861


