[PATCH] D31161: [AMDGPU] New Waitcnt Insertion Pass

Thu Mar 23 17:28:06 PDT 2017

kanarayan updated this revision to Diff 92883.
kanarayan added a comment.

This addresses two issues:

1. SQ_MAX_PGM_VGPRS and other constants are used to map the llvm register map to a data structure internal to this algorithm. For the register ampping used by the algorithm see the comments before the enum. They are maximum values across all targets. It is ideal to use dynamically sized arrays that fit the particular architecture target. As an interim step, I have changed these constants to enum values, asserted that no target has a larger register file in the main entry to this pass. (In response to comments from Tony and Konstantin) I have also updated the getRegInterval call. Please notice that the previous version was identical to the pass used by the old pass except I do the necessary adjustments for the mapping used by this algorithm. In a next step, I will remove the assert and the associated code.
2. Barriers no longer force a zero waitcnt. For GFX9 and above, barrier needs no additional waitcnt. For lesser targets, waitcnts are added only if needed. The following tests already test barrier: LLVM :: CodeGen/AMDGPU/addrspacecast.ll LLVM :: CodeGen/AMDGPU/array-ptr-calc-i32.ll LLVM :: CodeGen/AMDGPU/ds-negative-offset-addressing-mode-loop.ll LLVM :: CodeGen/AMDGPU/ds_read2.ll LLVM :: CodeGen/AMDGPU/indirect-private-64.ll LLVM :: CodeGen/AMDGPU/llvm.amdgcn.s.barrier.ll LLVM :: CodeGen/AMDGPU/local-memory.amdgcn.ll LLVM :: CodeGen/AMDGPU/merge-stores.ll LLVM :: CodeGen/AMDGPU/schedule-vs-if-nested-loop-failure.ll LLVM :: CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll LLVM :: CodeGen/AMDGPU/store-barrier.ll LLVM :: CodeGen/AMDGPU/wait.ll

There is also a CodeGen/AMDGPU/llvm.amdgcn.s.barrier.ll

This patch does not yet address the XNACK changes in https://reviews.llvm.org/D30302

https://reviews.llvm.org/D31161

Files:
  lib/Target/AMDGPU/AMDGPU.h
  lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
  lib/Target/AMDGPU/CMakeLists.txt
  lib/Target/AMDGPU/SIInsertWaitcnts.cpp
  test/CodeGen/AMDGPU/basic-branch.ll
  test/CodeGen/AMDGPU/branch-relaxation.ll
  test/CodeGen/AMDGPU/control-flow-fastregalloc.ll
  test/CodeGen/AMDGPU/indirect-addressing-si.ll
  test/CodeGen/AMDGPU/infinite-loop.ll
  test/CodeGen/AMDGPU/llvm.amdgcn.buffer.store.format.ll
  test/CodeGen/AMDGPU/llvm.amdgcn.buffer.store.ll
  test/CodeGen/AMDGPU/llvm.amdgcn.image.ll
  test/CodeGen/AMDGPU/llvm.amdgcn.s.dcache.inv.ll
  test/CodeGen/AMDGPU/llvm.amdgcn.s.dcache.inv.vol.ll
  test/CodeGen/AMDGPU/llvm.amdgcn.s.dcache.wb.ll
  test/CodeGen/AMDGPU/llvm.amdgcn.s.dcache.wb.vol.ll
  test/CodeGen/AMDGPU/llvm.amdgcn.s.waitcnt.ll
  test/CodeGen/AMDGPU/si-lower-control-flow-unreachable-block.ll
  test/CodeGen/AMDGPU/smrd-vccz-bug.ll
  test/CodeGen/AMDGPU/spill-m0.ll
  test/CodeGen/AMDGPU/valu-i1.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D31161.92883.patch
Type: text/x-patch
Size: 87149 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20170324/5424bc6f/attachment.bin>