[PATCH] D54226: AMDGPU/InsertWaitcnts: Untangle some semi-global state

Wed Nov 7 14:15:32 PST 2018

nhaehnle created this revision.
nhaehnle added reviewers: msearles, rampitec, scott.linder, kanarayan.
Herald added subscribers: t-tye, tpr, dstuttard, yaxunl, wdng, jvesely, kzhuravl, arsenm.

Reduce the statefulness of the algorithm in two ways:

1. More clearly split generateWaitcntInstBefore into two phases: a first phase that determines the required wait, if any, without changing the ScoreBrackets, and a second phase that actually inserts the wait and updates the brackets.

2. Communicate pre-existing s_waitcnt instructions using an argument to generateWaitcntInstBefore instead of through the ScoreBrackets.
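
The two points above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the code in the patch: ToyBrackets, determineWait, insertWait and NoWait are hypothetical names, the model tracks a single counter, and taking the stricter of the required and pre-existing waits is an assumed merge policy.

```cpp
#include <algorithm>

// Hypothetical sentinel meaning "no wait required".
constexpr unsigned NoWait = ~0u;

// Toy stand-in for ScoreBrackets: it only tracks how many operations
// are still outstanding on one counter.
struct ToyBrackets {
  unsigned Outstanding = 0;

  void issue() { ++Outstanding; }

  // After waiting for "at most Count outstanding", no more than Count remain.
  void applyWait(unsigned Count) { Outstanding = std::min(Outstanding, Count); }
};

// Phase 1: determine the required wait. The brackets are taken by const
// reference, so this phase cannot modify them. A pre-existing wait is
// passed in as an argument (NoWait if there is none) instead of being
// stored in the brackets. Assumed policy: the stricter count wins.
unsigned determineWait(const ToyBrackets &B, unsigned MaxAllowed,
                       unsigned OldWait) {
  unsigned Needed = B.Outstanding > MaxAllowed ? MaxAllowed : NoWait;
  return std::min(Needed, OldWait);
}

// Phase 2: "insert" the wait and update the brackets. Returns true if a
// wait was actually needed.
bool insertWait(ToyBrackets &B, unsigned Wait) {
  if (Wait == NoWait)
    return false;
  B.applyWait(Wait);
  return true;
}
```

Keeping the first phase free of side effects means it can be called, inspected, or tested without disturbing the state that later instructions depend on.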

To simplify these changes, a Waitcnt structure is introduced which carries
the counts of an s_waitcnt instruction in decoded form.
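
A minimal sketch of what such a decoded structure could look like follows. The field names, hasWait, combined and decodeWaitcnt are illustrative assumptions and not necessarily what the patch defines. The bit layout used for decoding (vmcnt in bits 3:0, expcnt in bits 6:4, lgkmcnt in bits 11:8) is assumed to be the pre-gfx9 encoding; newer targets use a different layout.

```cpp
#include <algorithm>
#include <cstdint>

// Illustrative only: a decoded view of an s_waitcnt immediate.
// ~0u on a counter means "no wait required on this counter".
struct Waitcnt {
  unsigned VmCnt = ~0u;
  unsigned ExpCnt = ~0u;
  unsigned LgkmCnt = ~0u;

  // True if at least one counter carries a real wait.
  bool hasWait() const {
    return VmCnt != ~0u || ExpCnt != ~0u || LgkmCnt != ~0u;
  }

  // Merge two waits: per counter, the stricter (smaller) count wins.
  Waitcnt combined(const Waitcnt &Other) const {
    Waitcnt W;
    W.VmCnt = std::min(VmCnt, Other.VmCnt);
    W.ExpCnt = std::min(ExpCnt, Other.ExpCnt);
    W.LgkmCnt = std::min(LgkmCnt, Other.LgkmCnt);
    return W;
  }
};

// ASSUMPTION: pre-gfx9 field layout of the 16-bit immediate.
// A field at its maximum value effectively means "do not wait".
Waitcnt decodeWaitcnt(std::uint16_t Imm) {
  Waitcnt W;
  W.VmCnt = Imm & 0xF;          // bits 3:0
  W.ExpCnt = (Imm >> 4) & 0x7;  // bits 6:4
  W.LgkmCnt = (Imm >> 8) & 0xF; // bits 11:8
  return W;
}
```

Under this assumed layout, the immediate 0x007f decodes to lgkmcnt 0 with vmcnt and expcnt at their maximum values, i.e. a wait on lgkmcnt only.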

There are some functional changes:

1. The FIXME for the VCCZ bug workaround was implemented: we only wait for SMEM instructions as required instead of waiting on all counters.

2. In some cases we previously merged waitcnt instructions together non-locally due to the somewhat odd OldWaitcnt tracking, e.g. we would produce code like this:

    ds_read_b32 v0, ...
    ds_read_b32 v1, ...
    s_waitcnt lgkmcnt(0)    <-- this is a merged wait for both uses
    use(v0)
    more code
    use(v1)

  In these cases we will now always first emit a wait for lgkmcnt(1), and then later for lgkmcnt(0). This should basically always be a win, although theoretically there could be cases where it's very slightly worse due to the increased code size. The worst code size regressions in my shader-db are:

  WORST REGRESSIONS - Code Size
  Before After     Delta Percentage
  1724  1736        12    0.70 %   shaders/private/f1-2015/1334.shader_test [0]
  2276  2284         8    0.35 %   shaders/private/f1-2015/1306.shader_test [0]
  4632  4640         8    0.17 %   shaders/private/ue4_elemental/62.shader_test [0]
  2376  2384         8    0.34 %   shaders/private/f1-2015/1308.shader_test [0]
  3284  3292         8    0.24 %   shaders/private/talos_principle/1955.shader_test [0]

  ... so I'm not particularly worried about the rather theoretical downside.
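
The lgkmcnt(1) followed by lgkmcnt(0) sequence in the ds_read example above follows from simple counting. The sketch below assumes that the operations in question complete in issue order, which is the premise of the example for back-to-back DS reads; lgkmcntForUse is a hypothetical helper, not a function from the pass.

```cpp
#include <cassert>

// ASSUMPTION: operations on this counter complete in issue order.
// ProducerIdx is the 0-based issue index of the operation whose result is
// about to be used; NumIssued is the number of operations issued so far,
// none of which has been waited on. The result can be used once at most
// (NumIssued - 1 - ProducerIdx) later operations are still outstanding.
unsigned lgkmcntForUse(unsigned ProducerIdx, unsigned NumIssued) {
  assert(ProducerIdx < NumIssued && "producer must already be issued");
  return NumIssued - 1 - ProducerIdx;
}
```

With two reads issued, lgkmcntForUse(0, 2) gives 1 for use(v0) and lgkmcntForUse(1, 2) gives 0 for use(v1), so the first use no longer has to wait for the second read to complete.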

Repository:
  rL LLVM

https://reviews.llvm.org/D54226

Files:
  lib/Target/AMDGPU/SIInsertWaitcnts.cpp
  lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
  lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h
  test/CodeGen/AMDGPU/smrd-vccz-bug.ll
  test/CodeGen/AMDGPU/vccz-corrupt-bug-workaround.mir
