[PATCH] D54226: AMDGPU/InsertWaitcnts: Untangle some semi-global state
Nicolai Hähnle via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Nov 7 14:15:32 PST 2018
nhaehnle created this revision.
nhaehnle added reviewers: msearles, rampitec, scott.linder, kanarayan.
Herald added subscribers: t-tye, tpr, dstuttard, yaxunl, wdng, jvesely, kzhuravl, arsenm.
Reduce the statefulness of the algorithm in two ways:
1. More clearly split generateWaitcntInstBefore into two phases: the first determines the required wait, if any, without changing the ScoreBrackets; the second actually inserts the wait and updates the brackets.
2. Communicate pre-existing s_waitcnt instructions using an argument to generateWaitcntInstBefore instead of through the ScoreBrackets.
To simplify these changes, a Waitcnt structure is introduced which carries
the counts of an s_waitcnt instruction in decoded form.
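As a rough illustration of the two points above, here is a minimal sketch of a decoded wait structure and a two-phase split. It is not the code from the patch: the field names, the ~0u "no wait" sentinel, the PendingCounts stand-in for ScoreBrackets, and the helpers determineWait and applyWait are all assumptions made for this example.

```cpp
#include <algorithm>

// Hypothetical sketch: the counts of an s_waitcnt in decoded form.
// ~0u on a counter means "no wait required on this counter".
struct Waitcnt {
  unsigned VmCnt = ~0u;
  unsigned ExpCnt = ~0u;
  unsigned LgkmCnt = ~0u;

  bool hasWait() const {
    return VmCnt != ~0u || ExpCnt != ~0u || LgkmCnt != ~0u;
  }

  // The stricter of two waits: a smaller count waits for more operations.
  Waitcnt combined(const Waitcnt &Other) const {
    Waitcnt R;
    R.VmCnt = std::min(VmCnt, Other.VmCnt);
    R.ExpCnt = std::min(ExpCnt, Other.ExpCnt);
    R.LgkmCnt = std::min(LgkmCnt, Other.LgkmCnt);
    return R;
  }
};

// Hypothetical stand-in for ScoreBrackets: outstanding operations per counter.
struct PendingCounts {
  unsigned Vm = 0;
  unsigned Exp = 0;
  unsigned Lgkm = 0;
};

// Phase 1: decide which wait is required. The state is only read (const),
// and a pre-existing s_waitcnt is passed in as an argument.
Waitcnt determineWait(const PendingCounts &P, const Waitcnt &Needed,
                      const Waitcnt &Existing) {
  Waitcnt W = Needed.combined(Existing);
  // A wait that permits at least as many operations as are pending is
  // already satisfied, so drop it (a modeling choice of this sketch).
  if (W.VmCnt >= P.Vm) W.VmCnt = ~0u;
  if (W.ExpCnt >= P.Exp) W.ExpCnt = ~0u;
  if (W.LgkmCnt >= P.Lgkm) W.LgkmCnt = ~0u;
  return W;
}

// Phase 2: "insert" the wait and update the state. After waiting for
// count N, at most N operations remain outstanding on that counter.
void applyWait(PendingCounts &P, const Waitcnt &W) {
  P.Vm = std::min(P.Vm, W.VmCnt);
  P.Exp = std::min(P.Exp, W.ExpCnt);
  P.Lgkm = std::min(P.Lgkm, W.LgkmCnt);
}
```

In this sketch, a caller would run determineWait first, emit an instruction only if hasWait() returns true, and then call applyWait, so the tracked state changes in exactly one place.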
There are some functional changes:
1. The FIXME for the VCCZ bug workaround was implemented: we only wait for SMEM instructions as required instead of waiting on all counters.
2. There are some cases where we previously merged some waitcnt instructions together non-locally due to the somewhat odd OldWaitcnt tracking, e.g. we would produce code like this:
  ds_read_b32 v0, ...
  ds_read_b32 v1, ...
  s_waitcnt lgkmcnt(0)  <-- this is a merged wait for both uses
  use(v0)
  more code
  use(v1)
In these cases we will now always first emit a wait for lgkmcnt(1), and then later for lgkmcnt(0). This should basically always be a win, although theoretically there could be cases where it's very slightly worse due to the increased code size. The worst code size regressions in my shader-db are:
WORST REGRESSIONS - Code Size
Before  After  Delta  Percentage
  1724   1736     12      0.70 %  shaders/private/f1-2015/1334.shader_test [0]
  2276   2284      8      0.35 %  shaders/private/f1-2015/1306.shader_test [0]
  4632   4640      8      0.17 %  shaders/private/ue4_elemental/62.shader_test [0]
  2376   2384      8      0.34 %  shaders/private/f1-2015/1308.shader_test [0]
  3284   3292      8      0.24 %  shaders/private/talos_principle/1955.shader_test [0]
... so I'm not particularly worried about the rather theoretical downside.
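The lgkmcnt(1)-then-lgkmcnt(0) sequence mentioned earlier can be checked with a small sketch. It assumes both loads are LDS operations that complete in issue order; requiredCount is a hypothetical helper for this example and is not part of the patch.

```cpp
#include <cassert>

// Hypothetical helper: for a counter whose operations complete in issue
// order, the count to wait for before using the result of the operation at
// position Index (0-based) among NumIssued outstanding operations is the
// number of operations issued after it.
unsigned requiredCount(unsigned Index, unsigned NumIssued) {
  assert(Index < NumIssued && "operation must be among the outstanding ones");
  return NumIssued - 1 - Index;
}
```

With two outstanding loads, requiredCount(0, 2) is 1 for use(v0) and requiredCount(1, 2) is 0 for use(v1), which matches the two separate waits.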
Repository:
rL LLVM
https://reviews.llvm.org/D54226
Files:
lib/Target/AMDGPU/SIInsertWaitcnts.cpp
lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h
test/CodeGen/AMDGPU/smrd-vccz-bug.ll
test/CodeGen/AMDGPU/vccz-corrupt-bug-workaround.mir
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D54226.173027.patch
Type: text/x-patch
Size: 28479 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20181107/bb202f14/attachment.bin>