[PATCH] D120544: [AMDGPU] Omit unnecessary waitcnt before barriers

Matt Arsenault via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Apr 18 13:11:58 PDT 2022


arsenm added a comment.

In D120544#3456333 <https://reviews.llvm.org/D120544#3456333>, @mceier wrote:

> This revision causes GPU hangs (Radeon 5700XT) on Linux in many OpenGL apps. For example, with Godot I get something like this in dmesg:
>
>   [141585.653283] [drm:amdgpu_dm_commit_planes] *ERROR* Waiting for fences timed out!
>   [141587.702251] [drm:amdgpu_dm_commit_planes] *ERROR* Waiting for fences timed out!
>   [141590.783147] [drm:amdgpu_job_timedout] *ERROR* ring gfx_0.0.0 timeout, signaled seq=13399090, emitted seq=13399091
>   [141590.783726] [drm:amdgpu_job_timedout] *ERROR* Process information: process godot.x11.opt.t pid 527100 thread godot.x11.:cs0 pid 527120
>   [141590.783991] amdgpu 0000:03:00.0: amdgpu: GPU reset begin!
>   [141591.148991] amdgpu 0000:03:00.0: [drm:amdgpu_ring_test_helper] *ERROR* ring kiq_2.1.0 test failed (-110)
>   [141591.149000] [drm:gfx_v10_0_hw_fini] *ERROR* KGQ disable failed
>   [141591.334553] amdgpu 0000:03:00.0: [drm:amdgpu_ring_test_helper] *ERROR* ring kiq_2.1.0 test failed (-110)
>   [141591.334559] [drm:gfx_v10_0_hw_fini] *ERROR* KCQ disable failed
>   [141591.521657] [drm:gfx_v10_0_cp_gfx_enable.isra.0] *ERROR* failed to halt cp gfx
>   [141591.528453] [drm] free PSP TMR buffer
>   [141591.560524] CPU: 0 PID: 492277 Comm: kworker/u8:1 Not tainted 5.18.0-rc2-x86_64+ #1
>   [141591.560531] Hardware name: Gigabyte Technology Co., Ltd. Z170X-Gaming 7/Z170X-Gaming 7, BIOS F22j 01/11/2018
>   [141591.560534] Workqueue: amdgpu-reset-dev drm_sched_job_timedout
>   [141591.560540] Call Trace:
>   [141591.560541]  <TASK>
>   [141591.560543]  show_stack+0x52/0x58
>   [141591.560548]  dump_stack_lvl+0x49/0x5e
>   [141591.560551]  dump_stack+0x10/0x12
>   [141591.560553]  amdgpu_do_asic_reset+0x24/0x43b
>   [141591.560557]  amdgpu_device_gpu_recover_imp.cold+0x660/0x755
>   [141591.560560]  amdgpu_job_timedout+0x14f/0x180
>   [141591.560563]  ? set_next_entity+0xe1/0x160
>   [141591.560567]  drm_sched_job_timedout+0x6d/0x100
>   [141591.560569]  ? trace_hardirqs_on+0x37/0xf0
>   [141591.560573]  process_one_work+0x216/0x3f0
>   [141591.560576]  worker_thread+0x50/0x3e0
>   [141591.560578]  ? rescuer_thread+0x380/0x380
>   [141591.560580]  kthread+0xfc/0x120
>   [141591.560583]  ? kthread_complete_and_exit+0x20/0x20
>   [141591.560586]  ret_from_fork+0x22/0x30
>   [141591.560590]  </TASK>
>   [141591.560788] amdgpu 0000:03:00.0: amdgpu: BACO reset
>   [141593.710773] amdgpu 0000:03:00.0: amdgpu: GPU reset succeeded, trying to resume
>   ...
>
> git bisect log:
>
>   # bad: [6730b44480fcce18bfbbae0c46719250e9eae425] [HIP] Fix HIP include path
>   # good: [9b740c035c8b3e1a15effbfa73960fea259fd27d] Update normalizeAffineFor to canonicalize maps/operands before using them
>   git bisect start '--no-checkout' '6730b44480fcce18bfbbae0c46719250e9eae425' '9b740c035c8b3e1a15effbfa73960fea259fd27d' '--' 'llvm' 'cmake' 'third-party'
>   # bad: [9119eefe5fe513f188f8a2b8d5346ac0ce72d3f3] [X86] Add cheapX86FSETCC_SSE helper. NFC.
>   git bisect bad 9119eefe5fe513f188f8a2b8d5346ac0ce72d3f3
>   # bad: [f46fa4de4a95329b74ba0bb4ce9623b64ad84876] Revert "[clang][debug] port clang-cl /JMC flag to ELF"
>   git bisect bad f46fa4de4a95329b74ba0bb4ce9623b64ad84876
>   # bad: [0c2b43ab8cb1067dd1c7899094b824890803a7d2] [X86] Fix MCSymbolizer interface for X86Disassembler
>   git bisect bad 0c2b43ab8cb1067dd1c7899094b824890803a7d2
>   # bad: [e1069c1288d151f36f37c4c616b78b7b0a1e3a50] [AMDGPU] Ensure return address is save/restored if clobbered or when function has calls
>   git bisect bad e1069c1288d151f36f37c4c616b78b7b0a1e3a50
>   # good: [eadd1668d05df582281061460089562cfb6b3a90] update_analyze_test_checks.py: fix UTC_ARGS handling
>   git bisect good eadd1668d05df582281061460089562cfb6b3a90
>   # good: [20c4664552e2e5c1d85db13d3568b7d4a3e843ef] [gn build] Port 205557c908ff
>   git bisect good 20c4664552e2e5c1d85db13d3568b7d4a3e843ef
>   # bad: [8d0c34fd4fb66ea0d19563154a59658e4b7f35d4] [AMDGPU] Omit unnecessary waitcnt before barriers
>   git bisect bad 8d0c34fd4fb66ea0d19563154a59658e4b7f35d4
>   # first bad commit: [8d0c34fd4fb66ea0d19563154a59658e4b7f35d4] [AMDGPU] Omit unnecessary waitcnt before barriers
>
> After reverting 8d0c34fd4fb66ea0d19563154a59658e4b7f35d4 <https://reviews.llvm.org/rG8d0c34fd4fb66ea0d19563154a59658e4b7f35d4> on top of 4ffd0b6fde4da9a3ba4ee3a189504ce84c118b1c <https://reviews.llvm.org/rG4ffd0b6fde4da9a3ba4ee3a189504ce84c118b1c>, the GPU hangs disappear.
>
> Steps to reproduce:
>
> 1. Start Godot
> 2. Open a project or create a new OpenGL ES project
> 3. The hang happens either immediately or after a few seconds of moving the mouse

Can you post the IR and ISA for the failing shader?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120544/new/

https://reviews.llvm.org/D120544


