[PATCH] D120544: [AMDGPU] Omit unnecessary waitcnt before barriers

Mariusz Ceier via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Apr 18 02:25:24 PDT 2022


mceier added a comment.

This revision causes GPU hangs (Radeon 5700 XT) on Linux in many OpenGL apps. For example, with Godot I get something like this in dmesg:

  [141585.653283] [drm:amdgpu_dm_commit_planes] *ERROR* Waiting for fences timed out!
  [141587.702251] [drm:amdgpu_dm_commit_planes] *ERROR* Waiting for fences timed out!
  [141590.783147] [drm:amdgpu_job_timedout] *ERROR* ring gfx_0.0.0 timeout, signaled seq=13399090, emitted seq=13399091
  [141590.783726] [drm:amdgpu_job_timedout] *ERROR* Process information: process godot.x11.opt.t pid 527100 thread godot.x11.:cs0 pid 527120
  [141590.783991] amdgpu 0000:03:00.0: amdgpu: GPU reset begin!
  [141591.148991] amdgpu 0000:03:00.0: [drm:amdgpu_ring_test_helper] *ERROR* ring kiq_2.1.0 test failed (-110)
  [141591.149000] [drm:gfx_v10_0_hw_fini] *ERROR* KGQ disable failed
  [141591.334553] amdgpu 0000:03:00.0: [drm:amdgpu_ring_test_helper] *ERROR* ring kiq_2.1.0 test failed (-110)
  [141591.334559] [drm:gfx_v10_0_hw_fini] *ERROR* KCQ disable failed
  [141591.521657] [drm:gfx_v10_0_cp_gfx_enable.isra.0] *ERROR* failed to halt cp gfx
  [141591.528453] [drm] free PSP TMR buffer
  [141591.560524] CPU: 0 PID: 492277 Comm: kworker/u8:1 Not tainted 5.18.0-rc2-x86_64+ #1
  [141591.560531] Hardware name: Gigabyte Technology Co., Ltd. Z170X-Gaming 7/Z170X-Gaming 7, BIOS F22j 01/11/2018
  [141591.560534] Workqueue: amdgpu-reset-dev drm_sched_job_timedout
  [141591.560540] Call Trace:
  [141591.560541]  <TASK>
  [141591.560543]  show_stack+0x52/0x58
  [141591.560548]  dump_stack_lvl+0x49/0x5e
  [141591.560551]  dump_stack+0x10/0x12
  [141591.560553]  amdgpu_do_asic_reset+0x24/0x43b
  [141591.560557]  amdgpu_device_gpu_recover_imp.cold+0x660/0x755
  [141591.560560]  amdgpu_job_timedout+0x14f/0x180
  [141591.560563]  ? set_next_entity+0xe1/0x160
  [141591.560567]  drm_sched_job_timedout+0x6d/0x100
  [141591.560569]  ? trace_hardirqs_on+0x37/0xf0
  [141591.560573]  process_one_work+0x216/0x3f0
  [141591.560576]  worker_thread+0x50/0x3e0
  [141591.560578]  ? rescuer_thread+0x380/0x380
  [141591.560580]  kthread+0xfc/0x120
  [141591.560583]  ? kthread_complete_and_exit+0x20/0x20
  [141591.560586]  ret_from_fork+0x22/0x30
  [141591.560590]  </TASK>
  [141591.560788] amdgpu 0000:03:00.0: amdgpu: BACO reset
  [141593.710773] amdgpu 0000:03:00.0: amdgpu: GPU reset succeeded, trying to resume
  ...

git bisect log:

  # bad: [6730b44480fcce18bfbbae0c46719250e9eae425] [HIP] Fix HIP include path
  # good: [9b740c035c8b3e1a15effbfa73960fea259fd27d] Update normalizeAffineFor to canonicalize maps/operands before using them
  git bisect start '--no-checkout' '6730b44480fcce18bfbbae0c46719250e9eae425' '9b740c035c8b3e1a15effbfa73960fea259fd27d' '--' 'llvm' 'cmake' 'third-party'
  # bad: [9119eefe5fe513f188f8a2b8d5346ac0ce72d3f3] [X86] Add cheapX86FSETCC_SSE helper. NFC.
  git bisect bad 9119eefe5fe513f188f8a2b8d5346ac0ce72d3f3
  # bad: [f46fa4de4a95329b74ba0bb4ce9623b64ad84876] Revert "[clang][debug] port clang-cl /JMC flag to ELF"
  git bisect bad f46fa4de4a95329b74ba0bb4ce9623b64ad84876
  # bad: [0c2b43ab8cb1067dd1c7899094b824890803a7d2] [X86] Fix MCSymbolizer interface for X86Disassembler
  git bisect bad 0c2b43ab8cb1067dd1c7899094b824890803a7d2
  # bad: [e1069c1288d151f36f37c4c616b78b7b0a1e3a50] [AMDGPU] Ensure return address is save/restored if clobbered or when function has calls
  git bisect bad e1069c1288d151f36f37c4c616b78b7b0a1e3a50
  # good: [eadd1668d05df582281061460089562cfb6b3a90] update_analyze_test_checks.py: fix UTC_ARGS handling
  git bisect good eadd1668d05df582281061460089562cfb6b3a90
  # good: [20c4664552e2e5c1d85db13d3568b7d4a3e843ef] [gn build] Port 205557c908ff
  git bisect good 20c4664552e2e5c1d85db13d3568b7d4a3e843ef
  # bad: [8d0c34fd4fb66ea0d19563154a59658e4b7f35d4] [AMDGPU] Omit unnecessary waitcnt before barriers
  git bisect bad 8d0c34fd4fb66ea0d19563154a59658e4b7f35d4
  # first bad commit: [8d0c34fd4fb66ea0d19563154a59658e4b7f35d4] [AMDGPU] Omit unnecessary waitcnt before barriers

After reverting 8d0c34fd4fb66ea0d19563154a59658e4b7f35d4 <https://reviews.llvm.org/rG8d0c34fd4fb66ea0d19563154a59658e4b7f35d4> on top of 4ffd0b6fde4da9a3ba4ee3a189504ce84c118b1c <https://reviews.llvm.org/rG4ffd0b6fde4da9a3ba4ee3a189504ce84c118b1c>, the GPU hangs disappear.

Steps to reproduce:

1. Start Godot.
2. Open a project, or create a new OpenGL ES project.
3. The hang happens either immediately or after a few seconds of moving the mouse.


Repository:
  rG LLVM Github Monorepo


https://reviews.llvm.org/D120544
