[PATCH] D69661: [AMDGPU] Fix vccz after v_readlane/v_readfirstlane to vcc_lo/hi

Jay Foad via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Oct 31 08:37:35 PDT 2019


foad marked 2 inline comments as done.
foad added inline comments.


================
Comment at: llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:1406
+
+    if (ST->getGeneration() <= AMDGPUSubtarget::GFX9) {
+      // Up to gfx9, writes to vcc_lo and vcc_hi don't update vccz.
----------------
arsenm wrote:
> I thought we already had a vccz bug subtarget feature check? Either way I want to limit getGeneration checks and keep them in a subtarget check
Yes we have the hasReadVCCZBug feature mentioned a few lines above, but as I understand it that is a different and very specific bug.

I'm happy to add a new feature for this if necessary but I'm not sure what to call it. I'm not even sure if it's a bug or a feature. The best explanation I've heard so far is that it's just an artifact of the way vccz is implemented in the hardware, and that it doesn't affect gfx10 because the hardware implementation changed to support wave32. So maybe it should be under a "hasWave32Support" feature?


================
Comment at: llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:1409
+      // Writes to vcc will fix it. Only examine explicit defs.
+      for (auto &Op : Inst.defs()) {
+        switch (Op.getReg()) {
----------------
arsenm wrote:
> This won't catch the implicit def in inline asm for example
I specifically wanted to avoid treating this instruction (from the test case) as a write to vcc, despite its implicit-def:
```
$vcc_hi = V_READFIRSTLANE_B32 killed $vgpr0, implicit $exec, implicit-def $vcc
```
What kind of inline asm are you thinking of?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69661/new/

https://reviews.llvm.org/D69661





More information about the llvm-commits mailing list