[PATCH] D65088: [AMDGPU][RFC] New llvm.amdgcn.ballot intrinsic

Sebastian Neubauer via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Mar 18 09:47:31 PDT 2020


Flakebi marked an inline comment as done.
Flakebi added inline comments.


================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:8684
+
+      return DCI.DAG.getCopyFromReg(DCI.DAG.getEntryNode(), SDLoc(N), Exec, VT);
+    }
----------------
arsenm wrote:
> I didn't think of it before, but reading exec this way could potentially be dangerous due to the fact that we have the terrible operations that modify exec in the middle of IR blocks, and we split them later. We might have to do this fold later
Which operations are these, and how can I fix this?
And shouldn't this work, since the instruction reads exec and thus should not be touched?


================
Comment at: llvm/lib/Target/AMDGPU/SIInstructions.td:428
+  (i64 (int_amdgcn_ballot i1:$src)),
+  (S_AND_B64 (i64 (COPY_TO_REGCLASS $src, SReg_64)), (i64 EXEC))
+>;
----------------
arsenm wrote:
> Is the COPY_TO_REGCLASS just a tablegen workaround?
Yes, we get an i1 as input, but we know that it is stored in an SGPR (pair), so we "cast" it into one. The same happens in line 803 to optimize icmp:
```
def : Pat <
  (i64 (int_amdgcn_icmp i1:$src, (i1 0), (i32 33))),
  (COPY $src) // Return the SGPRs representing i1 src
>;
```


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D65088/new/

https://reviews.llvm.org/D65088




