[PATCH] D40547: AMDGPU: Fix copying i1 value out of loop with non-uniform exit
Nicolai Hähnle via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Jan 26 08:23:40 PST 2018
nhaehnle added inline comments.
Herald added a subscriber: hintonda.
================
Comment at: lib/Target/AMDGPU/Utils/AMDGPULaneDominator.cpp:13-15
+// 2. whenever B executes, every active lane during that execution of B was
+// also active during the most recent execution of A.
+//
----------------
arsenm wrote:
> Are you really trying to find control flow equivalent regions? Or is this a active lanes are a subset?
They don't have to be equivalent. In the case of:
```
A
/ \
/ \
B C
```
A lane-dominates both B and C, but they're not equivalent.
================
Comment at: lib/Target/AMDGPU/Utils/AMDGPULaneDominator.cpp:70
+
+ return true;
+}
----------------
t-tye wrote:
> Would this return true for the CFG:
>
> B
> |
> |
> v
> A
> |
> |
> v
> end
>
> Since no successors of A reaches A to cause false to be returned. But no check seems to be made to ensure that A traditionally-dominates B.
I'm afraid I'm not sure what the CFG you have in mind looks like. The function assumes that A traditionally-dominates B, the caller must ensure that (as per the comment).
https://reviews.llvm.org/D40547
More information about the llvm-commits
mailing list