[PATCH] D43594: [AMDGPU] Respect pragma unroll when loop contains convergent instructions

Thu Feb 22 15:58:49 PST 2018

nhaehnle requested changes to this revision.
nhaehnle added inline comments.
This revision now requires changes to proceed.

================
Comment at: test/Transforms/LoopUnroll/convergent.ll:100
+  %exitcond = icmp eq i32 %inc, 4
+  br i1 %exitcond, label %exit, label %l3, !llvm.loop !1
+
----------------
efriedma wrote:
> I'm not sure this testcase really demonstrates what you want it to demonstrate... a trip count of 4 is divisible by an unroll count of 2, so you don't need a remainder loop anyway.
Agreed. I have to say, it looks to me like the loop unroller is simply being overly conservative here; it should stick with the requested 2x unroll on **all** targets, despite the convergent function call.

There really shouldn't be a difference between AMDGPU and NVPTX at this point.
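
To spell out the arithmetic behind the quoted point, here is a minimal sketch. The helper names `remainderIterations` and `needsRemainderLoop` are hypothetical and are not part of LLVM's unroller; the sketch assumes a compile-time-known trip count and a nonzero unroll count.

```cpp
#include <cstdint>

// Hypothetical helper (not an LLVM API): number of iterations left over
// after unrolling a loop with a known trip count by `unrollCount`.
// Assumes unrollCount != 0.
constexpr std::uint32_t remainderIterations(std::uint32_t tripCount,
                                            std::uint32_t unrollCount) {
  return tripCount % unrollCount;
}

// Hypothetical helper: a remainder loop is only needed when the unroll
// count does not evenly divide the trip count.
constexpr bool needsRemainderLoop(std::uint32_t tripCount,
                                  std::uint32_t unrollCount) {
  return remainderIterations(tripCount, unrollCount) != 0;
}

// The testcase under review: trip count 4, requested unroll count 2.
static_assert(!needsRemainderLoop(4, 2),
              "4 is divisible by 2, so no remainder loop is required");

// A trip count that would actually exercise the remainder path.
static_assert(needsRemainderLoop(5, 2),
              "5 is not divisible by 2, so one iteration is left over");
```

Under these assumptions, a testcase meant to exercise the remainder-loop path would need a trip count that the unroll count does not divide, such as 5 with an unroll count of 2.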

https://reviews.llvm.org/D43594