[PATCH] D43594: [AMDGPU] Respect pragma unroll when loop contains convergent instructions

Tue Feb 27 09:39:23 PST 2018

yaxunl added inline comments.

================
Comment at: test/Transforms/LoopUnroll/convergent.ll:100
+  %exitcond = icmp eq i32 %inc, 4
+  br i1 %exitcond, label %exit, label %l3, !llvm.loop !1
+
----------------
nhaehnle wrote:
> efriedma wrote:
> > I'm not sure this testcase really demonstrates what you want it to demonstrate... a trip count of 4 is divisible by an unroll count of 2, so you don't need a remainder loop anyway.
> Agreed. I have to say, it looks to me like the loop unroll is simply overly conservative here, and should stick with the requested 2x unroll on **all** targets, despite the convergent function call.
> 
> There really shouldn't be a difference between AMDGPU and NVPTX at this point.
It seems there is a separate but related bug in the loop unroller: if AllowRemainder is disabled, the pragma unroll count is not respected even when no remainder loop is needed. I created a separate patch for that issue: https://reviews.llvm.org/D43826
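
To make the expected behavior concrete, here is a minimal sketch of the decision I have in mind. It is a simplified model, not the actual LoopUnrollPass code: the function name chooseUnrollCount and its parameters are hypothetical, and falling back to a count of 1 when a remainder would be needed is an assumption made only for illustration.

```cpp
// Hypothetical, simplified model of the unroll-count decision discussed
// above. This is NOT the actual LLVM loop unroll code.
//
//   tripCount      - constant trip count of the loop (assumed > 0)
//   pragmaCount    - count requested via "#pragma unroll N" (assumed > 0)
//   allowRemainder - whether a remainder loop may be generated
//
// Returns the unroll count to use; 1 means "do not unroll".
constexpr unsigned chooseUnrollCount(unsigned tripCount, unsigned pragmaCount,
                                     bool allowRemainder) {
  // A remainder loop is only needed when pragmaCount does not evenly divide
  // tripCount. If it divides evenly, the pragma count can be honored even
  // when remainder loops are disallowed (e.g. due to convergent calls).
  // Assumption: otherwise this sketch simply gives up and returns 1.
  return (allowRemainder || tripCount % pragmaCount == 0) ? pragmaCount : 1;
}
```

With the test case above (trip count 4, pragma count 2, remainder disallowed), this model keeps the requested 2x unroll, since 4 is divisible by 2 and no remainder loop is required.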

https://reviews.llvm.org/D43594