[PATCH] D29473: [AMDGPU] Unroll preferences improvements

Fri Feb 3 10:17:05 PST 2017

rampitec marked 4 inline comments as done.
rampitec added inline comments.

================
Comment at: llvm/trunk/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp:69
+            continue;
+          if (any_of(L->getSubLoops(), [Inst](const Loop* SubLoop) {
+               return SubLoop->contains(Inst); }))
----------------
vpykhtin wrote:
> Why is this check necessary? Is this an early exit when the GEP depends on more than one induction variable?
This just checks that the real dependency is in the inner loop, which is the one that really needs to be unrolled. When you iterate through the blocks of a loop, you also get those belonging to its inner loops. I'm just checking that we are about to unroll the right one.
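
To illustrate the check outside of LLVM, here is a minimal standalone sketch. It assumes hypothetical stand-in types (`Inst`, `Loop`, and the helper `belongsToLoopItself` are made up for this example and are not the LLVM classes); only the "skip anything that a sub-loop contains" logic mirrors the hunk above.

```cpp
#include <algorithm>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for an instruction; identity is by address.
struct Inst { int id; };

// Hypothetical stand-in for a loop nest node. As with iterating the blocks
// of a real loop, contains() also reports instructions of nested sub-loops.
struct Loop {
  std::vector<const Inst *> ownInsts;            // directly in this loop
  std::vector<std::unique_ptr<Loop>> subLoops;   // nested loops

  bool contains(const Inst *I) const {
    if (std::find(ownInsts.begin(), ownInsts.end(), I) != ownInsts.end())
      return true;
    return std::any_of(subLoops.begin(), subLoops.end(),
                       [I](const std::unique_ptr<Loop> &Sub) {
                         return Sub->contains(I);
                       });
  }
};

// True only if I is in L but in none of L's sub-loops, i.e. L itself is the
// loop whose unrolling would affect I.
bool belongsToLoopItself(const Loop &L, const Inst *I) {
  if (!L.contains(I))
    return false;
  return std::none_of(L.subLoops.begin(), L.subLoops.end(),
                      [I](const std::unique_ptr<Loop> &Sub) {
                        return Sub->contains(I);
                      });
}
```

With an outer loop holding one instruction and an inner loop holding another, the helper accepts the outer instruction for the outer loop and rejects the inner one, which is the case the `any_of` over `getSubLoops()` filters out.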

================
Comment at: llvm/trunk/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp:88
         // programs way too big.
-        UP.Threshold = 800;
+        UP.Threshold = UnrollThresholdPrivate;
+        return;
----------------
tstellarAMD wrote:
> vpykhtin wrote:
> > tstellarAMD wrote:
> > > Do you also want to set PartialThreshold here?
> > I thought partially unrolled loops won't make it possible to SROA private arrays. What are the benefits of partial unrolling on AMDGPU, by the way? What comes to mind: mem ops clustering/widening, fewer branches? What else?
> I had a test case where bumping the PartialThreshold helped more non-partial loops be unrolled, but I looked at the case again and increasing the normal Threshold has the same effect, so I don't think this is needed.
Actually I agree, partial unrolling does not help to SROA an array. There can be other motivations for it, but not this one.

Repository:
  rL LLVM

https://reviews.llvm.org/D29473