[PATCH] D29473: [AMDGPU] Unroll preferences improvements

Stanislav Mekhanoshin via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Feb 2 16:14:13 PST 2017

rampitec marked 2 inline comments as done.
rampitec added inline comments.

Comment at: lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp:32-35
+static cl::opt<unsigned> UnrollThreshold(
+  "amdgpu-unroll-threshold",
+  cl::desc("Unroll threshold for AMDGPU"),
+  cl::init(300), cl::Hidden); // Twice the default.
arsenm wrote:
> I don't think we need this. The regular option I think overrides the target preference if set
Right, it does.

Comment at: lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp:64-65
       if (Alloca) {
+        if (L->hasLoopInvariantOperands(GEP))
+          continue;
+        Type *Ty = Alloca->getAllocatedType();
arsenm wrote:
> Won't LICM move this out of the loop before unrolling?
In most cases it will, but consider this:

  for (int i=0; i < 16; i++)
    for (int j=0; j < 100; j++)
      arr[(j + gid) % 64] = a[i];
Here arr[j] depends on the induction variable of inner loop, so it is not hoisted. We do not want to unroll outer loop because of that though.
Actually this check is not sufficient, so I have rewritten this place. I will update review shortly.

Comment at: lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp:93-96
+  if (ST->getGeneration() >= AMDGPUSubtarget::VOLCANIC_ISLANDS)
     return 256;
+  else if (ST->getGeneration() >= AMDGPUSubtarget::SOUTHERN_ISLANDS)
+    return 128;
arsenm wrote:
> I don't think we should change this away from the hardware sizes. I think this hook is only used by the vectorizers we don't use. We should define a different constant for use for the alloca heuristic
The numbers here were just incorrect. SI to CI have 128 registers. Then it makes sense to take into consideration real register file size, which is target dependent. As a todo we need to limit it further if we have occupancy attributes.



More information about the llvm-commits mailing list