[PATCH] D68873: [AMDGPU] Amend target loop unroll defaults

Thu Oct 17 09:08:55 PDT 2019

rampitec added a comment.

I disagree with the idea of having different thresholds based on the runtime. The runtime has nothing to do with it. For example, compute can run on top of either ROCm or PAL. Can you justify different results for the same program?

I understand that you have some code that benefits from a specific threshold. I suggest you analyze that code, understand and explain the root cause of the performance gain, and then create a new heuristic in this function. That is why this function exists in the first place. It will also allow you to create a test case.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp:62

 static cl::opt<unsigned> UnrollThresholdLocal(
   "amdgpu-unroll-threshold-local",
----------------
timcorringham wrote:
> nhaehnle wrote:
> > rampitec wrote:
> > > This change penalizes loops which should have unroll boosted instead. Your new default thresholds are now higher than boosted.
> > I see no change here. Is something weird going on with the diff?
> I now initialise ThresholdLocal to be the max of UnrollThresholdLocal and UP.Threshold, so the value used will only be increased for PAL.
It still does not make sense. You are initializing the general threshold (1100) higher than the boosted one (1000).
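
To illustrate the concern, here is a minimal standalone sketch, not the actual code in AMDGPUTargetTransformInfo.cpp. The constant and function names are hypothetical; only the values 1100 and 1000 and the max() initialisation come from this thread. It shows that once the general threshold exceeds the local one, taking the max leaves the "boosted" loops with no boost at all:

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-ins for the two values discussed in this review.
constexpr unsigned GeneralThreshold = 1100;      // proposed general default
constexpr unsigned BoostedLocalThreshold = 1000; // local (boosted) threshold

// Mirrors the initialisation described above: the local threshold
// becomes the larger of the general and the local value.
constexpr unsigned effectiveLocalThreshold(unsigned General, unsigned Local) {
  return std::max(General, Local);
}

// A boost is only meaningful if the loops selected for boosting end up
// with a strictly higher threshold than every other loop gets.
constexpr bool boostHasEffect(unsigned General, unsigned Local) {
  return effectiveLocalThreshold(General, Local) > General;
}
```

With the values from this review, effectiveLocalThreshold(GeneralThreshold, BoostedLocalThreshold) equals the general threshold, so boostHasEffect returns false.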

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D68873/new/

https://reviews.llvm.org/D68873