[PATCH] D84779: [AMDGPU] Add amdgpu specific loop threshold metadata

Mon Aug 10 05:21:36 PDT 2020

timcorringham added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp:123
+  if (MDNode *AmdgpuThreshold =
+          findOptionMDForLoop(L, "amdgpu.loop.unroll.threshold")) {
+    if (AmdgpuThreshold->getNumOperands() == 2) {
----------------
arsenm wrote:
> I have 2 concerns about this. First there's nothing AMDGPU specific about this. Second, the unroll threshold interpretation isn't really a stable concept. What happens if we adjust the cost model and now all the programs using this metadata change?
> 
> Why is directly setting the threshold this way useful compared to setting a target unroll factor?
You are correct that this isn't intrinsically AMDGPU specific, although it is something that so far hasn't been required by any other target. As it is used by the target specific unroll options rather than in generic code it isn't actually useful for any other target unless they are changed to use it, so I made it target specific in order to avoid cluttering the generic llvm.loop metadata with something that is only used by one target. I'm happy to make it llvm.loop.unroll.threshold if that is acceptable to everyone. 

A target unroll factor can be better (and that is still available) - although it requires some analysis by the front end to determine what value is appropriate. Our front end used to do this, which didn't always work out well, and also increased compile time significantly. Increasing the default threshold (using the function attribute) produced better overall results.  We have now determined that we benefit from unrolling some DontUnroll loops, but that the amount depends on several factors. Again, that either need some front end analysis, or a low threshold for those loops - which gives a worthwhile gain overall for little compile time effort.

When the unroll factor isn't forced (unroll.disable, unroll.full, or unroll.count) then any cost model changes (or any other change that affects the threshold or loop size) can result in a change to the unroll behavior. That can be appropriate, just as a fixed unroll factor may not be with the same changes. I don't think there is a clear correct answer, just a compromise.      

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D84779/new/

https://reviews.llvm.org/D84779