[PATCH] D43594: [AMDGPU] Respect pragma unroll when loop contains convergent instructions

Wed Feb 21 14:34:00 PST 2018

yaxunl created this revision.
yaxunl added reviewers: rampitec, arsenm.
Herald added subscribers: t-tye, tpr, dstuttard, nhaehnle, wdng, kzhuravl.

Currently, loop unrolling is conservative about loops that contain convergent
instructions. It does not allow a remainder loop for such loops, which
essentially disables the unroll count requested by a pragma and results in a
fully unrolled loop in many cases.

As a result, a user may specify pragma unroll 32 but instead get the loop
unrolled 512 times, which results in extremely long compilation time.

For some targets, e.g. AMDGPU, the remainder loop does not cause extra
divergence and should be allowed.

This patch introduces AllowRemainderForConvergentLoop in
TargetTransformInfo::UnrollingPreferences and allows each target to specify
whether unrolling a convergent loop with a remainder is allowed. It is false by
default, so there is no functional change for other targets.
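
To illustrate the intended effect, here is a minimal, self-contained sketch of
the decision described above. It is a simplified model and not the code in
LoopUnrollPass.cpp: the struct only mimics the one field this patch adds, and
chooseUnrollCount, its parameters, and the "fall back to the full trip count"
rule are assumptions made for illustration.

```cpp
// Hypothetical stand-in for TargetTransformInfo::UnrollingPreferences.
// Only the field introduced by this patch is modeled here.
struct UnrollingPreferences {
  // False by default, so targets that do not opt in keep the old behavior.
  bool AllowRemainderForConvergentLoop = false;
};

// Hypothetical helper (not an LLVM API): picks an unroll count for a loop
// with a known trip count, following the behavior described in this mail.
unsigned chooseUnrollCount(unsigned TripCount, unsigned PragmaCount,
                           bool HasConvergentOp,
                           const UnrollingPreferences &UP) {
  // A remainder loop is allowed unless the loop has a convergent operation
  // and the target has not opted in.
  bool AllowRemainder =
      !HasConvergentOp || UP.AllowRemainderForConvergentLoop;

  // Honor the pragma count only when a remainder loop is permitted.
  if (PragmaCount > 0 && AllowRemainder)
    return PragmaCount;

  // Assumed fallback for this model: the loop ends up fully unrolled.
  return TripCount;
}
```

In this model, a convergent loop with trip count 512 and pragma unroll 32
yields 512 with the default preferences and 32 once a target sets
AllowRemainderForConvergentLoop to true.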

This patch fixes the shmembench-ocl compilation time issue on amdgpu.

https://reviews.llvm.org/D43594

Files:
  include/llvm/Analysis/TargetTransformInfo.h
  lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
  lib/Transforms/Scalar/LoopUnrollPass.cpp
  test/Transforms/LoopUnroll/AMDGPU/convergent.ll
  test/Transforms/LoopUnroll/convergent.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D43594.135332.patch
Type: text/x-patch
Size: 3782 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20180221/fac17d1d/attachment.bin>