[polly] scop detection: could we stop iterating over uses?

Tue Jun 4 22:54:24 PDT 2013

On 06/03/2013 09:22 AM, Sebastian Pop wrote:
> Hi,
>
> I see that we call hasScalarDependency over every instruction in a region, and
> that iterates over all the uses of the instruction.  Here is the code of that
> function, with a nice "dirty hack" comment in it:
>
> bool ScopDetection::hasScalarDependency(Instruction &Inst,
>                                          Region &RefRegion) const {
>    for (Instruction::use_iterator UI = Inst.use_begin(), UE = Inst.use_end();
>         UI != UE; ++UI)
>      if (Instruction *Use = dyn_cast<Instruction>(*UI))
>        if (!RefRegion.contains(Use->getParent())) {
>          // DirtyHack 1: PHINode user outside the Scop is not allow, if this
>          // PHINode is induction variable, the scalar to array transform may
>          // break it and introduce a non-indvar PHINode, which is not allow in
>          // Scop.
>          // This can be fix by:
>          // Introduce a IndependentBlockPrepare pass, which translate all
>          // PHINodes not in Scop to array.
>          // The IndependentBlockPrepare pass can also split the entry block of
>          // the function to hold the alloca instruction created by scalar to
>          // array.  and split the exit block of the Scop so the new create load
>          // instruction for escape users will not break other Scops.
>          if (isa<PHINode>(Use))
>            return true;
>        }
>
>    return false;
> }
>
> My question is how do we remove all this code: it is very expensive!

I just checked and there is no test case in the polly test cases that 
would fail if we remove this function. However, I was able to create a 
test case that is similar to the description in the comment.

 From a first view I do not really see why PHI nodes need to be handled 
in a special way. In fact, when removing this function as well as the
two asserts in the IndependentBlocks pass, the attached test case is 
correctly code generated. I can see that there may be some problems with 
invalidating later scops when translating out of SSA, but I do not see 
how this is specific to PHI nodes. So from my point of view, I would
be OK with removing this code (and add the relevant test cases to ensure 
we do the right thing) given such a patch passes the LLVM test suite 
without regressions.

However, maybe ether has some more insights. I remember he was the one 
adding this code in, but I don't remember the exact reasoning behind this.

Cheers,
Tobi
-------------- next part --------------
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128"
target triple = "x86_64-unknown-linux-gnu"

define void @f(i64* %A, i64 %N, i64 %M) nounwind {
entry:
  fence seq_cst
  br label %for.i

for.i:
  %indvar = phi i64 [ 0, %entry ], [ %indvar.next, %for.i ]
  %scevgep = getelementptr i64* %A, i64 %indvar
  store i64 %indvar, i64* %scevgep
  %indvar.next = add nsw i64 %indvar, 1
  %exitcond = icmp eq i64 %indvar.next, %N
  br i1 %exitcond, label %next, label %for.i

next:
  fence seq_cst
  br label %for.j

for.j:
  %indvar.j = phi i64 [ %indvar, %next ], [ %indvar.j.next, %for.j ]
  %scevgep.j = getelementptr i64* %A, i64 %indvar.j
  store i64 %indvar.j, i64* %scevgep.j
  fence seq_cst
  %indvar.j.next = add nsw i64 %indvar.j, 1
  %exitcond.j = icmp eq i64 %indvar.j.next, %M
  br i1 %exitcond.j, label %return, label %for.j

return:
  fence seq_cst
  ret void
}