[LLVMdev] alias set collapse and LICM

Mon Apr 27 14:58:09 PDT 2015

I'm current facing an issue related to AliasSetTracker/LICM: the
transitive closure of the alias sets is materially more conservative
than individual aliasing queries and this conservatism leads to
generally worse optimization in LICM.

For instance, consider this module:

declare i32 @only_reads() readonly
declare void @escape(i32*, i32*, i32*)

define void @f(i32 %count) {
 entry:
  %a = alloca i32
  %b = alloca i32
  %c = alloca i32
  call void @escape(i32* %a, i32* %b, i32* %c)
  %enter = icmp sle i32 0, %count
  br i1 %enter, label %loop, label %exit

 loop:
  %idx = phi i32 [ 0, %entry ], [ %idx.inc, %loop ]
  %idx.inc = add i32 %idx, 1
  %take.be = icmp slt i32 %idx, %count

  %a.load = load i32, i32* %a
  store i32 %a.load, i32* %b

  %v = call i32 @only_reads()
  store i32 %v, i32* %c

  br i1 %take.be, label %loop, label %exit

 exit:
  ret void
}

BasicAA knows that %a, %b and %c are all pairwise NoAlias, and given
that, LICM should be able to hoist `%a.load` out of the loop.  That
does not happen because the read only call to `@only_reads` collapses
the alias sets for %a, %b and %c into one and so as far as LICM is
concerned, they all alias each other.  The store to `%b` now
"clobbers" `%a.load`.

One potential fix is to do N^2 alias queries (every mem operation in
the loop with every other mem operation) before hoisting a load, but I
suspect AliasSetTracker exists mainly to make LICM faster than N^2, so
I'm ruling this solution out.

The solution I like: have LICM create two AliasSetTracker
objects.  The first one tracks all memory operations, as the alias set
tracker object does now.  The second one only tracks mods.  When
hoisting a load we check if any alias set in the mod tracker aliases
the load location -- if not, the load is okay to hoist to the
preheader from the data dependence point of view.  In the worst case,
this is 2x more (than now) work for building the alias sets and no more
extra work when checking if we can hoist a load.

Before I dive into implementing this, I have a few questions:

 * is this even the right problem to solve?  Or should I look at other
   passes for this optimization?

 * do you think this will be too expensive?  If so, does it make sense
   to do this only on -O3?`

-- Sanjoy