[LLVMdev] InstCombine adds bit masks, confuses self, others

Tue Apr 17 06:35:44 PDT 2012

> I really dislike hasOneUse-based "solutions" at the IR / InstCombine layer.
> They result in strange artifacts during optimization: places where adding
> similar code turns off optimizations because we fold the similar bits
> together and reuse parts of the computation.
>
> I would much rather see us devise a reasonable set of canonicalization rules
> at the IR level that don't block optimization. Tricks like saving
> instructions, using bit masks, etc., should all be reserved for the codegen
> layer / DAGCombine. There, hasOneUse makes perfect sense because you're
> trying to peephole through architectural gymnastics, not trying to get
> systematic scalar optimizations to fire.
>
> Let's see if we have any evidence that reserving this transform for the
> DAGCombiner hurts things, how it hurts things, and what we can do about it
> first?

<day dreaming>
In general I think I agree. Putting the subexpression in an inline
function for example would cause the hasOneUse hack to fail as the
multiple uses would only show up after inlining. To be general we
would have to be able to combine the other way any transformation we
refuse to do because it has more than one use.

I also agree that the pipeline should first make the IL canonical, and
then decide what is the best way to compute the canonical form. The
one thing I find unfortunate is that in LLVM the second step is almost
all in DAG and MI levels. There is no point were we have IL passes
that known more about the target.
</day dreaming>

Looking a bit more at this particular case, I think the problem is not
so much the representation we have chosen (with the masks) or that the
DAG combiner is better at handling it. It is just that the DAG
combiner is a second pass. Running the example in opt again produces

define i32 @f(i32 %a, i32* nocapture %p) nounwind uwtable ssp {
entry:
  %div = lshr i32 %a, 2
  store i32 %div, i32* %p, align 4
  %0 = lshr i32 %a, 1
  %add = and i32 %0, 2147483646
  %arrayidx1 = getelementptr inbounds i32* %p, i64 1
  store i32 %add, i32* %arrayidx1, align 4
  ret i32 0
}

Which I realize is orthogonal to using the mask being a good idea or
not as it could be even simpler by some metric by reusing the %div,
but does show that the IL optimizers are stopping while there is more
they can do.

OK. In the end I think I agree we should try it out and decide on the
canonical representation first.

Cheers,
Rafael