[LLVMdev] `llvm.$op.with.overflow`, InstCombine and ScalarEvolution

Thu Mar 26 16:37:30 PDT 2015

> On Mar 26, 2015, at 2:54 PM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote:
> 
> I've run into cases where, because not all of LLVM's optimizations
> understand the semantics of the `llvm.$op.with.overflow` intrinsics,
> canonicalizing compares to `llvm.$op.with.overflow` ends up preventing
> optimization.
> 
> For instance, running the following snippet through `opt -indvars`
> optimizes `%to.optimize` to `true`, but running it through
> `opt -instcombine -indvars` does not.
> 
> ```
> declare void @side_effect(i1)
> 
> define void @foo(i8 %x, i8 %y) {
> entry:
>  %sum = add i8 %x, %y
>  %e = icmp ugt i8 %x, %sum
>  br i1 %e, label %exit, label %loop
> 
> loop:
>  %idx = phi i8 [ %x, %entry ], [ %idx.inc, %loop ]
>  %idx.inc = add i8 %idx, 1
>  %to.optimize = icmp ule i8 %idx, %sum
>  call void @side_effect(i1 %to.optimize)
>  %c = icmp ule i8 %idx.inc, %sum
>  br i1 %c, label %loop, label %exit
> 
> exit:
>  ret void
> }
> ```
> 
> This happens because `-instcombine` does the following tranform:
> 
> ```
> entry:
>  %uadd = call { i8, i1 } @llvm.uadd.with.overflow.i8(i8 %x, i8 %y)
>  %0 = extractvalue { i8, i1 } %uadd, 0
>  %e = extractvalue { i8, i1 } %uadd, 1
>  br i1 %e, label %exit, label %loop.preheader
> ```

That’s new to me. I guess I missed that.

> and ScalarEvolution can no longer see through the `extractvalue` of
> the call to `llvm.uadd.with.overflow.i8`.
> 
> The right way to fix this depends on the intended purpose of the
> `llvm.$op.with.overflow` intrinsics.  Three solutions I can think of:
> 
> * if they're a convenience useful only for better codegen, can the
>   transform that instcombine is doing currently (transforming
>   compares to `extractvalue` of a `.with.overflow` intrinsic) be
>   moved to CodeGenPrep?

The intrinsics tightly couple the operation with the branch, preventing the optimizer from obscuring the relationship, thereby effectively guaranteeing good codegen. If the intrinsics are needed to expose mid-level optimizations, I’m not aware of it. I think mid-level optimization should generally work on the canonical compare-and-branch form.

I don't understand why we would want to generate intrinsics early unless the frontend does it. It seems that if InstCombine can pattern match these cases, then it can just as easily optimize the comparison directly. Isn’t it already doing that?

> * if they're something more fundamental, then maybe they should to be
>   promoted to an instruction?  They've been around since at least
>   llvm 2.6 as far as I can tell.  Personally, I seriously doubt this
>   is a good idea, given that the semantics of these intrinsics can be
>   completely described by a DAG composed of existing instructions.

Right. They are a way for the frontend to communicate directly with codegen, *preventing* optimization from getting in the way. That’s a pretty standard use for intrinsics.

> * add rules to ScalarEvolution to have it understand these intrinsics
>   (or maybe even expand them before `-indvars` -- I think
>   `-liv-reduce` tries to do this in some cases), but I'd vote for
>   keeping this complexity out of ScalarEvolution unless there are
>   good reasons why the `.with.overflow` calls need to be introduced
>   before codegenprep.
> 
> What do you think?

I don’t think ScalarEvolution can possibly represent overflow conditions. -liv-reduce clones the data flow, decoupling the overflow check from the arithmetic, just hoping that ISEL can put it back together! I don’t think anyone actively uses at this point, so basically loop optimization is disabled if you use these intrinsics.

If cloning+expanding the intrinsic (-liv-reduce) is really the best way to expose these to SCEV, then clearly InstCombine should not be fusing them to begin with.

An alternative that avoids cloning+expanding the intrinsic would be to allow SCEV to analyze the expression for the intrinsic’s value result, but avoid replacing the intrinsic with a materialized SCEV expression (we would have no way to replace the overflow result anyway). To do that, I think the current uses of SCEVExpander need to be surveyed to ensure that they will handle lazily cloning the intrinsic when materializing the expression for the intrinsic’s value result. This would be wrong for LSR, where it assumes SCEV knows about all uses of the value, but that could probably be worked around or we could just bail on LSR.

So:
- I don’t understand why InstCombine is materializing the intrinsic
- But it is still a good idea to teach SCEV clients how to handle code with the intrinsic

Andy

> 
> -- Sanjoy