[llvm-dev] Question regarding LICM

Mon Dec 26 21:12:45 PST 2016

Hello,

I am working on a C++ expression-template-based DSL that uses LLVM for
code generation, and I need some help understanding the behaviour of
the LICM pass. In the following example code, the "A" class is a custom
container that defines various arithmetic operators using expression
templates. We define three arrays of the "A" container and aggregate
the result of the multiplication into "lat".

I was attempting to get the expressions "a[i]" and "b[j]" hoisted
above the "j-loop" and the "k-loop" respectively.

//=== C++ code snippet ===//

A<int> a[4] = {A<int>(&ctx), A<int>(&ctx), A<int>(&ctx), A<int>(&ctx)};
A<int> b[4] = {A<int>(&ctx), A<int>(&ctx), A<int>(&ctx), A<int>(&ctx)};
A<int> c[4] = {A<int>(&ctx), A<int>(&ctx), A<int>(&ctx), A<int>(&ctx)};
A<int> lat(&ctx);

for (std::size_t i = 0; i < 4; ++i)
  for (std::size_t j = 0; j < 4; ++j)
    for (std::size_t k = 0; k < 4; ++k) {
      lat = a[i] * b[j] * c[k];
    }
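
For reference, the hoisting I am after can be written out by hand. The
sketch below is only an illustration: "Terminal" and "hoisted_by_hand"
are hypothetical stand-ins (not the real "A" container), and plain
integer multiplication replaces the expression templates.

```cpp
#include <cstddef>

// Hypothetical stand-in for A<int>: a terminal holding a single value.
struct Terminal {
    int value;
};

// Same loop nest as above, with each loop-invariant element lookup
// moved by hand to the loop level where its index is fixed.
int hoisted_by_hand(const Terminal (&a)[4], const Terminal (&b)[4],
                    const Terminal (&c)[4]) {
    int lat = 0;
    for (std::size_t i = 0; i < 4; ++i) {
        const Terminal &ai = a[i];      // invariant in the j- and k-loops
        for (std::size_t j = 0; j < 4; ++j) {
            const Terminal &bj = b[j];  // invariant in the k-loop
            for (std::size_t k = 0; k < 4; ++k) {
                // Plain assignment, as in the snippet above: each
                // iteration overwrites lat.
                lat = ai.value * bj.value * c[k].value;
            }
        }
    }
    return lat;
}
```

Only "c[k]" depends on the innermost induction variable, so it is the
only lookup left inside the k-loop.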

The IR generated for the body of the innermost loop, after inlining
most of the expression template calls and after loop simplification,
is shown below.

If I run LICM on this IR, the GEPs on lines 1 and 2 of the IR are
hoisted into the preheaders of the "j-loop" and the "k-loop"
respectively. I believe this is because the operands of each GEP are
loop invariant and *isSafeToExecuteUnconditionally* returns trivially
true for a GEP.

However, the CallInsts on lines 4 and 6 remain inside the innermost
loop, because *hasLoopInvariantOperands* returns false for them: their
GEP operands are themselves not loop invariant.

This is the behaviour I am unsure about, and I would greatly appreciate
some help in understanding it. Also, how should the code be structured
for LICM to hoist the CallInsts out of the loop?
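
On that last question, one option that may be worth checking is whether
the accessor can be declared free of side effects. The sketch below is
an assumption, not a confirmed fix: "MdTerminal", "access" and
"innermost" are hypothetical stand-ins for the terminal type, for
@access_fn and for the k-loop, and __attribute__((pure)) is the
GCC/Clang extension promising that a function only reads memory.
Whether an equivalent annotation on the generated @access_fn is enough
for LICM to hoist the calls has not been verified here.

```cpp
#include <cstddef>

// Hypothetical stand-in for the terminal type behind @access_fn.
struct MdTerminal {
    int data[4];
};

// Hypothetical accessor. The "pure" attribute (a GCC/Clang extension)
// states that the function has no side effects and at most reads
// memory; this is an assumption about what access_fn actually does.
__attribute__((pure)) const int *access(const MdTerminal *t,
                                        std::size_t idx) {
    return &t->data[idx];
}

// Stand-in for the innermost k-loop. The first two calls do not depend
// on k, so they are the candidates for hoisting.
int innermost(const MdTerminal &ai, const MdTerminal &bj,
              const MdTerminal (&c)[4]) {
    int lat = 0;
    for (std::size_t k = 0; k < 4; ++k) {
        lat = *access(&ai, 0) * *access(&bj, 0) * *access(&c[k], 0);
    }
    return lat;
}
```

The attribute changes nothing about the result; it only gives the
optimizer more information about the call.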

//=== Generated IR for innermost loop body ===//

1:  %22 = getelementptr inbounds [4 x %"struct.mdarray_terminal"], [4 x %"struct.mdarray_terminal"]* %a, i64 0, i64 %i.0
2:  %23 = getelementptr inbounds [4 x %"struct.mdarray_terminal"], [4 x %"struct.mdarray_terminal"]* %b, i64 0, i64 %j.0
3:  %24 = getelementptr inbounds [4 x %"struct.mdarray_terminal"], [4 x %"struct.mdarray_terminal"]* %c, i64 0, i64 %k.0
4:  %25 = call i32* @access_fn(%"struct.mdarray_terminal"* %22, i64 0, i64 0)
5:  %26 = load i32, i32* %25, !alias.scope !1, !noalias !3
6:  %27 = call i32* @access_fn(%"struct.mdarray_terminal"* %23, i64 0, i64 0)
7:  %28 = load i32, i32* %27, !alias.scope !5, !noalias !7
8:  %mkernel = call i32 @mult_op(i32 %26, i32 %28)
9:  %29 = call i32* @access_fn(%"struct.mdarray_terminal"* %24, i64 0, i64 0)
10:  %30 = load i32, i32* %29, !alias.scope !6, !noalias !8
11:  %mkernel2 = call i32 @mult_op(i32 %mkernel, i32 %30)
12:  %31 = call i32* @access_fn(%"struct.mdarray_terminal"* %lat, i64 0, i64 0)
13:  store i32 %mkernel2, i32* %31, !alias.scope !4, !noalias !9
14:  %32 = add i64 %k.0, 1
15:  br label %19

Best,
Dipto