[LLVMdev] Hoisting elements of array argument into registers

Fri Nov 5 05:56:40 PDT 2010

Hi,

The Glasgow Haskell Compiler LLVM backend is generating a lot of code which
appears to be optimised  by LLVM quite poorly. The problem is demonstrated by
this C source code:

int wf(int sp[]) {
    if (sp[0] == 0) {
        return sp[2] + sp[3];
    } else {
        sp[3] = sp[3] + (sp[1] * 5);
        sp[2] = (sp[2] + sp[0]) + 1;
        sp[1] = sp[1] - 1;
        sp[0] = sp[0] - 1;
        return wf(sp);
    }
}

int g(int a) {
    int sp[] = { a, a, 0, 0 };
    return wf(sp);
}

int main(void) {
    printf("%d\n", g(5));
    return 0;
}

As you can see, wf takes its arguments via a pointer. Nonetheless, we would do
like to loop  optimisations on wf -- in particular, we would like to eliminate
the duplicate computations in sp[0] and sp[1] (GHC's own optimiser does not do
any classical loop optimisations like this).

Unfortunately, LLVM (opt -O3: I tried v2.7 and HEAD) doesn't seem to get very
far. It eliminates the tail call for wf() correctly, but the arguments are
carried between each iteration of the loop by storing intothe sp array. It
would be better if the arguments were kept in registers during the loop and
only stored  into the array upon exit. (These stores could then be eliminated
easily since there is no later load from sp).

If I manually rewrite the code to replace the array argument with its composite
scalars in the following manner LLVM obviously does the right thing:

int wf(int a, int b, int c, int d) {
    if (a == 0) {
        return c + d;
    } else {
        return wf(a - 1, b - 1, (c + a) + 1, d + (b * 5));
    }
}

int g(int a) {
    return wf(a, a, 0, 0);
}

int main(void) {
    printf("%d\n", g(5));
    return 0;
}

The question is, is there any way to get LLVM to do this rewriting for us? It
seems like this would be a generally useful optimisation.

I guess you could also achieve this by sinking the stores to sp into next loop
iteration/the loop exit,  where they will meet the corresponding loads and so
good things will happen. The current instruction sinking pass doesn't seem to
get this case, though.

Thanks in advance for any help!
Max

p.s. Our real code is a little more complicated because sp is really a pointer
to the top of the Haskell runtimes stack (which is of unknown length), not a
4-element array, which might make the optimisation a bit harder. Perhaps LLVM
would have to promote just those elements of the input array which were
accessed in the loop into scalars.