[LLVMdev] Hoisting elements of array argument into registers

Fri Nov 5 07:51:15 PDT 2010

Hi Max,

> The Glasgow Haskell Compiler LLVM backend is generating a lot of code which
> appears to be optimised  by LLVM quite poorly. The problem is demonstrated by
> this C source code:

if I run this at -O3 under llvm-gcc from top-of-tree on x86-64 linux then
(1) it computes that g(5) is equal to 95, and main is transformed to just print
95.
(2) g is transformed into code with no loop, i.e. the result is computed
directly from the value of "a" in straight line code (there is a special
case when a is zero).
(3) wf is not transformed as well: values are loaded from memory and stored
back on each loop.  And there is still a loop.

I see the same with clang.  I'm not sure why the optimizers do so much better
when they can see that sp is a local array (the special initial values don't
matter).

Ciao,

Duncan.

>
> int wf(int sp[]) {
>      if (sp[0] == 0) {
>          return sp[2] + sp[3];
>      } else {
>          sp[3] = sp[3] + (sp[1] * 5);
>          sp[2] = (sp[2] + sp[0]) + 1;
>          sp[1] = sp[1] - 1;
>          sp[0] = sp[0] - 1;
>          return wf(sp);
>      }
> }
>
> int g(int a) {
>      int sp[] = { a, a, 0, 0 };
>      return wf(sp);
> }
>
> int main(void) {
>      printf("%d\n", g(5));
>      return 0;
> }
>
> As you can see, wf takes its arguments via a pointer. Nonetheless, we would do
> like to loop  optimisations on wf -- in particular, we would like to eliminate
> the duplicate computations in sp[0] and sp[1] (GHC's own optimiser does not do
> any classical loop optimisations like this).
>
> Unfortunately, LLVM (opt -O3: I tried v2.7 and HEAD) doesn't seem to get very
> far. It eliminates the tail call for wf() correctly, but the arguments are
> carried between each iteration of the loop by storing intothe sp array. It
> would be better if the arguments were kept in registers during the loop and
> only stored  into the array upon exit. (These stores could then be eliminated
> easily since there is no later load from sp).
>
> If I manually rewrite the code to replace the array argument with its composite
> scalars in the following manner LLVM obviously does the right thing:
>
> int wf(int a, int b, int c, int d) {
>      if (a == 0) {
>          return c + d;
>      } else {
>          return wf(a - 1, b - 1, (c + a) + 1, d + (b * 5));
>      }
> }
>
> int g(int a) {
>      return wf(a, a, 0, 0);
> }
>
> int main(void) {
>      printf("%d\n", g(5));
>      return 0;
> }
>
> The question is, is there any way to get LLVM to do this rewriting for us? It
> seems like this would be a generally useful optimisation.
>
> I guess you could also achieve this by sinking the stores to sp into next loop
> iteration/the loop exit,  where they will meet the corresponding loads and so
> good things will happen. The current instruction sinking pass doesn't seem to
> get this case, though.
>
> Thanks in advance for any help!
> Max
>
> p.s. Our real code is a little more complicated because sp is really a pointer
> to the top of the Haskell runtimes stack (which is of unknown length), not a
> 4-element array, which might make the optimisation a bit harder. Perhaps LLVM
> would have to promote just those elements of the input array which were
> accessed in the loop into scalars.
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev