[llvm-dev] Approaches for handling branches

Sat Mar 12 05:45:36 PST 2016

Hi,

LLVM newbie here.  I'm working on writing a compiler from a toy language to
LLVM assembly, and hoping to get some advice about how to handle certain
kinds of branches.

Consider an input program that might look like this if written in C:

   x1 = cond1 ? f(a1)
      : cond2 ? f(a2)
      : cond3 ? f(a1)
      :         f(a4);

   x2 = cond3 ? f(a3)
      : cond4 ? f(a1)
      :         f(a4);

   return g(x1, x2);

I'm hoping to compile something like the above in a "good" way that avoids
unnecessarily calling F.

To mock up how this might work, I wrote some LLVM assembly to try to model
the above (full code below: "try1").  I ended up with the following for x1,
where @f is declared as an external function, and @ite is a simple
if-then-else function that seems to nicely get inlined:

   %fa1 = call i64 @f (i64 %a1)
   %fa2 = call i64 @f (i64 %a2)
   %fa4 = call i64 @f (i64 %a4)

   %x1.cond3 = call i64 @ite (i64 %cond3, i64 %fa1, i64 %fa4)
   %x1.cond2 = call i64 @ite (i64 %cond2, i64 %fa2, i64 %x1.cond3)
   %x1       = call i64 @ite (i64 %cond1, i64 %fa1, i64 %x1.cond2)

Unfortunately this seems to produce "bad" x86 code, at least when compiled
like this:

     opt demo.ll -O2 -o=demo.ll.opt && llc -o demo.s demo.ll.opt

In particular, the resulting demo.s appears to be doing all of the calls of F
first, followed by some conditional moves to select the right answer.  (This
would of course be great if F is very fast, but it would be bad if F is
slow.)

I've tried adding some promising looking switches to the opt command (like
-delinearize and -sink) and using different register allocation schemes in
the llc command, but so far I don't seem to be able to get LLVM to reorder
the instructions to push the calls of F into the branches.  (I'm afraid I
don't really know much about what the switches do.)

I seem to be able to do much better by (manually) rewriting the assembly so
that I explicitly check the conditions first, and only call F on the various
arguments in the branches where they are needed (code below: "try2").

So I could write my frontend so that it generates code like try2.  But I
worry about this approach because the code for try2 ends up repeating
computations like the call of f(a1) and f(a4), and it seems like this will
lead to code blowup.  My "real" programs will have much more deeply nested
branches, and the expressions in the branches are going to be bigger than
just a single function call.  So if I try to put these expressions only in
the branches where they are computed, it seems like that will result in code
that grows exponentially in the number of branches that lead to a common
subexpression.

So I have many questions.  I wonder:

  - Are there other command-line options I should look into that might help
    LLVM be able able to optimize "try1" to push the calls down into the
    branches where they are used and avoid the unnecessary calls of F?

  - Is this kind of reordering anything I should expect LLVM to be able to
    take care of in the first place, or should I expect to need to work on
    making sure that my frontend produces code that is closer to what I want
    the machine to execute?

  - Are there other approaches that would work better than try1/try2 that I
    haven't considered?

  - Are there any examples of developing LLVM frontends for languages that
    deal with terms that are rich in branches and structure sharing?  My real
    goal is to build a good compiler for efficiently executing DAGs of 64-bit
    integer operations, where some nodes fan out to several operations, but
    large chunks of the DAG may not be needed depending on the inputs.  I'd
    be very interested in seeing other approaches to this kind of problem.

Thanks very much for your time.

Cheers,

Jared

------------------------------

;; Source code for try1/try2
;;
;; I'm using LLVM on Linux Mint:
;;   opt --version: LLVM version 3.4
;;   llc --version: LLVM version 3.4

declare i64 @f (i64 %a) nounwind readonly
declare i64 @g (i64 %a, i64 %b) nounwind readonly

define i64 @ite (i64 %cond, i64 %then, i64 %else)
{
    ;; If/Then/Else, which will be inlined with -inline
    %cond.zero = icmp eq i64 %cond, 0
    br i1 %cond.zero, label %case.zero, label %case.nonzero
  case.nonzero:
    ret i64 %then
  case.zero:
    ret i64 %else
}

define i64 @try1 (i64* %in)
{
    ;; Arguments from an input array.
    %cond1.addr = getelementptr i64* %in, i32 0
    %cond2.addr = getelementptr i64* %in, i32 1
    %cond3.addr = getelementptr i64* %in, i32 2
    %cond4.addr = getelementptr i64* %in, i32 3
    %a1.addr = getelementptr i64* %in, i32 4
    %a2.addr = getelementptr i64* %in, i32 5
    %a3.addr = getelementptr i64* %in, i32 6
    %a4.addr = getelementptr i64* %in, i32 7

    ;; Load arguments into temporaries
    %cond1 = load i64* %cond1.addr
    %cond2 = load i64* %cond2.addr
    %cond3 = load i64* %cond3.addr
    %cond4 = load i64* %cond3.addr
    %a1 = load i64* %a1.addr
    %a2 = load i64* %a2.addr
    %a3 = load i64* %a3.addr
    %a4 = load i64* %a4.addr

    ;; Compute F(Ai)
    %fa1 = call i64 @f (i64 %a1)
    %fa2 = call i64 @f (i64 %a2)
    %fa3 = call i64 @f (i64 %a3)
    %fa4 = call i64 @f (i64 %a4)

    ; Compute x1 = cond1 ? f(a1)
    ;            : cond2 ? f(a2)
    ;            : cond3 ? f(a1)
    ;            :         f(a4);

    %x1.cond3 = call i64 @ite (i64 %cond3, i64 %fa1, i64 %fa4)
    %x1.cond2 = call i64 @ite (i64 %cond2, i64 %fa2, i64 %x1.cond3)
    %x1     = call i64 @ite (i64 %cond1, i64 %fa1, i64 %x1.cond2)

    ; Compute x2 = cond3 ? f(a3)
    ;            : cond4 ? f(a1)
    ;            :         f(a4);

    %x2.cond4 = call i64 @ite (i64 %cond4, i64 %fa1, i64 %fa4)
    %x2       = call i64 @ite (i64 %cond3, i64 %fa3, i64 %x2.cond4)

    ; Return g(x1, x2);
    %ret = call i64 @g(i64 %x1, i64 %x2)
    ret i64 %ret
}

define i64 @try2 (i64* %in)
{
    ;; Arguments from an input array.
    %cond1.addr = getelementptr i64* %in, i32 0
    %cond2.addr = getelementptr i64* %in, i32 1
    %cond3.addr = getelementptr i64* %in, i32 2
    %cond4.addr = getelementptr i64* %in, i32 3
    %a1.addr = getelementptr i64* %in, i32 4
    %a2.addr = getelementptr i64* %in, i32 5
    %a3.addr = getelementptr i64* %in, i32 6
    %a4.addr = getelementptr i64* %in, i32 7

    ;; Load arguments into temporaries
    %cond1 = load i64* %cond1.addr
    %cond2 = load i64* %cond2.addr
    %cond3 = load i64* %cond3.addr
    %cond4 = load i64* %cond3.addr
    %a1 = load i64* %a1.addr
    %a2 = load i64* %a2.addr
    %a3 = load i64* %a3.addr
    %a4 = load i64* %a4.addr

    ; Compute x1 = cond1 ? f(a1)
    ;            : cond2 ? f(a2)
    ;            : cond3 ? f(a1)
    ;            :         f(a4);

    %cond1.nonzero = icmp ne i64 %cond1, 0
    br i1 %cond1.nonzero, label %x1.cond1.true, label %x1.cond1.false

  x1.cond1.false:
    %cond2.nonzero = icmp ne i64 %cond2, 0
    br i1 %cond2.nonzero, label %x1.cond2.true, label %x1.cond2.false

  x1.cond2.false:
    %cond3.nonzero = icmp ne i64 %cond3, 0
    br i1 %cond3.nonzero, label %x1.cond3.true, label %x1.otherwise

  x1.cond1.true:                           ; cond1 ? f(a1)
    %x1.cond1.ans = call i64 @f (i64 %a1)
    br label %x1.finish

  x1.cond2.true:                           ; cond2 ? f(a2)
    %x1.cond2.ans = call i64 @f (i64 %a2)
    br label %x1.finish

  x1.cond3.true:                           ; cond3 ? f(a1)
    %x1.cond3.ans = call i64 @f (i64 %a1)
    br label %x1.finish

  x1.otherwise:                            ; otherwise f(a4)
    %x1.otherwise.ans = call i64 @f (i64 %a4)
    br label %x1.finish

  x1.finish:
    %x1 = phi i64 [%x1.cond1.ans, %x1.cond1.true],
                  [%x1.cond2.ans, %x1.cond2.true],
                  [%x1.cond3.ans, %x1.cond3.true],
                  [%x1.otherwise.ans, %x1.otherwise]

    ; Compute x2 = cond3 ? f(a3)
    ;            : cond4 ? f(a1)
    ;            :         f(a4);

    %cond3.nonzero_for_x2 = icmp ne i64 %cond3, 0
    br i1 %cond3.nonzero_for_x2, label %x2.cond3.true, label %x2.cond3.false

  x2.cond3.false:
    %cond4.nonzero = icmp ne i64 %cond4, 0
    br i1 %cond4.nonzero, label %x2.cond4.true, label %x2.otherwise

  x2.cond3.true:                            ; cond3 ? f(a3)
    %x2.cond3.ans = call i64 @f (i64 %a3)
    br label %x2.finish

  x2.cond4.true:                            ; cond4 ? f(a1)
    %x2.cond4.ans = call i64 @f (i64 %a1)
    br label %x2.finish

  x2.otherwise:
    %x2.otherwise.ans = call i64 @f (i64 %a4)
    br label %x2.finish

  x2.finish:
    %x2 = phi i64 [%x2.cond3.ans, %x2.cond3.true],
                  [%x2.cond4.ans, %x2.cond4.true],
                  [%x2.otherwise.ans, %x2.otherwise]

    ; Return g(x1, x2);
    %ret = call i64 @g(i64 %x1, i64 %x2)
    ret i64 %ret

}

-- 
Jared C. Davis <jared at cs.utexas.edu>
11410 Windermere Meadows
Austin, TX 78759
http://www.cs.utexas.edu/users/jared/