[LLVMdev] Loop Unroll
Sahasrabuddhe, Sameer
sameer.sahasrabuddhe at amd.com
Sun Nov 30 21:32:06 PST 2014
On 11/29/2014 6:52 AM, rcieszew wrote:
> Hello,
> I would like to create a VHDL backend for LLVM, and I am now testing
> the loop unroll passes. I would like to unroll a loop into a parallel
> form (each basic block of the unrolled loop has the same parent node).
> At present I can only unroll a loop into a serial form (each basic
> block is the parent node of the next).
> Is it possible to unroll a loop into a parallel form (each basic block
> of the unrolled loop has the same parent node in the CFG)?
Hello Radoslaw,
As far as I can make out, there is a mismatch between the VHDL-level
picture that you have in mind and the way a traditional CPU compiler
works. In LLVM, unrolling replicates the loop body and chains the copies
one after another, so "unroll" simply means "serialize". A basic block
with multiple successors in LLVM ends in a conditional branch that
transfers control to exactly one of those successors. What you have in
mind is a way to transfer control to all successors in parallel. This
cannot be represented in LLVM IR.
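To make the serial shape concrete, here is a toy Python model of a fully unrolled loop with a known trip count. It is only a sketch and does not use the LLVM API; the block names and the helpers `unroll_serial` and `predecessors` are made up for illustration.

```python
# Toy CFG model (an assumption for illustration, not the LLVM API):
# the CFG is a dict that maps each block name to its list of successors.

def unroll_serial(trip_count):
    """Return the CFG of a fully unrolled loop as {block: [successors]}."""
    first = "body.0" if trip_count > 0 else "exit"
    cfg = {"entry": [first]}
    for i in range(trip_count):
        # Each copy of the body falls through to the next copy, or to the exit.
        nxt = f"body.{i + 1}" if i + 1 < trip_count else "exit"
        cfg[f"body.{i}"] = [nxt]
    cfg["exit"] = []
    return cfg


def predecessors(cfg, block):
    """Return the blocks that have `block` as a successor."""
    return [b for b, succs in cfg.items() if block in succs]


cfg = unroll_serial(4)
for i in range(4):
    print(f"body.{i}", "<-", predecessors(cfg, f"body.{i}"))
```

Every copy of the body has the previous copy as its only parent. In the "parallel form", all copies would have `entry` as their parent, but a block with several successors in LLVM IR still picks only one of them at run time.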
Your question implicitly assumes that the loop of interest has no
loop-carried dependencies, so that all iterations can be executed in
parallel. Given that, there are several ways to handle this:
1. Merge all the unrolled basic blocks into one block. Then maybe the
   instruction-level parallelism between them will automatically show
   up in your VHDL. This is the simplest way to do it.
2. Vectorize the loop body in LLVM, then generate VHDL entities that
   can handle vector inputs. This will be limited by the size of
   vectors that the LLVM vectorizer can generate. Also, your memory
   subsystem will need to handle vector loads and stores.
3. This last one is purely in the VHDL generator: somehow mark loops
   that can be parallelized, and generate a custom VHDL entity that
   captures the loop body. Then, instead of generating a loop control
   structure in your VHDL, generate a fork/join structure that
   transfers control to multiple instances of your entity, one for each
   iteration of the loop. This will be limited by the number of
   load/store requests that your memory subsystem can accept in parallel.
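As a rough illustration of option 1, the sketch below merges the unrolled copies of a loop body `c[i] = a[i] + b[i]` into one block and assigns each instruction the earliest step at which its operands are ready (ASAP scheduling). The instruction tuples, the names `asap_levels` and `merged_body`, and the example loop are all assumptions for illustration and are not LLVM IR. The sketch also assumes that the iterations are independent and ignores memory aliasing and memory-port limits.

```python
# Hypothetical instruction format (not LLVM IR): (result, opcode, operands).

def asap_levels(block):
    """Map each result to the earliest step at which its operands are ready."""
    level = {}
    for result, _opcode, operands in block:
        # Operands not defined in this block are treated as ready at step 0.
        level[result] = 1 + max((level.get(op, 0) for op in operands), default=0)
    return level


def merged_body(iterations):
    """Body of `c[i] = a[i] + b[i]`, unrolled and merged into one block."""
    block = []
    for i in range(iterations):
        block.append((f"a{i}", "load", [f"pa{i}"]))
        block.append((f"b{i}", "load", [f"pb{i}"]))
        block.append((f"s{i}", "add", [f"a{i}", f"b{i}"]))
        block.append((f"st{i}", "store", [f"s{i}", f"pc{i}"]))
    return block


levels = asap_levels(merged_body(4))
depth = max(levels.values())
print(len(levels), "instructions fit in", depth, "steps")
```

Under these assumptions the depth of the merged block does not grow with the number of iterations: all loads land in the first step, all adds in the second, and all stores in the third. A real generator would additionally have to respect the number of load/store requests the memory subsystem accepts per step.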
Sameer.