[llvm-dev] Generating a loop from llvm bitcode

Sun Aug 12 03:20:10 PDT 2018

On 7 August 2018 at 17:51, Ejjeh, Adel via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> Hello
>
> I am developing a backend that generates OpenCL from llvm bitcode. I start
> with a revived fork of the original LLVM C-Backend and have been modifying
> it. One thing that the backend lacked was generating proper loop structures;
> instead it relied on labels and goto statements. Therefore, I am trying to
> find a way to identify the loop structure and print it out in the backend.
> So far, the main issue I’ve been having trouble with is identifying the loop
> induction variable in a straight forward manner;
> getCanonicalInductionVariable doesn’t always succeed, and I’ve tried using
> InductionDescriptor, but it fails in cases where the CFG doesn’t include a
> loop preheader. Also, in both cases, I couldn’t find a direct way of
> determining the loop exit condition. I am wondering if there are any passes
> or data structures that would be able to help me accomplish what I am trying
> to do that I might be missing? Or does the community have any other ideas as
> to how I should approach this issue?
>
> Any input is much appreciated.
>

Hi,

What you are trying to do is essentially very similar to a decompiler.
e.g. https://github.com/jcdutton/libbeauty
A compiler does the following steps:
OpenCL -> AST -> LLVM IR -> Machine Instructions -> Binary
A Decompiler is:
Binary -> Machine Instruction -> LLVM IR -> AST -> OpenCL

The step you are trying is going from LLVM IR -> AST -> OpenCL.
The LLVM IR can be thought of as the CFG. (Control Flow Graph)
The AST is the Abstract Syntax Tree.
The step from LLVM IR -> AST is generally referred to as “control-flow
graph structuring”.
This is when you take a CFG (if ... then ... goto)  to a more
structured representation such as for...loop,  do...while.

I am currently writing my own decompiler. https://github.com/jcdutton/libbeauty
I tried the AST step some time ago. It is difficult to do. So, for
now, I am concentrating on Binary -> LLVM IR. I will do the AST step
later.
But, during my AST work, I discovered patterns of CFG that are just
not describable in any higher level structure even though the
CLANG/LLVM compiler probably produced that CFG after some
optimizations.
So, when I revisit the AST step, I am going to look at which LLVM
optimizations produced those strange CFG patterns, and write LLVM IR
passes that "undo" them and thus create a CFG that is more easily
converted in the AST structure.
One trick to the AST step is first identifying the "Entry Point" to
your function.
Then identifying the "Exit Point" of your function.
If there are multiple "Exit Points", Modify the LLVM IR to merge them
all. E.g. Adding "goto exit:"
This merging, makes it a lot easier for the graph algorithms to do
their task better.
Then use simple graph algorithms, control flow path analysis, to
arrange all the basic blocks in a sensible order for the graph.
E.g. Entry point at the top, and Exit point at the bottom of the page.
In my "libbeauty" decompiler, I use xdot and graphviz to output pretty
graph pictures of the CFG.
It is very easy to spot loops using this method, and from that
selecting for...loop or do...while structures.
My CFG graph has nodes, and different colored links between them.
Green=Branch-due-to-True-Forwards
Red=Branch-due-to-False-Forwards
Yellow= Branch-Backwards.

If you have not done already, read up on graph theory.
You might also need to write some of your own LLVM IR passes, to
generally translate the LLVM IR into a form more suited for
structuring and raising to OpenCL.

Kind Regards

James