[llvm-dev] A query language for LLVM IR (XPath)

Alessandro Di Federico via llvm-dev llvm-dev at lists.llvm.org
Sun Oct 29 06:49:26 PDT 2017


Hi, when working with LLVM IR, getting to a desired point in the code
can be cumbersome, in particular if you're instrumenting existing code:
you end up with a lot of nested loops and if checks.

Maybe all of this could be avoided by employing a query language. Since
an LLVM module can be seen as a sort of tree with attributes, I think
that reusing an existing query language for XML would be appropriate.

In particular I chose XPath [1] since it's more expressive than, say,
CSS selectors (e.g., you can move from the current element to the
parent).

Therefore, in a spare night, I took pugixml [2], a lightweight XML parser
with XPath support, stripped away everything that was XML-specific and
adapted it so that it can query an arbitrary tree, as long as a class
providing certain traits is supplied.
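
To make the traits idea concrete, here is a minimal sketch of a query
engine that only sees the tree through a traits class. Everything in it
(TreeTraits, ToyNode, selectPath, makeExampleModule) is a hypothetical
name made up for illustration, not the interface of pugixml or of the
attached code, and it only handles child steps with "*" wildcards, not
full XPath.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical traits interface: the engine never touches Node directly.
template <typename Node> struct TreeTraits;

// Toy tree standing in for module -> function -> basic block -> instruction.
struct ToyNode {
  std::string Name;
  std::vector<ToyNode> Children;
};

template <> struct TreeTraits<ToyNode> {
  static const std::string &name(const ToyNode &N) { return N.Name; }
  static const std::vector<ToyNode> &children(const ToyNode &N) {
    return N.Children;
  }
};

// Evaluate a path made only of child steps, starting below Root.
// A step of "*" matches any name. The returned pointers refer into Root.
template <typename Node>
std::vector<const Node *> selectPath(const Node &Root,
                                     const std::vector<std::string> &Steps) {
  using Traits = TreeTraits<Node>;
  std::vector<const Node *> Current{&Root};
  for (const std::string &Step : Steps) {
    assert(!Step.empty() && "empty path step");
    std::vector<const Node *> Next;
    for (const Node *N : Current)
      for (const Node &Child : Traits::children(*N))
        if (Step == "*" || Traits::name(Child) == Step)
          Next.push_back(&Child);
    Current = std::move(Next);
  }
  return Current;
}

// Small made-up module: one function with two basic blocks.
ToyNode makeExampleModule() {
  ToyNode BB1{"basicblock1",
              {{"alloca", {}}, {"alloca", {}}, {"store", {}}, {"br", {}}}};
  ToyNode BB2{"basicblock2", {{"load", {}}, {"store", {}}, {"ret", {}}}};
  ToyNode Main{"main", {BB1, BB2}};
  return ToyNode{"main.ll", {Main}};
}
```

With this toy module, selectPath(M, {"*", "*", "store"}) plays the role
of the /*/*/store query. An adapter for real LLVM IR could presumably
provide the same two operations for modules, functions and basic blocks
instead of ToyNode.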

Attached you can find the class to query an LLVM module and an example
LLVM module (using LLVM 3.8, but newer versions should do too).

The current implementation pretends that a module looks like the
following XML tree (more or less):

    <main.ll>
      <main>
        <basicblock1>
          <alloca />
          <alloca />
          ...
        </basicblock1>
        ...
      </main>
    </main.ll>

Additional information could be encoded in attributes.
Please note that the queries are done on the LLVM IR directly, no XML
tree is materialized.

In the following you can find some examples:

    $ # Find all the basic blocks containing at least one alloca
    $ llvm-xpath '/main/*[count(alloca) > 0]' main.ll

      %1 = alloca i32, align 4
      %2 = alloca i32, align 4
      %i = alloca i32, align 4
      store i32 0, i32* %1, align 4
      store i32 %argc, i32* %2, align 4
      %3 = load i32, i32* %2, align 4
      store i32 %3, i32* %i, align 4
      br label %4

    $ # Find all store instructions
    $ llvm-xpath '/*/*/store' main.ll
      store i32 0, i32* %1, align 4
      store i32 %argc, i32* %2, align 4
      store i32 %3, i32* %i, align 4
      store i32 %6, i32* %i, align 4
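
For comparison, the first query above corresponds roughly to the
hand-written traversal below, which is the kind of nested loops and
checks the query language is meant to replace. This is a sketch over a
made-up toy model, not code against the real LLVM classes: Block, Func,
ToyModule, blocksContaining and makeToyModule are hypothetical names,
and the block names and opcodes in the toy module are invented.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Toy stand-ins for basic blocks and functions; a block is just a name
// plus the opcodes of its instructions.
struct Block {
  std::string Name;
  std::vector<std::string> Opcodes;
};
struct Func {
  std::string Name;
  std::vector<Block> Blocks;
};
using ToyModule = std::vector<Func>;

// Hand-written counterpart of /<FuncName>/*[count(<Opcode>) > 0]:
// every block of the named function with at least one such instruction.
std::vector<const Block *> blocksContaining(const ToyModule &M,
                                            const std::string &FuncName,
                                            const std::string &Opcode) {
  assert(!Opcode.empty() && "opcode to look for must be given");
  std::vector<const Block *> Result;
  for (const Func &F : M) {
    if (F.Name != FuncName)
      continue;
    for (const Block &B : F.Blocks) {
      auto Count = std::count(B.Opcodes.begin(), B.Opcodes.end(), Opcode);
      if (Count > 0)
        Result.push_back(&B);
    }
  }
  return Result;
}

// Invented example data: one function, two blocks.
ToyModule makeToyModule() {
  Block Entry{"entry", {"alloca", "alloca", "alloca", "store", "store",
                        "load", "store", "br"}};
  Block Loop{"loop", {"load", "add", "store", "br"}};
  Func Main{"main", {Entry, Loop}};
  return {Main};
}
```

Every new question (a different opcode, a different nesting level, a
different predicate) needs another function of this shape, whereas with
a query language only the query string changes.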

Obviously this doesn't have to be exclusively a command-line tool; we
could have something like:

    for (auto *Store : TheModule.xpath<StoreInst>("/*/*/store"))
      /* ... */
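
One way such a typed flavour could work is to run the untyped query
first and then keep only the matches of the requested class. A minimal
sketch with stand-in classes and dynamic_cast follows. Inst, StoreLike,
LoadLike and onlyOfType are hypothetical names; LLVM has its own cast
machinery (dyn_cast), which a real implementation would presumably use
instead.

```cpp
#include <cassert>
#include <vector>

// Stand-in class hierarchy; not the LLVM classes.
struct Inst {
  virtual ~Inst() = default;
};
struct StoreLike : Inst {};
struct LoadLike : Inst {};

// Keep only the matches whose dynamic type is T (or derived from T),
// preserving their order.
template <typename T>
std::vector<T *> onlyOfType(const std::vector<Inst *> &Matches) {
  std::vector<T *> Out;
  for (Inst *I : Matches) {
    assert(I != nullptr && "query results are expected to be non-null");
    if (auto *Typed = dynamic_cast<T *>(I))
      Out.push_back(Typed);
  }
  return Out;
}
```

The loop body then only ever sees objects of the requested type, so no
cast is needed at the call site.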

I'm not releasing the full code yet since it's very much a work in
progress, but if anyone is interested in such a thing, just ping me.
The applications could range from using it in existing code to just
providing it for fast prototyping, e.g., in llvmcpy [3].

Obviously there are some open questions, such as how to deal with
operands, which could lead to an infinite tree, or how to organize
attributes. But it should be doable.
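
On the operand question: following operand edges turns the tree into a
graph that can contain cycles (a phi and the increment feeding it, for
instance), so a naive expansion never terminates. One possible way
around it, sketched here on a toy graph of value indices, is to remember
which values have already been expanded. OperandGraph and operandClosure
are hypothetical names, and this is only one option, not what the
current implementation does.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Toy value graph: G[i] lists the indices of the values used by value i.
using OperandGraph = std::vector<std::vector<std::size_t>>;

// Collect every value reachable from Start through operand edges,
// expanding each value at most once so that cycles terminate.
std::vector<std::size_t> operandClosure(const OperandGraph &G,
                                        std::size_t Start) {
  assert(Start < G.size() && "start value out of range");
  std::set<std::size_t> Seen;
  std::vector<std::size_t> Order;
  std::vector<std::size_t> Worklist{Start};
  while (!Worklist.empty()) {
    std::size_t V = Worklist.back();
    Worklist.pop_back();
    if (!Seen.insert(V).second)
      continue; // already expanded; this is what breaks the cycle
    Order.push_back(V);
    for (std::size_t Op : G[V])
      Worklist.push_back(Op);
  }
  return Order;
}
```

A depth limit on operand steps would be another option; which of the
two fits XPath semantics better is part of the open question.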

---
Alessandro Di Federico
PhD student at Politecnico di Milano

[1] https://en.wikipedia.org/wiki/XPath
[2] https://pugixml.org/
[3] https://github.com/revng/llvmcpy
-------------- next part --------------
A non-text attachment was scrubbed...
Name: main.ll
Type: application/octet-stream
Size: 1274 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20171029/c913b314/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: llvm-node.cpp
Type: text/x-c++src
Size: 5968 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20171029/c913b314/attachment.cpp>

