[LLVMdev] BNF for IL/IR interpreter

Thu Apr 9 11:37:04 PDT 2015

This may be a very basic question, but I'm looking for an example of
something that I have never done before.

Suppose I lift CPU instructions to an intermediate representation, such as
BAP IL or LLVM IR, and want to attach actions to the lifted semantics. How
might I write a Backus-Naur Form (BNF) specification for that
representation and then dynamically interpret the lifted instructions,
specifying an action to perform for each IL/IR primitive? I'm looking for
any library that lets me express BNF terms and attach actions to them.
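
To make "BNF terms with actions" concrete, here is a minimal hand-written
sketch in Python. It is not based on any particular parsing library, and
the grammar is a hypothetical toy assignment language, not BAP IL or LLVM
IR syntax. The names TOKEN, make_actions and run_stmt are invented for
illustration. Each production calls an action from a table, so the same
parser could drive a concrete or a symbolic interpreter.

```python
import re

# Hypothetical toy grammar (an assumption, not real BAP IL / LLVM IR):
#   <stmt> ::= <ident> ":=" <expr>
#   <expr> ::= <term> | <expr> "+" <term> | <expr> "-" <term>
#   <term> ::= <ident> | <number>

TOKEN = re.compile(r"\s*(:=|[+-]|[A-Za-z_]\w*|0x[0-9a-fA-F]+|\d+)")


def tokenize(text):
    """Split the input into the terminal symbols of the grammar."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected input at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def make_actions(env):
    """One action per production; this table evaluates concretely."""
    return {
        "number": lambda tok: int(tok, 16) if tok.startswith("0x") else int(tok),
        "ident": lambda tok: env[tok],
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "assign": lambda name, value: env.__setitem__(name, value),
    }


def parse_term(tokens, pos, act):
    if pos >= len(tokens):
        raise SyntaxError("expected <term>, found end of input")
    tok = tokens[pos]
    if tok[0].isdigit():
        return act["number"](tok), pos + 1
    if tok[0].isalpha() or tok[0] == "_":
        return act["ident"](tok), pos + 1
    raise SyntaxError(f"expected <term>, found {tok!r}")


def parse_expr(tokens, pos, act):
    # The left-recursive <expr> rule is handled with a loop.
    value, pos = parse_term(tokens, pos, act)
    while pos < len(tokens) and tokens[pos] in ("+", "-"):
        op = tokens[pos]
        rhs, pos = parse_term(tokens, pos + 1, act)
        value = act[op](value, rhs)
    return value, pos


def run_stmt(text, env):
    """Parse one <stmt> and apply its actions to env."""
    act = make_actions(env)
    tokens = tokenize(text)
    if len(tokens) < 3 or tokens[1] != ":=" or tokens[0][0].isdigit():
        raise SyntaxError("expected <ident> ':=' <expr>")
    value, pos = parse_expr(tokens, 2, act)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    act["assign"](tokens[0], value)
    return env


if __name__ == "__main__":
    env = {"RSP": 0x1000, "RBP": 0x2000}
    run_stmt("t := RBP", env)
    run_stmt("RSP := RSP - 0x8", env)
    print(env)
```

Swapping the action table for one that builds expression trees instead of
integers would turn the same parser into a symbolic front end.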

For example, say I convert push rbp to BAP IL (here's a JSON representation
from their live development branch):

{
  "move": {
    "lvar": { "name": "t", "id": 107, "typ": { "imm": 64 } },
    "rexp": { "var": { "name": "RBP", "id": 30, "typ": { "imm": 64 } } }
  }
},
{
  "move": {
    "lvar": { "name": "RSP", "id": 32, "typ": { "imm": 64 } },
    "rexp": {
      "binop": {
        "op": "minus",
        "lexp": {
          "var": { "name": "RSP", "id": 32, "typ": { "imm": 64 } }
        },
        "rexp": { "inte": { "int": "MHg4OjY0" } }
      }
    }
  }
},
{
  "move": {
    "lvar": {
      "name": "mem64",
      "id": 58,
      "typ": {
        "mem": {
          "index_type": { "r64": true },
          "element_type": { "r8": true }
        }
      }
    },
    "rexp": {
      "store": {
        "memory": {
          "var": {
            "name": "mem64",
            "id": 58,
            "typ": {
              "mem": {
                "index_type": { "r64": true },
                "element_type": { "r8": true }
              }
            }
          }
        },
        "address": {
          "var": { "name": "RSP", "id": 32, "typ": { "imm": 64 } }
        },
        "value": {
          "var": { "name": "t", "id": 107, "typ": { "imm": 64 } }
        },
        "endian": "little_endian",
        "size": { "r64": true }
      }
    }
  }
}
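
The three statements above could be interpreted concretely along these
lines. This is only a sketch under assumptions: the "id" and "typ" fields
are dropped for brevity, memory is modelled as a Python dict from address
to byte, and the "int" payload is taken to be base64 text ("MHg4OjY0"
decodes to "0x8:64", which is read here as value 8 at width 64). The
function names are invented; none of this is BAP's own API.

```python
import base64

MASK64 = (1 << 64) - 1


def decode_int(b64):
    """Decode an "inte" payload, assumed to be base64 of "<hex>:<width>"."""
    text = base64.b64decode(b64).decode("ascii")
    value, width = text.split(":")
    return int(value, 16), int(width)


def eval_exp(exp, env):
    (kind, body), = exp.items()  # every node is a one-key object
    if kind == "var":
        return env[body["name"]]
    if kind == "inte":
        value, width = decode_int(body["int"])
        return value & ((1 << width) - 1)
    if kind == "binop":
        lhs = eval_exp(body["lexp"], env)
        rhs = eval_exp(body["rexp"], env)
        if body["op"] == "minus":
            return (lhs - rhs) & MASK64  # assumes 64-bit operands
        raise NotImplementedError(body["op"])
    if kind == "store":
        # Assumption: only 64-bit little-endian stores are handled.
        if body["endian"] != "little_endian" or "r64" not in body["size"]:
            raise NotImplementedError("only 64-bit little-endian stores")
        mem = dict(eval_exp(body["memory"], env))  # store yields a new memory
        addr = eval_exp(body["address"], env)
        value = eval_exp(body["value"], env)
        for i, byte in enumerate(value.to_bytes(8, "little")):
            mem[(addr + i) & MASK64] = byte
        return mem
    raise NotImplementedError(kind)


def exec_stmt(stmt, env):
    (kind, body), = stmt.items()
    if kind != "move":
        raise NotImplementedError(kind)
    env[body["lvar"]["name"]] = eval_exp(body["rexp"], env)


def var(name):
    return {"var": {"name": name}}  # "id" and "typ" omitted for brevity


# The three lifted statements from the JSON above, abbreviated.
PUSH = [
    {"move": {"lvar": {"name": "t"}, "rexp": var("RBP")}},
    {"move": {"lvar": {"name": "RSP"},
              "rexp": {"binop": {"op": "minus", "lexp": var("RSP"),
                                 "rexp": {"inte": {"int": "MHg4OjY0"}}}}}},
    {"move": {"lvar": {"name": "mem64"},
              "rexp": {"store": {"memory": var("mem64"),
                                 "address": var("RSP"),
                                 "value": var("t"),
                                 "endian": "little_endian",
                                 "size": {"r64": True}}}}},
]


def run(env):
    for stmt in PUSH:
        exec_stmt(stmt, env)
    return env


if __name__ == "__main__":
    state = run({"RBP": 0x1122334455667788, "RSP": 0x7000, "mem64": {}})
    print(hex(state["RSP"]), sorted(state["mem64"].items()))
```

Dispatching on the single key of each JSON object keeps one action per IL
primitive, which is the shape a grammar-with-actions approach would also
produce.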

Then, for move, say, my interpreter could specify some reasonable action
that captures those semantics: allocate a 64-bit slot in which to store the
value, and record an SSA name for the value of RSP at that point. In the
same way, I could specify other behaviour, such as symbolic interpretation
of specific memory regions, so that a solver can find the constraints and
limits that apply to a code block. Then, once some segments of code have
been lifted and interpreted, I would supply a meaningful context (state,
registers and memory), and the resulting representation could be executed
in order to reach interesting path and state combinations.
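
A rough sketch of the "interpret first, supply the context later" idea:
each move gets a fresh SSA name and a symbolic definition, and the recorded
definitions are evaluated only once initial register values are provided.
The tuple encoding, the SsaBuilder class and the NAME_version naming scheme
are all assumptions made for illustration; memory stores and constraint
solving are left out.

```python
from collections import defaultdict

MASK64 = (1 << 64) - 1


class SsaBuilder:
    """Records one symbolic definition per move, with SSA-renamed operands."""

    def __init__(self):
        self.version = defaultdict(int)  # current SSA version per variable
        self.defs = []                   # (ssa_name, expression), in order

    def use(self, name):
        return f"{name}_{self.version[name]}"

    def rename(self, exp):
        kind = exp[0]
        if kind == "var":
            return ("var", self.use(exp[1]))
        if kind == "int":
            return exp
        if kind == "sub":
            return ("sub", self.rename(exp[1]), self.rename(exp[2]))
        raise NotImplementedError(kind)

    def move(self, target, exp):
        rhs = self.rename(exp)           # read operands before bumping target
        self.version[target] += 1
        self.defs.append((self.use(target), rhs))


def evaluate(exp, values):
    kind = exp[0]
    if kind == "var":
        return values[exp[1]]
    if kind == "int":
        return exp[1]
    if kind == "sub":
        return (evaluate(exp[1], values) - evaluate(exp[2], values)) & MASK64
    raise NotImplementedError(kind)


def execute(defs, initial):
    """Run the recorded definitions against a concrete initial context."""
    values = dict(initial)
    for name, exp in defs:
        values[name] = evaluate(exp, values)
    return values


if __name__ == "__main__":
    b = SsaBuilder()
    b.move("t", ("var", "RBP"))
    b.move("RSP", ("sub", ("var", "RSP"), ("int", 8)))
    for name, exp in b.defs:
        print(name, "=", exp)
    print(execute(b.defs, {"RBP_0": 5, "RSP_0": 0x1000}))
```

Because the definitions stay symbolic until execute is called, the same
list could instead be handed to a constraint solver.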

I have never written a language before, so I'm afraid I'm new to this. But
I'm very interested and want to learn, so I'm looking to use infrastructure
that's already there and to learn how to construct this properly.