[LLVMdev] BNF for IL/IR interpreter

Thu Apr 9 12:10:09 PDT 2015

My apologies for the noise: I have since found Yacc and Bison, so I have my reading cut out for me.
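
For readers new to Yacc and Bison: both pair each BNF production with an action that runs when that production is recognized. A minimal Python sketch of the same idea follows, written as a hand-rolled recursive-descent parser rather than generated by either tool. The grammar, the ":=" syntax and every function name here are hypothetical; this is not BAP IL or LLVM IR syntax.

```python
import re

# Hypothetical mini-grammar (illustration only):
#   stmt ::= IDENT ":=" expr
#   expr ::= atom (("+" | "-") atom)*
#   atom ::= IDENT | NUMBER
TOKEN = re.compile(r"\s*(:=|[+-]|0x[0-9a-fA-F]+|\d+|[A-Za-z_]\w*)")


def tokenize(text):
    """Split the input into the terminals of the grammar above."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected input at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_atom(tokens, env):
    if not tokens:
        raise SyntaxError("expected an identifier or a number")
    tok = tokens.pop(0)
    if tok[0].isdigit():
        # Action for NUMBER: produce the literal value.
        return int(tok, 16) if tok.startswith("0x") else int(tok)
    if tok in (":=", "+", "-"):
        raise SyntaxError(f"unexpected {tok!r}")
    # Action for IDENT: read the current value from the environment.
    return env[tok]


def parse_expr(tokens, env):
    value = parse_atom(tokens, env)
    while tokens and tokens[0] in ("+", "-"):
        op = tokens.pop(0)
        rhs = parse_atom(tokens, env)
        # Action for a binary operator: fold it into the running value.
        value = value + rhs if op == "+" else value - rhs
    return value


def run_stmt(text, env):
    tokens = tokenize(text)
    if len(tokens) < 2 or tokens[1] != ":=":
        raise SyntaxError("expected IDENT ':=' expr")
    target = tokens[0]
    value = parse_expr(tokens[2:], env)
    # Action for stmt: bind the target name to the computed value.
    env[target] = value
    return env


env = {"RSP": 0x1000}
run_stmt("RSP := RSP - 8", env)
run_stmt("t := RSP + 0x10 - 1", env)
```

A parser generator would produce the parsing functions from the grammar and leave only the actions to write by hand; the split between "what the production looks like" and "what to do when it matches" is the same.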

On Thu, Apr 9, 2015 at 2:37 PM, Kenneth Adam Miller <
kennethadammiller at gmail.com> wrote:

> This might be a very beginner question, but I'm looking for an example of
> something that I have never done.
>
> Suppose I wanted to attach actions to the semantics of CPU instructions
> that have been lifted to an intermediate representation such as BAP IL or
> LLVM IR. How might I go about writing a Backus-Naur Form specification and
> then dynamically interpreting those lifted instructions by specifying an
> action to perform for each IL/IR primitive? I'm looking for any library
> that lets me express BNF terms and actions on them.
>
> For example, say I lift push rbp to BAP IL (here's a JSON representation
> from their live development branch):
>
> {
>   "move": {
>     "lvar": { "name": "t", "id": 107, "typ": { "imm": 64 } },
>     "rexp": { "var": { "name": "RBP", "id": 30, "typ": { "imm": 64 } } }
>   }
> },
> {
>   "move": {
>     "lvar": { "name": "RSP", "id": 32, "typ": { "imm": 64 } },
>     "rexp": {
>       "binop": {
>         "op": "minus",
>         "lexp": {
>           "var": { "name": "RSP", "id": 32, "typ": { "imm": 64 } }
>         },
>         "rexp": { "inte": { "int": "MHg4OjY0" } }
>       }
>     }
>   }
> },
> {
>   "move": {
>     "lvar": {
>       "name": "mem64",
>       "id": 58,
>       "typ": {
>         "mem": {
>           "index_type": { "r64": true },
>           "element_type": { "r8": true }
>         }
>       }
>     },
>     "rexp": {
>       "store": {
>         "memory": {
>           "var": {
>             "name": "mem64",
>             "id": 58,
>             "typ": {
>               "mem": {
>                 "index_type": { "r64": true },
>                 "element_type": { "r8": true }
>               }
>             }
>           }
>         },
>         "address": {
>           "var": { "name": "RSP", "id": 32, "typ": { "imm": 64 } }
>         },
>         "value": {
>           "var": { "name": "t", "id": 107, "typ": { "imm": 64 } }
>         },
>         "endian": "little_endian",
>         "size": { "r64": true }
>       }
>     }
>   }
> }
>
> Then, for move, say, I could specify in my interpreter some reasonable
> action that captures those semantics, such as allocating a 64-bit slot in
> which to store the value, plus an SSA name for the value of RSP at that
> point. In the same way, I could specify other things, such as symbolic
> interpretation of specific memory regions, in order to solve for
> constraints and limitations on code blocks. Then, after some segments of
> code are lifted and interpreted, I would provide some meaningful context
> in terms of state, registers and memory, and the resulting representation
> could be executed in order to reach interesting path and state
> combinations.
>
> But I've never written a language before... I'm afraid I'm new. But I'm
> very interested, and I want to learn so I'm looking to use infrastructure
> that's already there, and learn how to construct this properly.
>
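
As a rough illustration of the per-primitive actions described in the quoted message, the sketch below walks the three move statements with a small concrete interpreter in Python. Assumptions, none of which come from BAP's own API: the statements are re-typed as Python dicts with the "id" and "typ" fields dropped; "MHg4OjY0" is base64 for the text "0x8:64", which is read here as value:width; a size key such as "r64" is read as a bit count; memory is modelled as a dict from address to byte; all helper names are made up.

```python
import base64

MASK64 = (1 << 64) - 1


def var(name):
    return {"var": {"name": name}}


# The three statements from the message, with "id" and "typ" omitted.
PUSH = [
    {"move": {"lvar": {"name": "t"}, "rexp": var("RBP")}},
    {"move": {"lvar": {"name": "RSP"}, "rexp": {"binop": {
        "op": "minus", "lexp": var("RSP"),
        "rexp": {"inte": {"int": "MHg4OjY0"}}}}}},
    {"move": {"lvar": {"name": "mem64"}, "rexp": {"store": {
        "memory": var("mem64"), "address": var("RSP"), "value": var("t"),
        "endian": "little_endian", "size": {"r64": True}}}}},
]


def decode_int(b64):
    # Assumed encoding: base64 of "<hex value>:<bit width>".
    value, width = base64.b64decode(b64).decode("ascii").split(":")
    return int(value, 16), int(width)


def eval_exp(exp, state):
    (kind, body), = exp.items()  # each node is a one-key dict
    if kind == "var":
        return state[body["name"]]
    if kind == "inte":
        value, width = decode_int(body["int"])
        return value & ((1 << width) - 1)
    if kind == "binop":
        lhs = eval_exp(body["lexp"], state)
        rhs = eval_exp(body["rexp"], state)
        if body["op"] == "minus":
            return (lhs - rhs) & MASK64  # wrap like a 64-bit register
        raise NotImplementedError(body["op"])
    if kind == "store":
        mem = dict(eval_exp(body["memory"], state))  # copy, then update
        addr = eval_exp(body["address"], state)
        value = eval_exp(body["value"], state)
        nbytes = int(next(iter(body["size"]))[1:]) // 8  # "r64" -> 8
        order = "little" if body["endian"] == "little_endian" else "big"
        for i, byte in enumerate(value.to_bytes(nbytes, order)):
            mem[(addr + i) & MASK64] = byte
        return mem
    raise NotImplementedError(kind)


def exec_stmt(stmt, state):
    (kind, body), = stmt.items()
    if kind != "move":
        raise NotImplementedError(kind)
    # Action for move: evaluate the right side, bind it to the left side.
    state[body["lvar"]["name"]] = eval_exp(body["rexp"], state)


state = {"RBP": 0x1122334455667788, "RSP": 0x1000, "mem64": {}}
for stmt in PUSH:
    exec_stmt(stmt, state)
```

Swapping the integer arithmetic in eval_exp for calls that build solver terms would be one way to move from concrete toward symbolic interpretation, since the dispatch on node kind stays the same.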