[lldb-dev] LLDB expression parser presentation
Greg Clayton
gclayton at apple.com
Wed Sep 4 10:02:14 PDT 2013
I believe it depends on what the result is and how complex the expression is. After parsing each expression, we walk the IR to determine whether we can evaluate it without JIT'ing the code and running it on the target. Sometimes we also run other expressions first (such as one that gathers the Objective-C runtime info), so the fact that we ran something doesn't guarantee it was the expression you typed in. To test for this, run the expression again and see whether we still run code.
On Sep 4, 2013, at 9:44 AM, Abid, Hafiz <Hafiz_Abid at mentor.com> wrote:
> Hi Sean,
> Thanks for this nice write-up. I observed that when I evaluate an expression like ‘expr variable’, LLDB runs the target, as if doing an inferior function call, and then runs the interpreter too. I was wondering whether this is expected behaviour or whether I am missing something.
>
> Thanks,
> Abid
>
> From: lldb-dev-bounces at cs.uiuc.edu [mailto:lldb-dev-bounces at cs.uiuc.edu] On Behalf Of Sean Callanan
> Sent: 17 August 2013 02:13
> To: lldb-dev at cs.uiuc.edu
> Subject: [lldb-dev] LLDB expression parser presentation
>
> This is the outline of a brief presentation I gave on the LLDB expression parser.
> I’ve included some “thorny issues;” if we resolve these, the expression parser will get a lot better.
> Please let me know if you have any questions.
>
> Class layout
> The master - ClangExpressionParser manages Clang and LLVM to compile a single expression
> Its minions:
> ClangExpression - a unit of parseable code
> ClangUserExpression - specialized for the case where we’re using the “expr” command
> ExpressionSourceCode - handles wrapping
> ClangASTSource - resolves external variables
> ClangExpressionDeclMap - specialized for the current frame (if stopped at a particular location in the program being debugged)
> IRForTarget - rewrites IR
> ASTResultSynthesizer - makes the result
> IRMemoryMap - manages memory that may or may not be in the program being debugged, or may be simulated by LLDB
> IRExecutionUnit - specialized to be able to interact with the JIT
>
> Basic Expression Flow
> User enters the expression: (lldb) expr a + 2
> We wrap the expression: void expr(arg *) { a + 2; }
>
> We wrap differently based on expression context.
> If stopped in a C++ instance method, we wrap as $__lldb_class::$__lldb_expr(void *)
> If stopped in an Objective-C instance method, we wrap as an Objective-C category
> If stopped in regular C code, we wrap as $__lldb_expr(void*)
> But we always parse in Objective-C++ mode.
>
> Typical wrapped expression:
> #define … // custom definitions provided by LLDB or the user
> void
> $__lldb_class::$__lldb_expr // $__lldb_class resolves to the type of *this in the current frame
> (void *$__lldb_arg)
> {
> // expression text goes here
> }
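A minimal sketch of the wrapping step described above, written as a plain string builder. The names FrameContext and WrapExpression are hypothetical and are not LLDB's API; the Objective-C category case and the custom #define prefix are left out, and the real ExpressionSourceCode class does considerably more.

```cpp
#include <cassert>
#include <string>

// Hypothetical context kinds for illustration only (not LLDB types).
enum class FrameContext { CFunction, CppInstanceMethod };

// Wrap raw expression text in a function whose name depends on where
// the debugger is stopped, following the shape shown in the outline.
std::string WrapExpression(const std::string &text, FrameContext ctx) {
    const std::string name = (ctx == FrameContext::CppInstanceMethod)
                                 ? "$__lldb_class::$__lldb_expr"
                                 : "$__lldb_expr";
    return "void\n" + name + "(void *$__lldb_arg)\n{\n    " + text + ";\n}\n";
}
```

Calling WrapExpression("a + 2", FrameContext::CFunction) yields a function named $__lldb_expr whose body is the statement "a + 2;". Only the function name changes between the two contexts in this sketch.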
>
> We resolve externals: “a” => int &a;
>
> This happens via a question-and-answer process with the Clang compiler through the clang::ExternalASTSource interface
> FindExternalVisibleDeclsByName searches for “globals” (globals from the perspective of the expression; these may be locals in the current stack frame)
> FindExternalLexicalDecls searches a single struct for all entities of a particular type
> CompleteType ensures that a single struct has all of its contents
> (These are useful because we lazily complete structs, providing a forward declaration first and only filling it in when needed)
>
> clang::ASTImporter is responsible for transferring Decls from one ASTContext (e.g., the ASTContext for a DWARF file) to another (e.g., the AST context for an expression)
> Our ClangASTImporter manages many of these (“Minions”), because there are many separate DWARF files containing debug information.
> We need to be able to remember where things came from.
>
> We add the result: static int ret = a + 2;
>
> This happens at the Clang AST level
> We handle Lvalues and Rvalues differently.
> For Lvalues, we store a pointer to them: T *$__result_ptr = …
> For Rvalues, we store the value itself: static T $__result = … // static ensures the expression doesn’t try to use a register or something silly like that
> We also store persistent types at this stage, e.g. struct $my_foo { int a; int b; }
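The lvalue/rvalue distinction above can be illustrated in ordinary C++. This is only a sketch under assumptions: IS_LVALUE, result_ptr and result are hypothetical names (the outline's $__result_ptr and $__result are avoided because "$" in identifiers is a compiler extension), and the real ASTResultSynthesizer works on the Clang AST, not through a macro.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical helper: decltype((e)) is T& exactly when e is an lvalue.
#define IS_LVALUE(e) std::is_lvalue_reference<decltype((e))>::value

int a = 4;

static_assert(IS_LVALUE(a), "a names an object, so it is an lvalue");
static_assert(!IS_LVALUE(a + 2), "a + 2 is a temporary, so it is an rvalue");

// Lvalue result: keep a pointer so later uses still refer to the variable.
int *result_ptr = &a;

// Rvalue result: keep the value itself in static storage.
static int result = a + 2;
```

Storing a pointer for the lvalue case means a later assignment through the result would modify the original variable, while the rvalue case only needs a copy of the computed value.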
>
> We rewrite the IR: *(arg+0) = *(arg+8)+2
>
> The IR as emitted by Clang’s CodeGen expects all external variables to be in symbols
> This is inconvenient if they are e.g. in registers, since you can’t link against a register
> This is also inconvenient for expression re-use, for example as a breakpoint condition… we’d have to re-link each time
> Our solution is to indirect variables through a struct passed into the expression (void *$__lldb_arg)
>
> Materializer’s job is to put all variables that aren’t referred to by symbols into this struct
> It will create temporary storage as necessary (e.g., to hold a variable value that was in a register)
> After the expression runs, a Dematerializer takes down all temporary storage, and ensures that variables are updated to reflect the expression’s side effects
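A sketch of the struct indirection described above, using the offsets from the "*(arg+0) = *(arg+8)+2" example. The layout (result at offset 0, "a" at offset 8, 32-bit values) and all function names are assumptions made for illustration; LLDB's actual Materializer is more general.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumed layout of the argument struct (illustrative only).
constexpr std::size_t kResultOffset = 0;
constexpr std::size_t kVarAOffset = 8;
constexpr std::size_t kArgSize = 16;

// Stand-in for the rewritten expression: every variable access goes
// through the struct pointer instead of through a linked symbol.
void RewrittenExpr(void *arg) {
    auto *base = static_cast<unsigned char *>(arg);
    std::int32_t a;
    std::memcpy(&a, base + kVarAOffset, sizeof a);
    const std::int32_t result = a + 2;
    std::memcpy(base + kResultOffset, &result, sizeof result);
}

// Materialize "a" (imagined to live in a register), run the expression,
// then dematerialize: write "a" back and read out the result.
std::int32_t Evaluate(std::int32_t &a_in_register) {
    unsigned char arg[kArgSize] = {};
    std::memcpy(arg + kVarAOffset, &a_in_register, sizeof a_in_register);
    RewrittenExpr(arg);
    std::memcpy(&a_in_register, arg + kVarAOffset, sizeof a_in_register);
    std::int32_t result;
    std::memcpy(&result, arg + kResultOffset, sizeof result);
    return result;
}
```

Because the expression only sees the struct, the same compiled code can be reused with a freshly filled struct each time, which is the point made above about breakpoint conditions.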
>
> The IRForTarget class does various cleanup to help RTDyldMemoryManager (ideally much of this shouldn’t be necessary)
> It resolves all external symbols to avoid forcing RTDyldMemoryManager to resolve symbols
> It creates a string and float literal pool so RTDyldMemoryManager doesn’t have to relocate the constant pool
> It strips off nasty Objective-C metadata so RTDyldMemoryManager doesn’t have to look at it
>
> We interpret or execute the result: (int)$0 = 6
>
> IRExecutionUnit contains a module and the (real or simulated) memory it uses
>
> IRInterpreter can interpret a module without ever running the underlying process
> It emulates IR instructions one by one
> It uses lldb_private::Scalar to hold intermediate values, which is kinda limiting (no vectors, no FP math)
> IRExecutionUnit simulates memory allocation etc. so we can do a lot of pointer magic
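A toy version of the interpret-or-fall-back decision, assuming a made-up four-instruction IR with integer-only registers. Op, Inst, CanInterpret and Interpret are hypothetical names; real LLVM IR and the real IRInterpreter are far richer, and the fixed register count here is a simplification.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical mini-IR for illustration only.
enum class Op { LoadConst, Add, FAdd, Ret };

struct Inst {
    Op op;
    int dst;          // destination register (LoadConst, Add)
    int lhs;          // first source register (Add, Ret)
    int rhs;          // second source register (Add)
    std::int64_t imm; // immediate value (LoadConst)
};

// Pre-scan the code: refuse anything the emulator cannot handle.
bool CanInterpret(const std::vector<Inst> &code) {
    for (const Inst &inst : code)
        if (inst.op == Op::FAdd) // no floating-point support in this sketch
            return false;
    return true;
}

// Emulate instructions one by one; an empty result means the caller
// would have to JIT the code and run it in the target instead.
std::optional<std::int64_t> Interpret(const std::vector<Inst> &code) {
    if (!CanInterpret(code))
        return std::nullopt;
    std::vector<std::int64_t> regs(8, 0); // register indices must be below 8
    for (const Inst &inst : code) {
        switch (inst.op) {
        case Op::LoadConst: regs[inst.dst] = inst.imm; break;
        case Op::Add: regs[inst.dst] = regs[inst.lhs] + regs[inst.rhs]; break;
        case Op::Ret: return regs[inst.lhs];
        case Op::FAdd: return std::nullopt; // excluded by the pre-scan
        }
    }
    return std::nullopt; // no Ret instruction was reached
}
```

A program that loads 4 and 2, adds them and returns the sum is interpreted without touching the target; a program containing FAdd is rejected up front, which corresponds to the case where the code has to be JIT'ed.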
>
> If the IRInterpreter can’t run, the MCJIT produces machine code and LLDB runs it
> IRExecutionUnit vends a custom JITMemoryManager implementation
> It remembers memory allocations and where functions were placed
> After JIT, all sections are placed into the target and we report their new locations with mapSectionAddress
>
> Selected Thorny Issues (concentrating on JIT-related issues)
> Make the MCJIT more robust so we can rely on it more
> Support all Mach-O and ELF relocation types
> Don’t assume resolved symbols are in the current process
> Don’t assume addresses fit into void*s
> Make the IRInterpreter support all data types and instructions
> Completely replace the LLVM interpreter!
>
> Sean
>
> _______________________________________________
> lldb-dev mailing list
> lldb-dev at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/lldb-dev