[Lldb-commits] [lldb] r198038 - Adding a document that describes the architecture of data formatters. Suggestions and ideas for improvements most welcome

Enrico Granata egranata at apple.com
Wed Dec 25 23:21:42 PST 2013

Author: enrico
Date: Thu Dec 26 01:21:41 2013
New Revision: 198038

URL: http://llvm.org/viewvc/llvm-project?rev=198038&view=rev
Adding a document that describes the architecture of data formatters. Suggestions and ideas for improvements most welcome

    lldb/trunk/www/architecture/index.html   (with props)

Modified: lldb/trunk/include/lldb/DataFormatters/FormattersContainer.h
URL: http://llvm.org/viewvc/llvm-project/lldb/trunk/include/lldb/DataFormatters/FormattersContainer.h?rev=198038&r1=198037&r2=198038&view=diff
--- lldb/trunk/include/lldb/DataFormatters/FormattersContainer.h (original)
+++ lldb/trunk/include/lldb/DataFormatters/FormattersContainer.h Thu Dec 26 01:21:41 2013
@@ -470,6 +470,7 @@ protected:
         for (const FormattersMatchCandidate& candidate : candidates)
+            // FIXME: could we do the IsMatch() check first?
             if (Get(candidate.GetTypeName(),entry))
                 if (candidate.IsMatch(entry) == false)

Modified: lldb/trunk/scripts/Python/python-wrapper.swig
URL: http://llvm.org/viewvc/llvm-project/lldb/trunk/scripts/Python/python-wrapper.swig?rev=198038&r1=198037&r2=198038&view=diff
--- lldb/trunk/scripts/Python/python-wrapper.swig (original)
+++ lldb/trunk/scripts/Python/python-wrapper.swig Thu Dec 26 01:21:41 2013
@@ -354,9 +354,7 @@ LLDBSwigPythonCallTypeScript
                     *pyfunct_wrapper = pfunc_impl;
-        /*else
-            printf("caching works!!!!\n");*/
         PyCallable pfunc = PyCallable::FindWithPythonObject(pfunc_impl);
         if (!pfunc)

Added: lldb/trunk/www/architecture/index.html
URL: http://llvm.org/viewvc/llvm-project/lldb/trunk/www/architecture/index.html?rev=198038&view=auto
--- lldb/trunk/www/architecture/index.html (added)
+++ lldb/trunk/www/architecture/index.html Thu Dec 26 01:21:41 2013
@@ -0,0 +1,281 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
+<html xmlns="http://www.w3.org/1999/xhtml">
+<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
+<link href="../style.css" rel="stylesheet" type="text/css" />
+<title>LLDB Architecture</title>
+    <div class="www_title">
+      <strong>LLDB</strong>'s Architecture
+    </div>
+<div id="container">
+	<div id="content">
+  <!--#include virtual="../sidebar.incl"-->
+		<div id="middle">
+			<div class="post">
+				<h1 class ="postheader">Architecture</h1>
+				<div class="postcontent">
+				   <p>LLDB is a large and complex codebase. This section will help you become more familiar with
+				       the pieces that make up LLDB and give a general overview of the general architecture.</p>
+				</div>
+				<div class="postfooter"></div>
+			</div>
+			<div class="post">
+				<h1 class ="postheader">Code Layout</h1>
+				<div class="postcontent">
+				   <p>LLDB has many code groupings that makeup the source base:</p>
+                   <ul>
+     					<li><a href="#api">API</a></li>
+      					<li><a href="#breakpoint">Breakpoint</a></li>
+      					<li><a href="#commands">Commands</a></li>
+      					<li><a href="#core">Core</a></li>
+      					<li><a href="#dataformatters">DataFormatters</a></li>
+       					<li><a href="#expression">Expression</a></li>
+       					<li><a href="#host">Host</a></li>
+       					<li><a href="#interpreter">Interpreter</a></li>
+       					<li><a href="#symbol">Symbol</a></li>
+       					<li><a href="#targ">Target</a></li>
+       					<li><a href="#utility">Utility</a></li>
+   				    </ul>
+				</div>
+				<div class="postfooter"></div>
+			</div>
+			<a name="api"></a>
+			<div class="post">
+				<h1 class ="postheader">API</h1>
+				<div class="postcontent">
+				   <p>The API folder contains the public interface to LLDB.</p>
+                   <p>We are currently vending a C++ API. In order to be able to add
+  				        methods to this API and allow people to link to our classes,
+  				        we have certain rules that we must follow:</p>
+                   <ul>
+     					<li>Classes can't inherit from any other classes.</li>
+      					<li>Classes can't contain virtual methods.</li>
+       					<li>Classes should be compatible with script bridging utilities like <a href="http://www.swig.org/">swig</a>.</li>
+       					<li>Classes should be lightweight and be backed by a single member. Pointers (or shared pointers) are the preferred choice since they allow changing the contents of the backend without affecting the public object layout.</li>
+       					<li>The interface should be as minimal as possible in order to give a complete API.</li>
+   				    </ul>
+   				    <p>By adhering to these rules we should be able to continue to 
+   				        vend a C++ API, and make changes to the API as any additional
+   				        methods added to these classes will just be a dynamic loader
+   				        lookup and they won't affect the class layout (since they
+   				        aren't virtual methods, and no members can be added to the
+   				        class).</p>
+				</div>
+				<div class="postfooter"></div>
+			</div>
+			<a name="breakpoint"></a>
+			<div class="post">
+				<h1 class ="postheader">Breakpoint</h1>
+				<div class="postcontent">
+				   <p>A collection of classes that implement our breakpoint classes. 
+				       Breakpoints are resolved symbolically and always continue to
+				       resolve themselves as your program runs. Whether settings breakpoints
+				       by file and line, by symbol name, by symbol regular expression,
+				       or by address, breakpoints will keep trying to resolve new locations
+				       each time shared libraries are loaded. Breakpoints will of course
+				       unresolve themselves when shared libraries are unloaded. Breakpoints
+				       can also be scoped to be set only in a specific shared library. By
+				       default, breakpoints can be set in any shared library and will continue
+				       to attempt to be resolved with each shared library load.</p>
+                   <p>Breakpoint options can be set on the breakpoint,
+                       or on the individual locations. This allows flexibility when dealing
+                       with breakpoints and allows us to do what the user wants.</p>
+				</div>
+				<div class="postfooter"></div>
+			</div>
+			<a name="commands"></a>
+			<div class="post">
+				<h1 class ="postheader">Commands</h1>
+				<div class="postcontent">
+				   <p>The command source files represent objects that implement
+				       the functionality for all textual commands available 
+				       in our command line interface.</p>
+                   <p>Every command is backed by a <b>lldb_private::CommandObject</b>
+                       or <b>lldb_private::CommandObjectMultiword</b> object.</p>
+                   <p><b>lldb_private::CommandObjectMultiword</b> are commands that
+                      have subcommands and allow command line commands to be
+                      logically grouped into a hierarchy.</p>
+                  <p><b>lldb_private::CommandObject</b> command line commands
+                      are the objects that implement the functionality of the
+                      command. They can optionally define
+                     options for themselves, as well as group those options into
+                     logical groups that can go together. The help system is
+                     tied into these objects and can extract the syntax and
+                     option groupings to display appropriate help for each
+                     command.</p>
+				</div>
+				<div class="postfooter"></div>
+			</div>
+			<a name="core"></a>
+			<div class="post">
+				<h1 class ="postheader">Core</h1>
+				<div class="postcontent">
+				   <p>The Core source files contain basic functionality that
+				       is required in the debugger. A wide variety of classes
+				       are implemented:</p>
+                       <ul>
+         					<li>Address (section offset addressing)</li>
+          					<li>AddressRange</li>
+           					<li>Architecture specification</li>
+           					<li>Broadcaster / Event / Listener </li>
+           					<li>Communication classes that use Connection objects</li>
+           					<li>Uniqued C strings</li>
+           					<li>Data extraction</li>
+           					<li>File specifications</li>
+           					<li>Mangled names</li>
+           					<li>Regular expressions</li>
+           					<li>Source manager</li>
+           					<li>Streams</li>
+           					<li>Value objects</li>
+       				    </ul>
+				</div>
+				<div class="postfooter"></div>
+			</div>
+			<a name="dataformatters"></a>
+			<div class="post">
+				<h1 class ="postheader">DataFormatters</h1>
+				<div class="postcontent">
+				   <p>A collection of classes that implement the data formatters subsystem.</p>
+				   <p>Data formatters provide a set of user-tweakable hooks in the ValueObjects world that allow 
+					to customize presentation aspects of variables. While users interact with formatters mostly through the
+					<code>type</code> command, inside LLDB there are a few layers to the implementation: DataVisualization at the highest
+					end of the spectrum, backed by classes implementing individual formatters, matching rules, ...</p>
+				<p>For a general user-level introduction to data formatters, you can look <a href="../varformats.html">here</a>.
+				<p>More details on the architecture are to be found <a href="../architecture/varformats.html">here</a>.
+				</div>
+				<div class="postfooter"></div>
+			</div>
+			<a name="expression"></a>
+			<div class="post">
+				<h1 class ="postheader">Expression</h1>
+				<div class="postcontent">
+				   <p>Expression parsing files cover everything from evaluating
+				       DWARF expressions, to evaluating expressions using
+				       Clang.</p>
+				   <p>The DWARF expression parser has been heavily modified to
+				       support type promotion, new opcodes needed for evaluating
+				       expressions with symbolic variable references (expression local variables,
+				       program variables), and other operators required by
+				       typical expressions such as assign, address of, float/double/long 
+				       double floating point values, casting, and more. The
+				       DWARF expression parser uses a stack of lldb_private::Value
+				       objects. These objects know how to do the standard C type
+				       promotion, and allow for symbolic references to variables
+				       in the program and in the LLDB process (expression local
+				       and expression global variables).</p>
+				    <p>The expression parser uses a full instance of the Clang
+				        compiler in order to accurately evaluate expressions.
+				        Hooks have been put into Clang so that the compiler knows
+				        to ask about identifiers it doesn't know about. Once
+				        expressions have be compiled into an AST, we can then
+				        traverse this AST and either generate a DWARF expression
+				        that contains simple opcodes that can be quickly re-evaluated
+				        each time an expression needs to be evaluated, or JIT'ed
+				        up into code that can be run on the process being debugged.</p>
+				</div>
+				<div class="postfooter"></div>
+			</div>
+			<a name="host"></a>
+			<div class="post">
+				<h1 class ="postheader">Host</h1>
+				<div class="postcontent">
+				   <p>LLDB tries to abstract itself from the host upon which
+				       it is currently running by providing a host abstraction
+				       layer.  This layer involves everything from spawning, detaching,
+				       joining and killing native in-process threads, to getting
+				       current information about the current host.</p>
+    				   <p>Host functionality includes abstraction layers for:</p>
+                           <ul>
+             					<li>Mutexes</li>
+              					<li>Conditions</li>
+               					<li>Timing functions</li>
+               					<li>Thread functions</li>
+               					<li>Host target triple</li>
+               					<li>Host child process notifications</li>
+               					<li>Host specific types</li>
+           				    </ul>
+				</div>
+				<div class="postfooter"></div>
+			</div>
+			<a name="interpreter"></a>
+			<div class="post">
+				<h1 class ="postheader">Interpreter</h1>
+				<div class="postcontent">
+				   <p>The interpreter classes are the classes responsible for
+				       being the base classes needed for each command object,
+				       and is responsible for tracking and running command line
+				       commands.</p>
+				</div>
+				<div class="postfooter"></div>
+			</div>
+			<a name="symbol"></a>
+			<div class="post">
+				<h1 class ="postheader">Symbol</h1>
+				<div class="postcontent">
+				   <p>Symbol classes involve everything needed in order to parse
+				       object files and debug symbols. All the needed classes
+				       for compilation units (code and debug info for a source file),
+				       functions, lexical blocks within functions, inlined
+				       functions, types, declaration locations, and variables
+				       are in this section.</p>
+				</div>
+				<div class="postfooter"></div>
+			</div>
+			<a name="targ"></a>
+			<div class="post">
+				<h1 class ="postheader">Target</h1>
+				<div class="postcontent">
+				   <p>Classes that are related to a debug target include:</p>
+                       <ul>
+       					   <li>Target</li>
+        					<li>Process</li>
+         					<li>Thread</li>
+          					<li>Stack frames</li>
+          					<li>Stack frame registers</li>
+           					<li>ABI for function calling in process being debugged</li>
+           					<li>Execution context batons</li>
+       				    </ul>
+				</div>
+				<div class="postfooter"></div>
+			</div>
+			<a name="utility"></a>
+			<div class="post">
+				<h1 class ="postheader">Utility</h1>
+				<div class="postcontent">
+				   <p>Utility files should be as stand alone as possible and
+				       available for LLDB, plug-ins or related 
+				       applications to use.</p>
+    				   <p>Files found in the Utility section include:</p>
+                           <ul>
+           					   <li>Pseudo-terminal support</li>
+            					<li>Register numbering for specific architectures.</li>
+             					<li>String data extractors</li>
+           				    </ul>
+				</div>
+				<div class="postfooter"></div>
+			</div>
+		</div>
+	</div>

Propchange: lldb/trunk/www/architecture/index.html
    svn:executable = *

Added: lldb/trunk/www/architecture/varformats.html
URL: http://llvm.org/viewvc/llvm-project/lldb/trunk/www/architecture/varformats.html?rev=198038&view=auto
--- lldb/trunk/www/architecture/varformats.html (added)
+++ lldb/trunk/www/architecture/varformats.html Thu Dec 26 01:21:41 2013
@@ -0,0 +1,324 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
+<html xmlns="http://www.w3.org/1999/xhtml">
+<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
+<link href="../style.css" rel="stylesheet" type="text/css" />
+<title>LLDB Homepage</title>
+    <div class="www_title">
+      <strong>LLDB</strong> Data Formatters Architecture
+    </div>
+<div id="container">
+	<div id="content">
+        <!--#include virtual="../sidebar.incl"-->
+		<div id="middle">
+			<div class="post">
+				<h1 class ="postheader">Bird's eye view</h1>
+				<div class="postcontent">
+					<p>The LLDB data formatters subsystem is used to allow the debugger as well as the end-users to customize the way
+						their variables look upon inspection in the user interface (be it the command line tool, or one of the several
+						GUIs that are backed by LLDB)
+					<p>To this aim, they are hooked into the ValueObjects model, in order to provide entry points through which such customization
+						questions can be answered as <i>what format should this number be printed as?</i>, <i>how many child elements does this
+							std::vector have?</i> and more along those lines
+					<p>The architecture of the subsystem is layered, with the highest level layer being the user visible interaction features
+						(e.g. the "type ***" commands, the SB classes, ...). Other layers of interest that will be analyzed in this document include
+						<ul>
+							<li>Classes implementing individual data formatter types</li>
+							<li>Classes implementing formatters navigation, discovery and categorization</li>
+							<li>The FormatManager layer</li>
+							<li>The DataVisualization layer</li>
+							<li>The SWIG LLDB <---> communication layer</li>
+						</ul>
+                </div>
+    			<div class="postfooter"></div>
+			</div>
+			<div class="post">
+				<h1 class ="postheader">Data formatter types</h1>
+				<div class="postcontent">
+					<p> As described in the user documentation, there are four types of formatters
+						<ul>
+							<li>formats</li>
+							<li>summaries</li>
+							<li>filters</li>
+							<li>synthetic children</li>
+						</ul>
+					<p>Architecturally, these are implemented by classes in the source/DataFormatters/ folder<br/>
+						Formatters have descriptor classes, Type*Impl, which contain at least a "Flags" nested object, which contains both rules to be used
+						by the matching algorithm (e.g. should the formatter for type Foo apply to a Foo*?) or rules to be used
+						by the formatter itself (e.g. is this summary a oneliner?)
+					<p>Individual formatter descriptor classes then also contain data items useful to them for performing their functionality.
+						For instance TypeFormatImpl (backing formats) contains an lldb::Format that is the format to then be applied
+						were this formatter to be selected. Upon issuing a "type format add", a new TypeFormatImpl is created that wraps
+						the user-specified format, and matching options:<br/><br/>
+						<code>entry.reset(new TypeFormatImpl(format,
+				                                    TypeFormatImpl::Flags().SetCascades(m_command_options.m_cascade).
+				                                    SetSkipPointers(m_command_options.m_skip_pointers).
+				                                    SetSkipReferences(m_command_options.m_skip_references)));</code><br/><br/>
+					<p>While formats are fairly simple and only implemented by one class, the other formatter types are backed by a class hierarchy
+					<p>Summaries, for instance, can exist in one of three "flavors":
+						<ul>
+							<li>summary strings</li>
+							<li>Python script</li>
+							<li>native C++</li>
+						</ul>
+					<p>The base class for summaries, TypeSummaryImpl, is a pure virtual class that wraps, again, the Flags, and exports among others a
+						<br/><br/><code>
+							virtual bool
+					        FormatObject (ValueObject *valobj,
+					                      std::string& dest) = 0;
+					        </code><br/><br/>
+					<p>This is the core entry point, which allows subclasses to specify their mode of operation
+					<p>StringSummaryFormat, which is the class that implements summary strings, does a check as to whether
+						the summary is a one-liner, and if not, then uses its stored summary string to call into
+						Debugger::FormatPrompt, and obtain a string back, which it returns in dest as the resulting summary
+					<p>For a Python summary, implemented in ScriptSummaryFormat, FormatObject() calls into the ScriptInterpreter
+						which is supposed to hold the knowledge on how to bridge back and forth with the scripting language
+						(Python in the case of LLDB) in order to produce a valid string. Implementors of new ScriptInterpreters for other
+						languages are expected to provide a GetScriptedSummary() entrypoint for this purpose, if they desire to allow
+						users to provide formatters in the new language
+					<p> Lastly, C++ summaries (CXXFunctionSummaryFormat), wrap a function pointer and call into it to execute their duty.
+						It should be noted that there are no facilities for users to interact with C++ formatters, and as such they are extremely
+						opaque, effectively being a thin wrapper between plain function pointers and the LLDB formatters subsystem.<br/>
+						Also, dynamic loading of C++ formatters in LLDB is currently not implemented, and as such it is safe and reasonable
+						for these formatters to deal with internal ValueObjects instances instead of public SBValue objects
+					<p>An interesting data point is that summaries are expected to be stateless. While at the Python layer they are handed
+						an SBValue (since nothing else could be visible for scripts), it is not expected that the SBValue should be cached
+						and reused - any and all caching occurs on the LLDB side, completely transparent to the formatter itself<br/><br/><br/>
+					<p>The design of synthetic children is somewhat more intricate, due to them being stateful objects.<br/>
+						The core idea of the design is that synthetic children act like a two-tier model, in which there is a <i>backend</i>
+						dataset (the underlying unformatted ValueObject), and an higher level view (<i>frontend</i>) which vends the computed
+						representation
+					<p>To implement a new type of synthetic children one would implement a subclass of SyntheticChildren, which akin to the TypeFormatImpl,
+						contains Flags for matching, and data items to be used for formatting. For instance, TypeFilterImpl (which implements filters),
+						stores the list of expression paths of the children to be displayed. <br/>Filters are themselves synthetic children. Since all they
+						do is provide child values for a ValueObject, it does not truly matter whether these come from the real set of children or are
+						crafted through some intricate algorithm. As such, they perfectly fit within the realm of synthetic children and are only
+						shown as separate entities for user friendliness (to a user, picking a subset of elements to be shown with relative ease is a
+						valuable task, and they should not be concerned with writing scripts to do so)
+					<p>Once the descriptor of the synthetic children has been coded, in order to hook it up, one has to implement a subclass of
+						SyntheticChildrenFrontEnd. For a given type of synthetic children, there is a deep coupling with the matching front-end class,
+						given that the front-end usually needs data stored in the descriptor (e.g. a filter needs the list of child elements)
+					<p>The front-end answers the interesting questions that are the true <i>raison d'être</i> of synthetic children:
+						<br/>
+						<code>
+							<ul>
+								<li>
+							        virtual size_t
+							        CalculateNumChildren () = 0;
+								</li>
+								<li>
+									virtual lldb::ValueObjectSP
+							        GetChildAtIndex (size_t idx) = 0;
+								</li>
+								<li>
+									virtual size_t
+							        GetIndexOfChildWithName (const ConstString &name) = 0;
+								</li>
+								<li>
+									virtual bool
+							        Update () = 0;
+								</li>
+								<li>
+									virtual bool
+							        MightHaveChildren () = 0;
+								</li>
+							</ul>
+						</code><br/>
+					<p> Synthetic children providers (their front-ends) will be queried by LLDB for a number of children, and then for each of them
+						as necessary, they should be prepared to return a ValueObject describing the child. They might also be asked to provide a 
+						name-to-index mapping (e.g. to allow LLDB to resolve queries like <code>myFoo.myChild</code>)<br/>
+						Update() and MightHaveChildren() are described in the user documentation, and they mostly serve bookkeeping purposes
+					<p>LLDB provides three kinds of synthetic children: filters, scripted synthetics, and the native C++ providers<br/>
+						Filters are implemented by TypeFilterImpl/TypeFilterImpl::FrontEnd<br/><br/>
+						Scripted synthetics are implemented by ScriptedSyntheticChildren/ScriptedSyntheticChildren::FrontEnd, plus
+						a set of callbacks provided by the ScriptInterpteter infrastructure to allow LLDB to pass the front-end queries
+						down to the scripting languages<br/><br/>
+						As for C++ native synthetics, there is a CXXSyntheticChildren, but no corresponding FrontEnd class. The reason for this design is
+						that CXXSyntheticChildren store a callback to a creator function, which is responsible for providing a FrontEnd.
+						Each individual formatter (e.g. LibstdcppMapIteratorSyntheticFrontEnd, NSDictionaryMSyntheticFrontEnd, ...) is a standalone
+						frontend, and once created retains to relation to its underlying SyntheticChildren object
+					<p>On a ValueObject level, upon being asked to generate synthetic children for a ValueObject, LLDB spawns a ValueObjectSynthetic object
+						which is a subclass of ValueObject. Building upon the ValueObject infrastructure, it stores a backend, and a shared pointer to
+						the SyntheticChildren. <br/>
+						Upon being asked queries about children, it will use the SyntheticChildren to generate a front-end for itself
+						and will let the front-end answer questions. The reason for not storing the FrontEnd itself is that there is no guarantee that across
+						updates, the same FrontEnd will be used over and over (e.g. a SyntheticChildren object could serve an entire class hierarchy
+						and vend different frontends for different subclasses)
+                </div>
+    			<div class="postfooter"></div>
+			</div>
+			<div class="post">
+				<h1 class ="postheader">Formatters matching</h1>
+				<div class="postcontent">
+					<p>The problem of formatters matching is going from
+						"I have a ValueObject" to "these are the formatters to be used for it"<br/>
+						There is a rather intricate set of user rules that are involved, and a rather intricate implementation of this model. All of these
+						relate to the type of the ValueObject. It is assumed that types are a strong enough contract that it is possible to format an object
+						entirely depending on its type. If this turns out to not be correct, then the existing model will have to be changed fairly deeply.
+					<p>The basic building block is that formatters can match by exact type name or by regular expressions, i.e. one can describe matching
+						by saying things like "this formatters matches type __NSDictionaryI", or "this formatter matches all type names like ^std::__1::vector<.+>(( )?&)?$"<br/>This match happens in class FormattersContainer. For exact matches, this goes straight to the FormatMap
+						(the actual storage area for formatters), whereas for regular expression matches the regular expression is matched against the
+						provided candidate type name. If one were to introduce a new type of matching (say, match against number of $ signs present
+						in the typename, FormattersContainer is the place where such a change would have to be introduced).<br/>It should be noted that this
+						code involves template specialization, and as such is somewhat trickier than other formatters code to update.
+					<p>On top of the string matching mechanism (exact or regex), there are a set of more advanced rules implemented 
+						by the FormattersContainer,
+						with the aid of the FormattersMatchCandidate. Namely, it is assumed that any formatter class will have flags to say whether
+						it allows <i>cascading</i> (i.e. seeing through typedefs), allowing pointers-to-object and reference-to-object to be formatted.
+						<br/>Upon verifying that a formatter would be a textual match, the Flags are checked, and if they do not allow the formatter
+						to be used (e.g. pointers are not allowed, and one is looking at a Foo*), then the formatter is rejected and the search continues.
+						If the flags also match, then the formatter is returned upstream and the search is over.
+					<p>One relevant fact to notice is that this entire mechanism is not dependent on the kind of formatter to be returned, which makes it
+						easier to devise new types of formatters as the lowest layers of the system. The demands on individual formatters are that they
+						define a few typedefs, and export a Flags object, and then they can be freely matched against types as needed.
+					<p>This mechanism is replicated across a number of <i>categories</i>. A category is a named bucket where formatters are grouped on
+						some basis. The most common reason for a category to exist is a library (e.g. libcxx formatters vs. libstdcpp formatters).
+						<br/>
+						Categories can be enabled or disabled, and they have a priority number, called position. The priority sets a strong order among
+						enabled categories. A category named "default" is always the highest priority one and it's the category where all formatters that
+						do not ask for a category of their own end up (e.g. "type summary add ...." without a "-w somecategory" flag passed)<br/>
+						The algorithm inquires each category, in the order of their priorities, for a formatter for a type, and upon receiving a positive
+						answer from a category, ends the search. Of course, no search occurs in disabled categories.
+					<p>At the individual category level, there is the first dependence on the type of formatter to be returned. Since both filters and
+						synthetic children proper are implemented through the same backing store, the matching code needs to ensure that, were both a
+						synthetic children provider and a filter to match a type, only the most recently added one is actually used.
+						<br/>The details of the algorithm used are to be found in TypeCategoryImpl::Get().<br/>
+					<p>It is quite obvious, even to a casual reader, that there are a number of complexities involved in this algorithm.<br/>
+						For starters, the entire search process has to be repeated for every variable.<br/>
+						Moreover, for each category, one has to repeat the entire process of crawling the types (go to pointee, ...).<br/>
+						This is exactly the algorithm initially implemented by LLDB. Over the course of the life of the formatters subsystem,
+						two main evolutions have been made to the matching mechanism:
+						<ul>
+							<li>A caching mechanism</li>
+							<li>A pregeneration of all possible type matches</li>
+						</ul>
+					<p>The cache is a layer that sits between the FormatManager and the TypeCategoryMap. Upon being asked to figure out a formatter,
+						the FormatManager will first query the cache layer, and only if that fails, will the categories be queried using the full
+						search algorithm. The result of that full search will then be stored in the cache. Even a negative answer (no formatter)
+						gets stored. The negative answer is actually the most beneficial to cache as obtaining it requires traversing all possible
+						formatters in all categories just to get a no-op back.<br/>
+						Of course, once an answer is cached, getting it will be much quicker than going to a full category search, as the cached
+						answers are of the form "type foo" --> "formatter bar". But given how formatters can be edited or removed by the user,
+						either at the command line or via the API, there needs to be a way to invalidate the cache.<br/>
+						This happens through the FormatManager::Changed() method. In general, anything that changes the formatters causes
+						FormatManager::Changed() to be called through the IFormatChangeListener interface. This call increases the
+						FormatManager's revision and clears the cache. The revision number is a monotonically increasing integer counter
+						that essentially corresponds to the number of changes made to the formatters throughout the current LLDB session.
+						This counter is used by ValueObjects to know when their formatters are out of date. Since a search is a potentially
+						expensive operation, before caching was introduced, individual ValueObjects remembered which revision of the FormatManager
+						they used to search for their formatter, and stored it, so that they would not repeat the search unless a change in the
+						formatters had occurred. While caching has made this less critical of an optimization, it is still sensible and thus is kept.
+						<br/>Lastly, as a side note, it is worth highlighting that <strong>any</strong> change in the formatters invalidates the
+						<strong>entire</strong> cache. It would likely not be impossible to be smarter and figure out a subset of cache entries
+						to be deleted, letting others persist, instead of having to rebuild the entire cache from scratch. However, given that formatters
+						are not that frequently changed during a debug session, and the algorithmic complexity to "get it right" seems larger than the
+						potential benefit to be had from doing it, the full cache invalidation is the chosen policy. The algorithm to selectively invalidate
+						entries is probably one of the major areas for improvements in formatters performance.
+					<p>The second major optimization, introduced fairly recently, is the pregeneration of type matches. The original algorithm was based upon
+						the notion of a FormatNavigator as a smart object, aware of all the intricacies of the matching rules. For each category, the
+						FormatNavigator would generate the possible matches (e.g. dynamic type, pointee type, ...), and check each one, one at a time.
+						If that failed for a category, the next one would again generate the same matches.<br/>
+						This worked well, but was of course inefficient. The FormattersMatchCandidate is the solution to this performance issue.
+						In top-of-tree LLDB, the FormatManager has the centralized notion of the matching rules, and the former FormatNavigators are now
+						FormattersContainers, whose only job is to guarantee a centralized storage of formatters, and thread-safe access to such storage.
+						<br/>FormatManager::GetPossibleMatches() fills a vector of possible matches. The way it works is by applying each rule,
+						generating the corresponding typename, and storing the typename, plus the required Flags for that rule to be accepted
+						as a match candidate (e.g. if the match comes by fetching the pointee type, a formatter that matches will have to allow pointees
+						as part of its Flags object). The TypeCategoryMap, when tasked with finding a formatter for a type, generates all possible matches
+						and passes them down to each category. In this model, the type system only does its (expensive) job once, and textual or regex
+						matches are the core of the work.
+                </div>
+    			<div class="postfooter"></div>
+			</div>
+			<div class="post">
+				<h1 class ="postheader">FormatManager and DataVisualization</h1>
+				<div class="postcontent">
+					<p>There are two main entry points in the data formatters: the FormatManager and the DataVisualization<br/>
+						The FormatManager is the <i>internal</i> such entry point. In this context, internal refers to data formatters code
+						itself, compared to other parts of LLDB. For other components of the debugger, the DataVisualization provides a more
+						stable entry point. On the other hand, the FormatManager is an aggregator of all moving parts, and as such is less stable
+						in the face of refactoring.<br/>People involved in the data formatters code itself, however, will most likely have to confront
+						the FormatManager for significant architecture changes.
+					<p>The FormatManager wraps a TypeCategoryMap (the list of all existing categories, enabled and not), the FormatCache, and several
+						utility objects. Plus, it is the repository of named summaries, since these don't logically belong anywhere else.<br/>
+						It is also responsible for creating all builtin formatters upon the launch of LLDB. It does so through a bunch
+						of methods Load***Formatters(), invoked as part of its constructor. The original design of data formatters anticipated
+						that individual libraries would load their formatters as part of their debug information. This work however has largely been
+						left unattended in practice, and as such core system libraries (mostly those for OSX/iOS development as of today) load their
+						formatters in an hardcoded fashion.
+					<p>For performance reasons, the FormatManager is constructed upon being first required.
+						This happens through the DataVisualization layer. Upon first being inquired for anything formatters, DataVisualization
+						calls its own local static function GetFormatManager(), which in turns constructs and returns a local static FormatManager.<br/>
+						Unlike most things in LLDB, the lifetime of the FormatManager is the same as the entire session, rather than a specific Debugger
+						or Target instance. This is an area to be improved, but as of now it has not caused enough grief to warrant action. If this work
+						were to be undertaken, one could conceivably devise a per-architecture-triple model, upon the assumption that an OS and CPU
+						combination are a good enough key to decide which formatters apply (e.g. Linux i386 is probably different from OSX x86_64, but two
+						OSX x86_64 targets will probably have the same formatters; of course versioning of the underlying OS is also to be considered,
+						but experience with OSX has shown that formatters can take care of that internally in most cases of interest).
+					<p>The public entry point is the DataVisualization layer. DataVisualization is a static class on which questions can be asked
+						in a relatively refactoring-safe manner.
+						<br/>The main question asked of it is to obtain formatters for ValueObjects (or typenames).
+						One can also query DataVisualization for named summaries or individual categories, but of course those queries delve deeper
+						in the internal object model.<br/>As said, the FormatManager holds a notion of revision number, which changes every time
+						formatters are edited (added, deleted, categories enabled or disabled, ...). Through DataVisualization::ForceUpdate() one
+						can cause the same effects of a formatters edit to happen without it actually having happened.<br/>
+						The main reason for this feature is that formatters can be dynamically created in Python, and one can then enter the
+						ScriptInterpreter and edit the formatter function or class. If formatters were not updated, one could find them to be out of sync
+						with the new definitions of these objects. To avoid the issue, whenever the user exits the scripting mode, formatters force
+						an update to make sure new potential definitions are reloaded on demand.
+                </div>
+    			<div class="postfooter"></div>
+			</div>
+			<div class="post">
+				<h1 class ="postheader">The SWIG layer</h1>
+				<div class="postcontent">
+					<p>In order to implement formatters written in Python, LLDB requires that ScriptInterpreter implementations provide a set
+						of functions that one can call to ask formatting questions of scripts.<br/>
+						For instance, in order to obtain a scripting summary, LLDB calls 
+						<code><br/>
+							virtual bool<br/>
+						    GetScriptedSummary (const char *function_name,<br/>
+						                        llldb::ValueObjectSP valobj,<br/>
+						                        lldb::ScriptInterpreterObjectSP& callee_wrapper_sp,<br/>
+						                        std::string& retval)<br/>
+						    </code><br/>
+					<p>For Python, this function is implemented by first checking if the callee_wrapper_sp is valid.
+						If so, LLDB knows that it does not need to search a function with the passed name, and can directly
+						call the wrapped Python function object. Either way, the call is routed to a global callback <code>g_swig_typescript_callback</code>
+					<p>This callback pointer points to <code>LLDBSwigPythonCallTypeScript</code>, defined in python-wrapper.swig<br/>
+						The details of the implementation require familiarity with the Python C API, plus a few utility objects defined
+						by LLDB to ease the burden of dealing with the scripting world. However, as a sketch of what happens, the code
+						tries to find a Python function object with the given name (i.e. if you say "type summary add -F module.function", LLDB will scan
+						for "module" module, and then for a function named "function" inside the module's namespace). If the function object is found,
+						it is wrapped in a PyCallable, which is an LLDB utility class that wraps the callable and allows for easier calling.
+						The callable gets invoked, and the return value, if any, is cast into a string. Originally, if a non-string object was returned,
+						LLDB would refuse to use it. This disallowed such simple construct as
+						<code><br/>def getSummary(value,*args):<br/>   return 1</br></code> from working
+					<p>Similar considerations apply to other formatter (and non-formatter related) scripting callbacks
+                </div>
+    			<div class="postfooter"></div>
+			</div>
+			<div class="post">
+				<h1 class ="postheader">Conclusion</h1>
+				<div class="postcontent">
+					<p>This document is an introduction to the design of the LLDB data formatters subsystem<br/>
+						The intended target audience are people interested in understanding or modifying the formatters themselves
+						rather than writing a specific data formatter. For this latter purpose, the user documentation about formatters
+						is the main relevant document which one should refer to.
+					<p>On the other hand, this one page highlights some open areas for improvement to the general subsystem, and more evolutions
+						not anticipated here are certainly possible. As usual, the lldb-dev mailing list is the point of first contact for
+						discussing desired new features or changes of existing features.
+                </div>
+    			<div class="postfooter"></div>
+			</div>
+		</div>
+	</div>

Modified: lldb/trunk/www/sidebar.incl
URL: http://llvm.org/viewvc/llvm-project/lldb/trunk/www/sidebar.incl?rev=198038&r1=198037&r2=198038&view=diff
--- lldb/trunk/www/sidebar.incl (original)
+++ lldb/trunk/www/sidebar.incl Thu Dec 26 01:21:41 2013
@@ -23,7 +23,7 @@
         <li><a href="scripting.html">Python Example</a></li>
         <li><a href="symbols.html">Symbols on Mac OS X</a></li>
         <li><a href="troubleshooting.html">Troubleshooting</a></li>
-        <li><a href="architecture.html">Architecture</a></li>
+        <li><a href="architecture/index.html">Architecture</a></li>

More information about the lldb-commits mailing list