[Lldb-commits] [lldb] [lldb] Revive internal data formatter documentation (PR #66527)

Walter Erquinigo via lldb-commits lldb-commits at lists.llvm.org
Fri Sep 15 10:25:27 PDT 2023


================
@@ -0,0 +1,436 @@
+Data Formatters
+===============
+
+This page is an introduction to the design of the LLDB data formatters
+subsystem. The intended audience is people interested in understanding or
+modifying the formatters themselves rather than writing a specific data
+formatter. For the latter purpose, the user documentation about formatters is
+the main document to refer to.
+
+This page also highlights some open areas for improvement to the general
+subsystem, and more evolutions not anticipated here are certainly possible.
+
+Overview
+--------
+
+The LLDB data formatters subsystem is used to allow the debugger as well as the
+end-users to customize the way their variables look upon inspection in the user
+interface (be it the command line tool, or one of the several GUIs that are
+backed by LLDB).
+
+To this aim, they are hooked into the ValueObjects model, in order to provide
+entry points through which such customization questions can be answered. For
+example, what format should this number be printed as? How many child elements
+does this ``std::vector`` have?
+
+The architecture of the subsystem is layered, with the highest level layer
+being the user visible interaction features (e.g. the ``type ***`` commands,
+the SB classes, ...). Other layers of interest that will be analyzed in this
+document include
+
+* Classes implementing individual data formatter types
+* Classes implementing formatters navigation, discovery and categorization
+* The ``FormatManager`` layer
+* The ``DataVisualization`` layer
+* The SWIG <> LLDB communication layer
+
+Data Formatter Types
+--------------------
+
+As described in the user documentation, there are four types of formatters:
+
+* Formats
+* Summaries
+* Filters
+* Synthetic children
+
+Formatters have descriptor classes, ``Type*Impl``, which contain at least a
+"Flags" nested object, which contains both rules to be used by the matching
+algorithm (e.g. should the formatter for type Foo apply to a Foo*?) and rules
+to be used by the formatter itself (e.g. is this summary a oneliner?).
+
+Individual formatter descriptor classes then also contain data items useful to
+them for performing their functionality. For instance ``TypeFormatImpl``
+(backing formats) contains an ``lldb::Format`` that is the format to then be
+applied were this formatter to be selected. Upon issuing a ``type format add``
+a new ``TypeFormatImpl`` is created that wraps the user-specified format, and
+matching options:
+
+::
+
+  entry.reset(new TypeFormatImpl(
+      format, TypeFormatImpl::Flags()
+                  .SetCascades(m_command_options.m_cascade)
+                  .SetSkipPointers(m_command_options.m_skip_pointers)
+                  .SetSkipReferences(m_command_options.m_skip_references)));
+
+
+While formats are fairly simple and only implemented by one class, the other
+formatter types are backed by a class hierarchy.
+
+Summaries, for instance, can exist in one of three "flavors":
+
+* Summary strings
+* Python script
+* Native C++
+
+The base class for summaries, ``TypeSummaryImpl``, is a pure virtual class that
+wraps, again, the Flags, and exports among others:
+
+::
+
+  virtual bool FormatObject (ValueObject *valobj, std::string& dest) = 0;
+
+
+This is the core entry point, which allows subclasses to specify their mode of
+operation.
+
+``StringSummaryFormat``, which is the class that implements summary strings,
+checks whether the summary is a one-liner and, if not, uses its stored summary
+string to call into ``Debugger::FormatPrompt`` and obtain a string back, which
+it returns in ``dest`` as the resulting summary.
+
+For a Python summary, implemented in ``ScriptSummaryFormat``,
+``FormatObject()`` calls into the ``ScriptInterpreter`` which is supposed to
+hold the knowledge on how to bridge back and forth with the scripting language
+(Python in the case of LLDB) in order to produce a valid string. Implementors
+of new ``ScriptInterpreters`` for other languages are expected to provide a
+``GetScriptedSummary()`` entrypoint for this purpose, if they desire to allow
+users to provide formatters in the new language.
+
+Lastly, C++ summaries (``CXXFunctionSummaryFormat``) wrap a function pointer
+and call into it to execute their duty. It should be noted that there are no
+facilities for users to interact with C++ formatters, and as such they are
+extremely opaque, effectively being a thin wrapper between plain function
+pointers and the LLDB formatters subsystem.
+
+Also, dynamic loading of C++ formatters in LLDB is currently not implemented,
+and as such it is safe and reasonable for these formatters to deal with
+internal ``ValueObject`` instances instead of public ``SBValue`` objects.
+
+An interesting data point is that summaries are expected to be stateless. While
+at the Python layer they are handed an ``SBValue`` (since nothing else could be
+visible for scripts), it is not expected that the ``SBValue`` should be cached
+and reused - any and all caching occurs on the LLDB side, completely
+transparent to the formatter itself.
+
+The design of synthetic children is somewhat more intricate, due to them being
+stateful objects. The core idea of the design is that synthetic children act
+like a two-tier model, in which there is a backend dataset (the underlying
+unformatted ``ValueObject``), and a higher-level view (frontend) which vends
+the computed representation.
+
+To implement a new type of synthetic children one would implement a subclass of
+``SyntheticChildren``, which, akin to ``TypeFormatImpl``, contains Flags for
+matching, and data items to be used for formatting. For instance,
+``TypeFilterImpl`` (which implements filters), stores the list of expression
+paths of the children to be displayed.
+
+Filters are themselves synthetic children. Since all they do is provide child
+values for a ``ValueObject``, it does not truly matter whether these come from the
+real set of children or are crafted through some intricate algorithm. As such,
+they perfectly fit within the realm of synthetic children and are only shown as
+separate entities for user friendliness (to a user, picking a subset of
+elements to be shown with relative ease is a valuable task, and they should not
+be concerned with writing scripts to do so).
+
+Once the descriptor of the synthetic children has been coded, in order to hook
+it up, one has to implement a subclass of ``SyntheticChildrenFrontEnd``. For a
+given type of synthetic children, there is a deep coupling with the matching
+front-end class, given that the front-end usually needs data stored in the
+descriptor (e.g. a filter needs the list of child elements).
+
+The front-end answers the interesting questions that are the true raison d'être
+of synthetic children:
+
+::
+
+  virtual size_t CalculateNumChildren () = 0;
+  virtual lldb::ValueObjectSP GetChildAtIndex (size_t idx) = 0;
+  virtual size_t GetIndexOfChildWithName (const ConstString &name) = 0;
+  virtual bool Update () = 0;
+  virtual bool MightHaveChildren () = 0;
+
+Synthetic children providers (their front-ends) will be queried by LLDB for a
+number of children, and then for each of them as necessary, they should be
+prepared to return a ``ValueObject`` describing the child. They might also be
+asked to provide a name-to-index mapping (e.g. to allow LLDB to resolve queries
+like ``myFoo.myChild``).
+
+``Update()`` and ``MightHaveChildren()`` are described in the user
+documentation, and they mostly serve bookkeeping purposes.
+
+LLDB provides three kinds of synthetic children: filters, scripted synthetics,
+and the native C++ providers. Filters are implemented by
+``TypeFilterImpl::FrontEnd``.
+
+Scripted synthetics are implemented by ``ScriptedSyntheticChildren::FrontEnd``,
+plus a set of callbacks provided by the ``ScriptInterpreter`` infrastructure to
+allow LLDB to pass the front-end queries down to the scripting languages.
+
+As for C++ native synthetics, there is a ``CXXSyntheticChildren``, but no
+corresponding ``FrontEnd`` class. The reason for this design is that
+``CXXSyntheticChildren`` stores a callback to a creator function, which is
+responsible for providing a ``FrontEnd``. Each individual formatter (e.g.
+``LibstdcppMapIteratorSyntheticFrontEnd``) is a standalone frontend, and once
+created retains no relation to its underlying ``SyntheticChildren`` object
----------------
walter-erquinigo wrote:

```suggestion
created retains no relation to its underlying ``SyntheticChildren`` object.
```
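
For readers of this hunk, the creator-callback design described in the paragraph being edited could be pictured roughly as below. This is only a minimal standalone sketch, not LLDB code: `MockValueObject`, `MockFrontEnd`, `MockVectorFrontEnd`, `MockCXXSyntheticChildren`, and `MakeVectorFrontEnd` are all hypothetical stand-ins invented for illustration, and the real `CXXSyntheticChildren` and `SyntheticChildrenFrontEnd` signatures may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a ValueObject: a named bag of integers.
struct MockValueObject {
  std::string name;
  std::vector<int> elements;
};

// Simplified front-end interface, loosely modeled on the virtuals quoted
// in the document (children are plain ints here, not ValueObjectSP).
class MockFrontEnd {
public:
  virtual ~MockFrontEnd() = default;
  virtual std::size_t CalculateNumChildren() = 0;
  virtual int GetChildAtIndex(std::size_t idx) = 0;
};

// A standalone front-end: it only knows the backend value and keeps no
// pointer back to the descriptor that created it.
class MockVectorFrontEnd : public MockFrontEnd {
public:
  explicit MockVectorFrontEnd(const MockValueObject &backend)
      : m_backend(backend) {}
  std::size_t CalculateNumChildren() override {
    return m_backend.elements.size();
  }
  int GetChildAtIndex(std::size_t idx) override {
    return m_backend.elements.at(idx); // throws on out-of-range index
  }

private:
  const MockValueObject &m_backend;
};

// Descriptor: instead of defining a nested FrontEnd class, it stores a
// creator callback that builds a front-end for a given backend.
class MockCXXSyntheticChildren {
public:
  using CreateFrontEndCallback =
      std::function<std::unique_ptr<MockFrontEnd>(const MockValueObject &)>;

  explicit MockCXXSyntheticChildren(CreateFrontEndCallback create)
      : m_create(std::move(create)) {
    assert(m_create && "a creator callback is required");
  }

  std::unique_ptr<MockFrontEnd>
  GetFrontEnd(const MockValueObject &backend) const {
    return m_create(backend);
  }

private:
  CreateFrontEndCallback m_create;
};

// Creator function of the kind a formatter would register (hypothetical).
std::unique_ptr<MockFrontEnd>
MakeVectorFrontEnd(const MockValueObject &backend) {
  return std::make_unique<MockVectorFrontEnd>(backend);
}
```

In this sketch, constructing `MockCXXSyntheticChildren descriptor(MakeVectorFrontEnd)` and calling `descriptor.GetFrontEnd(backend)` yields a front-end that answers child queries on its own, which is the sense in which it retains no relation to the descriptor.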

https://github.com/llvm/llvm-project/pull/66527

