[Mlir-commits] [mlir] c20c196 - Add Python bindings guide.
Stella Laurenzo
llvmlistbot at llvm.org
Thu Jul 9 20:49:56 PDT 2020
Author: Stella Laurenzo
Date: 2020-07-09T20:49:39-07:00
New Revision: c20c1960c15adb3b897aeb1ab83b6fa4caab2505
URL: https://github.com/llvm/llvm-project/commit/c20c1960c15adb3b897aeb1ab83b6fa4caab2505
DIFF: https://github.com/llvm/llvm-project/commit/c20c1960c15adb3b897aeb1ab83b6fa4caab2505.diff
LOG: Add Python bindings guide.
Subscribers: mehdi_amini, rriddle, jpienaar, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, aartbik, liufengdb, stephenneuendorffer, Joonsoo, grosul1, Kayjukh, jurahul, msifontes
Tags: #mlir
Differential Revision: https://reviews.llvm.org/D83527
diff --git a/mlir/docs/Bindings/Python.md b/mlir/docs/Bindings/Python.md
new file mode 100644
index 000000000000..8d9cee5e88ca
--- /dev/null
+++ b/mlir/docs/Bindings/Python.md
@@ -0,0 +1,328 @@
+# MLIR Python Bindings
+Current status: Under development and not enabled by default
+## Building
+### Pre-requisites
+* [`pybind11`](https://github.com/pybind/pybind11) must be installed and able to
+ be located by CMake.
+* A relatively recent Python3 installation
+### CMake variables
+ Enables building the Python bindings. Defaults to `OFF`.
+ Links the native extension against the Python runtime library, which is
+ optional on some platforms. While setting this to `OFF` can yield some greater
+ deployment flexibility, linking in this way allows the linker to report
+ compile time errors for unresolved symbols on all platforms, which makes for a
+ smoother development workflow. Defaults to `ON`.
+ Specifies the `python` executable used for the LLVM build, including for
+ determining header/link flags for the Python bindings. On systems with
+ multiple Python implementations, setting this explicitly to the preferred
+ `python3` executable is strongly recommended.
+## Design
+### Use cases
+There are likely two primary use cases for the MLIR Python bindings:
+1. Support users who expect that an installed version of LLVM/MLIR will yield
+ the ability to `import mlir` and use the API in a pure way out of the box.
+2. Downstream integrations will likely want to include parts of the API in their
+ private namespace or specially built libraries, probably mixing it with other
+ Python-native bits.
+### Composable modules
+In order to support use case #2, the Python bindings are organized into
+composable modules that downstream integrators can include and re-export into
+their own namespace if desired. This forces several design points:
+* Separate the construction/populating of a `py::module` from the
+ `PYBIND11_MODULE` global constructor.
+* Introduce headers for C++-only wrapper classes, as other related C++ modules
+ will need to interop with them.
+* Separate any initialization routines that depend on optional components into
+ their own module/dependency (currently, things like `registerAllDialects` fall
+ into this category).
+There are a lot of interrelated issues of shared library linkage, distribution
+concerns, etc. that affect such things. Organizing the code into composable
+modules (versus a monolithic `cpp` file) allows the flexibility to address many
+of these as needed over time. Also, compilation time for all of the template
+meta-programming in pybind scales with the number of things you define in a
+translation unit. Breaking into multiple translation units can significantly aid
+compile times for APIs with a large surface area.
+### Submodules
+Generally, the C++ codebase namespaces most things into the `mlir` namespace.
+However, in order to modularize and make the Python bindings easier to
+understand, sub-packages are defined that map roughly to the directory structure
+of functional units in MLIR.
+* `mlir.ir`
+* `mlir.passes` (`pass` is a reserved word :( )
+* `mlir.dialect`
+* `mlir.execution_engine` (aside from namespacing, it is important that
+ "bulky"/optional parts like this are isolated)
+In addition, initialization functions that imply optional dependencies should
+be in underscored (notionally private) modules such as `_init` and linked
+separately. This allows downstream integrators to completely customize what is
+included "in the box" and covers things like dialect registration,
+pass registration, etc.
+### Loader
+LLVM/MLIR is a non-trivial python-native project that is likely to co-exist with
+other non-trivial native extensions. As such, the native extension (i.e. the
+`.so`/`.pyd`/`.dylib`) is exported as a notionally private top-level symbol
+(`_mlir`), while a small set of Python code is provided in `mlir/__init__.py`
+and siblings which loads and re-exports it. This split provides a place to stage
+code that needs to prepare the environment *before* the shared library is loaded
+into the Python runtime, and also provides a place that one-time initialization
+code can be invoked apart from module constructors.
+To start with, the `mlir/__init__.py` loader shim can be very simple and scale to
+future need:
+from _mlir import *
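As a rough illustration of how such a shim could grow, the sketch below stages environment setup before the native module is imported and then re-exports its public names. The helper names (`load_native_extension`, `reexport`) and the environment variable are hypothetical, and the standard-library `math` module stands in for `_mlir`, which is not assumed to be installed.

```python
import importlib
import os


def load_native_extension(module_name, env_defaults=None):
    """Prepare the environment, then import and return a native module."""
    # Stage environment tweaks *before* the shared library is loaded.
    for key, value in (env_defaults or {}).items():
        os.environ.setdefault(key, value)
    return importlib.import_module(module_name)


def reexport(module, namespace):
    """Copy a module's public names into `namespace`, like `from m import *`."""
    names = getattr(module, "__all__", None)
    if names is None:
        names = [n for n in vars(module) if not n.startswith("_")]
    for name in names:
        namespace[name] = getattr(module, name)
    return sorted(names)


# `math` is a stand-in for `_mlir`; the variable name below is hypothetical.
native = load_native_extension("math", {"HYPOTHETICAL_MLIR_FLAG": "1"})
exported = {}
reexport(native, exported)
```

In a real `mlir/__init__.py`, the target namespace would presumably be `globals()` rather than a scratch dictionary.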
+### Limited use of globals
+For normal operations, parent-child constructor relationships are realized with
+constructor methods on a parent class as opposed to requiring
+invocation/creation from a global symbol.
+For example, consider two code fragments:
+op = build_my_op()
+region = mlir.Region(op)
+op = build_my_op()
+region = op.new_region()
+For tightly coupled data structures like `Operation`, the latter is generally
+preferred because:
+* It is syntactically less possible to create something that is going to access
+ illegal memory (less error handling in the bindings, less testing, etc).
+* It reduces the global-API surface area for creating related entities. This
+ makes it more likely that if constructing IR based on an Operation instance of
+ unknown provenance, receiving code can just call methods on it to do what it
+ wants versus needing to reach back into the global namespace and find the right
+ `Region` class.
+* It leaks fewer things that are in place for C++ convenience (i.e. default
+ constructors to invalid instances).
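The second fragment's shape can be sketched in plain Python. The `Operation` and `Region` classes below are toy stand-ins written for this illustration, not the actual bindings; they only show how a parent-side constructor method keeps the child tied to a live owner.

```python
class Region:
    """Toy region that is only meaningful while attached to an owner."""

    def __init__(self, owner):
        self.owner = owner


class Operation:
    """Toy operation that owns its regions."""

    def __init__(self, name):
        self.name = name
        self.regions = []

    def new_region(self):
        # The parent creates the child, so a region never exists without an
        # owner and callers do not need access to the `Region` symbol.
        region = Region(self)
        self.regions.append(region)
        return region


op = Operation("my_op")
region = op.new_region()
```

Receiving code that is handed `op` can create regions without importing anything, which is the surface-area reduction described above.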
+### Use the C-API
+The Python APIs should seek to layer on top of the C-API to the degree possible.
+Especially for the core, dialect-independent parts, such a binding enables
+packaging decisions that would be difficult or impossible if spanning a C++ ABI
+boundary. In addition, factoring in this way side-steps some very difficult
+issues that arise when combining RTTI-based modules (which pybind derived things
+are) with non-RTTI polymorphic C++ code (the default compilation mode of LLVM).
+## Style
+In general, for the core parts of MLIR, the Python bindings should be largely
+isomorphic with the underlying C++ structures. However, concessions are made
+either for practicality or to give the resulting library an appropriately
+"Pythonic" flavor.
+### Properties vs get*() methods
+Generally favor converting trivial methods like `getContext()`, `getName()`,
+`isEntryBlock()`, etc. to read-only Python properties (e.g. `context`). It is
+primarily a matter of calling `def_property_readonly` vs `def` in binding code,
+and makes things feel much nicer on the Python side.
+For example, prefer:
+m.def_property_readonly("context", ...)
+Over:
+m.def("getContext", ...)
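From the Python side, a read-only property behaves roughly like the pure-Python analogue below. This is a sketch with a toy `Operation` class, not the output of pybind11; it shows attribute-style access and that assignment is rejected.

```python
class Operation:
    """Toy class exposing read-only properties instead of get*() methods."""

    def __init__(self, name, context):
        self._name = name
        self._context = context

    @property
    def name(self):
        return self._name

    @property
    def context(self):
        return self._context


op = Operation("module", context="ctx0")

# A property without a setter cannot be assigned to.
try:
    op.name = "other"
    assignment_rejected = False
except AttributeError:
    assignment_rejected = True
```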
+### __repr__ methods
+Things that have nice printed representations are really great :) If there is a
+reasonable printed form, it can be a significant productivity boost to wire that
+to the `__repr__` method (and verify it with a [doctest](#sample-doctest)).
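A minimal sketch of wiring a printed form to `__repr__` and checking it with `doctest` follows. The `Operation` class is a toy stand-in, and the docstring is run through `doctest.DocTestParser` directly so the example does not depend on how the file is launched.

```python
import doctest


class Operation:
    """Toy operation with a compact printed form.

    >>> Operation("module")
    <operation 'module'>
    """

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return f"<operation {self.name!r}>"


# Parse the class docstring into a doctest and run it explicitly.
parser = doctest.DocTestParser()
test = parser.get_doctest(
    Operation.__doc__, {"Operation": Operation}, "Operation", None, 0
)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)
```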
+### CamelCase vs snake_case
+Name functions/methods/properties in `snake_case` and classes in `CamelCase`. As
+a mechanical concession to Python style, this can go a long way to making the
+API feel like it fits in with its peers in the Python landscape.
+If in doubt, choose names that will flow properly with other
+[PEP 8 style names](https://pep8.org/#descriptive-naming-styles).
+### Prefer pseudo-containers
+Many core IR constructs provide methods directly on the instance to query count
+and begin/end iterators. Prefer hoisting these to dedicated pseudo containers.
+For example, a direct mapping of blocks within regions could be done this way:
+region = ...
+for block in region:
+ pass
+However, this way is preferred:
+region = ...
+for block in region.blocks:
+ pass
+Instead of leaking STL-derived identifiers (`front`, `back`, etc), translate
+them to appropriate `__dunder__` methods and iterator wrappers in the bindings.
+Note that this can be taken too far, so use good judgment. For example, block
+arguments may appear container-like but have defined methods for lookup and
+mutation that would be hard to model properly without making semantics
+complicated. If running into these, just mirror the C/C++ API.
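One possible shape for such a pseudo-container is sketched below in plain Python. `BlockList` and `Region` are hypothetical toy classes, with strings standing in for blocks; the point is that `len()`, indexing, and iteration replace count and begin/end style accessors.

```python
class BlockList:
    """Read-only, list-like view over the blocks owned by a region."""

    def __init__(self, blocks):
        self._blocks = blocks

    def __len__(self):
        return len(self._blocks)

    def __getitem__(self, index):
        # Indices 0 and -1 take the place of STL-style front/back accessors.
        return self._blocks[index]

    def __iter__(self):
        return iter(self._blocks)


class Region:
    """Toy region; strings stand in for real block objects."""

    def __init__(self, block_names):
        self._blocks = list(block_names)

    @property
    def blocks(self):
        return BlockList(self._blocks)


region = Region(["entry", "loop", "exit"])
names = [block for block in region.blocks]
```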
+### Provide one stop helpers for common things
+One stop helpers that aggregate over multiple low level entities can be
+incredibly helpful and are encouraged within reason. For example, making
+`Context` have a `parse_asm` or equivalent that avoids needing to explicitly
+construct a SourceMgr can be quite nice. One stop helpers do not have to be
+mutually exclusive with a more complete mapping of the backing constructs.
+## Testing
+Tests should be added in the `test/Bindings/Python` directory and should
+typically be `.py` files that have a lit run line.
+While lit can run any Python module, prefer to lay tests out according to these
+rules:
+* For tests of the API surface area, prefer
+ [`doctest`](https://docs.python.org/3/library/doctest.html).
+* For generative tests (those that produce IR), define a Python module that
+ constructs/prints the IR and pipe it through `FileCheck`.
+* Parsing should be kept self-contained within the module under test by use of
+ raw constants and an appropriate `parse_asm` call.
+* Any file I/O code should be staged through a tempfile vs relying on file
+ artifacts/paths outside of the test module.
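The last rule can be illustrated with the standard-library `tempfile` module. The sketch below keeps the test input as a raw constant in the module and stages file I/O through a temporary directory; the function name and file name are made up for this example.

```python
import os
import tempfile

# Test input lives in the module as a constant, not in a checked-in file.
TEST_ASM = "module {\n}\n"


def round_trip_through_tempfile(text):
    """Write `text` to a temporary file and read it back."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "module.mlir")
        with open(path, "w") as f:
            f.write(text)
        with open(path) as f:
            contents = f.read()
    # The directory and its contents are removed once the block exits.
    return contents, os.path.exists(path)


contents, still_exists = round_trip_through_tempfile(TEST_ASM)
```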
+### Sample Doctest
+# RUN: %PYTHON %s
+"""
+ >>> m = load_test_module()
+
+Test basics:
+ >>> m.operation.name
+ 'module'
+ >>> m.operation.is_registered
+ True
+ >>> ... etc ...
+
+Verify that repr prints:
+ >>> m.operation
+ <operation 'module'>
+"""
+import mlir
+TEST_MLIR_ASM = r"""
+func @test_operation_correct_regions() {
+ // ...
+}
+"""
+# TODO: Move to a test utility class once any of this actually exists.
+def load_test_module():
+ ctx = mlir.ir.Context()
+ ctx.allow_unregistered_dialects = True
+ module = ctx.parse_asm(TEST_MLIR_ASM)
+ return module
+if __name__ == "__main__":
+ import doctest
+ doctest.testmod()
+### Sample FileCheck test
+# RUN: %PYTHON %s | mlir-opt -split-input-file | FileCheck %s
+import mlir
+# TODO: Move to a test utility class once any of this actually exists.
+def print_module(f):
+ m = f()
+ print("// -----")
+ print("// TEST_FUNCTION:", f.__name__)
+ print(m.to_asm())
+ return f
+@print_module
+def create_my_op():
+ m = mlir.ir.Module()
+ builder = m.new_op_builder()
+ # CHECK: mydialect.my_operation ...
+ builder.my_op()
+ return m
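To see what the `print_module` decorator emits without a working `mlir` module, the sketch below substitutes a hypothetical `FakeModule` whose `to_asm()` returns a fixed string, and captures standard output. Note that the decorator calls the function once at definition time, so the output appears as soon as the module is loaded.

```python
import contextlib
import io


class FakeModule:
    """Hypothetical stand-in for a module object with a `to_asm()` method."""

    def to_asm(self):
        return "module {\n}"


def print_module(f):
    # Runs `f` immediately and prints a split marker, a label, and the IR.
    m = f()
    print("// -----")
    print("// TEST_FUNCTION:", f.__name__)
    print(m.to_asm())
    return f


buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):

    @print_module
    def create_empty_module():
        return FakeModule()


output = buffer.getvalue()
```

The `// -----` marker is what `-split-input-file` uses to separate the per-function chunks before they reach `FileCheck`.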