[llvm-branch-commits] [clang] [clang] "modular_format" attribute for functions using format strings (PR #147431)
Daniel Thornburgh via llvm-branch-commits
llvm-branch-commits at lists.llvm.org
Tue Jul 22 11:29:25 PDT 2025
================
@@ -9427,3 +9427,37 @@ diagnostics with code like:
__attribute__((nonstring)) char NotAStr[3] = "foo"; // Not diagnosed
}];
}
+
+def ModularFormatDocs : Documentation {
+ let Category = DocCatFunction;
+ let Content = [{
+The ``modular_format`` attribute can be applied to a function that bears the
+``format`` attribute (or standard library functions) to indicate that the
+implementation is modular on the format string argument. When the format string
+for a given call is constant, the compiler may redirect the call to the symbol
+given as the first argument to the attribute (the modular implementation
+function).
+
+The second argument is an implementation name, and the remaining arguments are
+aspects of the format string for the compiler to report. If the compiler does
+not understand an aspect, it must summarily report that the format string has
+that aspect.
+
+The compiler reports an aspect by issuing a relocation for the symbol
+``<impl_name>_<aspect>``. This arranges for code and data needed to support the
+aspect of the implementation to be brought into the link to satisfy weak
+references in the modular implementation function.
+
+For example, say ``printf`` is annotated with
+``modular_format(__modular_printf, __printf, float)``. Then, a call to
+``printf(var, 42)`` would be untouched. A call to ``printf("%d", 42)`` would
+become a call to ``__modular_printf`` with the same arguments, as would
----------------
mysterymath wrote:
> My concern is more about dispatching in ways the user may not anticipate and getting observably different behavior. e.g., the user calls `printf("%I64d", 0LL)` and they were getting the MSVC CRT `printf` call which supported that modifier but now calls `__modular_printf` which doesn't know about the modifier. What happens in that kind of situation?
Ah, if I understand what you're getting at, that can't happen; it's explicitly out of scope for the feature.
The `modular_format` attribute exists to tell a compiler that is compiling calls to a function that the implementation can be split by redirecting calls and emitting relocations against various symbols. The only plausible mechanism for doing so is a header file, which means the header must be provided by, and intrinsically tied to, a specific version of the implementation. Otherwise, it would be impossible to determine which aspects the implementation needs emitted in order to function correctly.
Accordingly, this feature would primarily be useful for cases where libc is statically linked in and paired with its own headers. (llvm-libc, various embedded libcs, etc.) I suppose it's technically possible to break out printf implementation parts into a family of individual dynamic libraries, but even then, any libc header set that required that the libc implementation be dynamically replaceable would not be able to include `modular_format`.
So, for implementations that use this feature, `printf` and `__modular_printf` would always be designed together. To avoid ever introducing two full `printf` implementations into the link, `printf` would be a thin wrapper around `__modular_printf` that also requests every possible aspect of the implementation. This would mean that the two could never diverge.
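A minimal sketch of that wrapper arrangement, purely for illustration. All names here (`my_snprintf`, `my_modular_vsnprintf`, `my_printf_float`) are hypothetical, the modular implementation simply forwards to the standard `vsnprintf`, and the aspect "request" is modeled as an ordinary reference to a symbol rather than a compiler-emitted relocation:

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical aspect symbol. In a real libc this would live in the object
 * file that carries float-formatting support, so that referencing it pulls
 * that support into the link. */
const char my_printf_float = 1;

/* Hypothetical modular implementation. This stand-in just forwards to the
 * standard vsnprintf; a real one would reach optional pieces such as float
 * support through weak references. */
int my_modular_vsnprintf(char *buf, size_t n, const char *fmt, va_list ap) {
    return vsnprintf(buf, n, fmt, ap);
}

/* The unoptimized entry point: a thin wrapper that requests every aspect
 * and then defers to the modular implementation, so only one full
 * implementation is ever linked. */
int my_snprintf(char *buf, size_t n, const char *fmt, ...) {
    /* Model "request every aspect" as a reference to each aspect symbol. */
    const volatile char *keep_float = &my_printf_float;
    (void)*keep_float;

    va_list ap;
    va_start(ap, fmt);
    int written = my_modular_vsnprintf(buf, n, fmt, ap);
    va_end(ap);
    return written;
}
```

Because the wrapper contains no formatting logic of its own, a call through `my_snprintf` and a redirected call to `my_modular_vsnprintf` take the same code path and cannot disagree about which format specifiers are supported.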
As an aside, this is my first time landing an RFC across so many components of LLVM. I wasn't sure how much detail to include in each change; my intuition was to try to provide links to the RFC instead. I don't want the above reasoning to get buried, and it gives me pause that it wasn't readily accessible. But I'm also not entirely sure where it should live going forward. Advice would be appreciated.
https://github.com/llvm/llvm-project/pull/147431