[libc-commits] [libc] [libc] Rewrite code style docs (PR #133544)

Michael Jones via libc-commits libc-commits at lists.llvm.org
Thu Apr 3 16:33:09 PDT 2025


https://github.com/michaelrj-google updated https://github.com/llvm/llvm-project/pull/133544

>From 4c6c32b8aec7c4b0c038c26ad34633cc92bc4a88 Mon Sep 17 00:00:00 2001
From: Michael Jones <michaelrj at google.com>
Date: Fri, 28 Mar 2025 17:01:33 -0700
Subject: [PATCH 1/3] [libc] Rewrite code style docs

TODO: commit message
---
 libc/docs/dev/code_style_what.rst | 135 +++++++++++++++++++
 libc/docs/dev/code_style_why.rst  | 210 ++++++++++++++++++++++++++++++
 2 files changed, 345 insertions(+)
 create mode 100644 libc/docs/dev/code_style_what.rst
 create mode 100644 libc/docs/dev/code_style_why.rst

diff --git a/libc/docs/dev/code_style_what.rst b/libc/docs/dev/code_style_what.rst
new file mode 100644
index 0000000000000..d53c71406cae1
--- /dev/null
+++ b/libc/docs/dev/code_style_what.rst
@@ -0,0 +1,135 @@
+.. _code_style_what:
+
+========================================
+The LLVM-libc code style reference guide
+========================================
+
+This is intended as a quick reference guide to writing code within LLVM-libc.
+For the reasoning behind the rules, read :ref:`_code_style_why`.
+
+.. contents::
+   :depth: 2
+   :local:
+
+Overall style
+=============
+
+The LLVM-libc code style is based on the LLVM code style, with some
+LLVM-libc specific additions. See `the coding standards of the LLVM project
+<https://llvm.org/docs/CodingStandards.html>`_. for the LLVM code style. If
+there is a conflict between the LLVM code style and the LLVM-libc code style,
+the LLVM-libc code style takes precedence inside the LLVM-libc codebase.
+
+Naming style
+------------
+
+#. **Non-const variables** - Lowercase ``snake_case`` style.
+   .. TODO: Should we add the rule of "member variables end with an underscore?"
+#. **const and constexpr variables** - Capitalized ``SNAKE_CASE``.
+#. **Functions and methods** - ``snake_case``.
+#. **Internal class/struct names** - ``CaptilizedCamelCase``.
+#. **Public names** - Follow the style as prescribed by the standards.
+
+Macro style
+-----------
+
+#. Build flags - start with ``LIBC_COPT_``
+   * e.g. ``LIBC_COPT_PRINTF_DISABLE_FLOAT``
+#. Code defined flags - Defined in ``src/__support/macros`` and start with ``LIBC_``
+
+   * ``src/__support/macros/properties/`` - defines flags based on the build properties.
+
+      * ``architectures.h`` - Target architecture properties.
+        e.g., ``LIBC_TARGET_ARCH_IS_ARM``.
+      * ``compiler.h`` - Host compiler properties.
+        e.g., ``LIBC_COMPILER_IS_CLANG``.
+      * ``cpu_features.h`` - Target cpu feature availability.
+        e.g., ``LIBC_TARGET_CPU_HAS_AVX2``.
+      * ``types.h`` - Type properties and availability.
+        e.g., ``LIBC_TYPES_HAS_FLOAT128``.
+      * ``os.h`` - Target os properties.
+        e.g., ``LIBC_TARGET_OS_IS_LINUX``.
+
+  * ``src/__support/macros/config.h`` - Important compiler and platform
+     features. Such macros can be used to produce portable code by
+     parameterizing compilation based on the presence or lack of a given
+     feature. e.g., ``LIBC_HAS_FEATURE``
+  * ``src/__support/macros/attributes.h`` - Attributes for functions, types,
+    and variables. e.g., ``LIBC_UNUSED``
+  * ``src/__support/macros/optimization.h`` - Portable macros for performance
+    optimization. e.g., ``LIBC_LIKELY``, ``LIBC_LOOP_NOUNROLL``
+
+LLVM-libc specific rules
+========================
+
+#. Avoid using any C++ feature that requires runtime support.
+
+   * Examples include exceptions, virtual functions, static/global constructors,
+     and anything to do with RTTI.
+   * For C++ library features that don't require runtime support, see
+     ``/src/__support/CPP/`` which may already provide an implementation.
+
+#. Avoid calling public libc functions (even through the ``LIBC_NAMESPACE``) from
+   within LLVM-libc.
+
+   * Instead call the internal implementations for those functions.
+   * There are some specific exceptions, such as the allocation functions.
+
+#. Avoid including public libc headers directly from within LLVM-libc.
+
+   * Instead use specific headers for types or macros needed from the ``hdr``
+     directory (called "Proxy Headers")
+   * Some headers are always provided by the compiler, and are allowed but
+     discouraged.
+     * e.g. ``stdint.h``
+
+   .. TODO: add a doc on proxy headers.
+
+#. Avoid including from ``/include`` directly.
+
+   * Instead use the Proxy Headers in the ``hdr`` directory.
+   * It is allowed, though discouraged, to use headers from ``/include`` for
+     types or macros that are LLVM-libc specific and only used in fullbuild.
+     Proxy headers are preferred.
+
+#. Avoid setting ``errno`` in internal functions.
+
+   * Instead return errors with ``cpp::ErrorOr``, ``cpp::optional``, or a custom
+     struct.
+   * The only time to set ``errno`` is in the public entrypoints, preferably
+     immediately before returning.
+
+#. Avoid writing code in assembly for performance.
+
+   * Instead write C++ code that the compiler can reason about and optimize.
+   * Builtins are allowed where relevant, but may need backup code.
+   * If optimal performance is required, it is recommended to improve the
+     compiler.
+
+#. Mark all functions in headers with ``LIBC_INLINE``
+
+   * This is to avoid ODR violations, but also to allow adding properties to
+     the function.
+     .. TODO: Fix this phrasing.
+
+#. Use ``LIBC_ASSERT`` for runtime assertions.
+
+   * There are some files where this is impossible due to ``libc_assert.h``
+     depending on that file. These just can't use assertions.
+     .. TODO: list which files can't use assertions. Also see if we can fix this.
+
+#. Avoid allocating memory where possible.
+
+   * Instead use statically allocated memory when possible, even if the size
+     must be large.
+   * Some functions require allocation, such as ``strdup``, these should
+     allocate as defined by the standard.
+
+
+#. Avoid calling the public name of libc functions from unit tests.
+
+   * Instead call the namespaced name of the function (e.g.
+     ``LIBC_NAMESPACE::sqrt`` instead of ``sqrt``).
+   * For tests of public header macros or integration tests, it may be necessary
+     to call the public name of the function. These should be in the
+     ``test/include`` directory or the ``test/integration`` directory.
diff --git a/libc/docs/dev/code_style_why.rst b/libc/docs/dev/code_style_why.rst
new file mode 100644
index 0000000000000..431d1f93a7d40
--- /dev/null
+++ b/libc/docs/dev/code_style_why.rst
@@ -0,0 +1,210 @@
+.. _code_style_why:
+
+==================================
+The LLVM-libc code style explainer
+==================================
+
+This is the reasoning behind the rules in :ref:`_code_style_what`. It's intended
+to help you understand why the rules exist and what they are trying to achieve.
+
+.. contents::
+   :depth: 2
+   :local:
+
+Overall style
+=============
+
+The LLVM-libc code style is based on the LLVM code style, with some
+LLVM-libc specific additions. See `the coding standards of the LLVM project
+<https://llvm.org/docs/CodingStandards.html>`_. for the LLVM code style. If
+there is a conflict between the LLVM code style and the LLVM-libc code style,
+the LLVM-libc code style takes precedence inside the LLVM-libc codebase.
+
+Why not just follow the LLVM code style?
+----------------------------------------
+
+The LLVM code style is ideal for a modern C++ codebase that can take advantage
+of the C++ library features. LLVM-libc is also written in C++, but it cannot
+depend on any other library, nor can it use any C++ features that require
+runtime support. Finally, there are many function names that are provided
+by the standard but don't match any code style. In the end, the library code
+must somehow bridge the gap between the inconsistent standard and the LLVM
+code style. The result is somewhat inconsistent in its own way, but the goal
+is to make the code consistent enough to be readable.
+
+
+Naming style
+------------
+
+Most of this was down to personal preference when writing the clang-tidy files.
+
+#. **Non-const variables** - This includes function arguments, struct and
+   class data members, non-const globals and local variables. They all use the
+   ``snake_case`` style.
+   .. TODO: Should we add the rule of "member variables end with an underscore?"
+#. **const and constexpr variables** - They use the capitalized
+   ``SNAKE_CASE`` irrespective of whether they are local or global.
+#. **Functions and methods** - They use the ``snake_case`` style like the
+   non-const variables.
+#. **Internal class/struct names** - These are types which are internal to the
+   libc implementation. They use the ``CaptilizedCamelCase`` style.
+#. **Public names** - These are the names as prescribed by the standards and
+   will follow the style as prescribed by the standards.
+
+Macro Style
+-----------
+
+There are two main styles for macros:
+
+#. **Build defined** macros are generated by `CMake` or `Bazel` and are passed
+   down to the compiler with the ``-D`` command line flag. They start with the
+   ``LIBC_COPT_`` prefix. They are used to tune the behavior of the libc.
+   * e.g. ``LIBC_COPT_PRINTF_DISABLE_FLOAT``
+   * These should be based on the ``LIBC_CONF_`` variables in ``config/config.json``.
+
+#. **Code defined** macros are defined within the ``src/__support/macros``
+   folder. They all start with the ``LIBC_`` prefix.
+
+   * ``src/__support/macros/properties/`` - Build related properties like
+     target architecture or enabled CPU features defined by introspecting
+     compiler defined preprocessor definitions.
+
+     * ``architectures.h`` - Target architecture properties.
+       e.g., ``LIBC_TARGET_ARCH_IS_ARM``.
+     * ``compiler.h`` - Host compiler properties.
+       e.g., ``LIBC_COMPILER_IS_CLANG``.
+     * ``cpu_features.h`` - Target cpu feature availability.
+       e.g., ``LIBC_TARGET_CPU_HAS_AVX2``.
+     * ``types.h`` - Type properties and availability.
+       e.g., ``LIBC_TYPES_HAS_FLOAT128``.
+     * ``os.h`` - Target os properties.
+       e.g., ``LIBC_TARGET_OS_IS_LINUX``.
+
+   * ``src/__support/macros/config.h`` - Important compiler and platform
+     features. Such macros can be used to produce portable code by
+     parameterizing compilation based on the presence or lack of a given
+     feature. e.g., ``LIBC_HAS_FEATURE``
+   * ``src/__support/macros/attributes.h`` - Attributes for functions, types,
+     and variables. e.g., ``LIBC_UNUSED``
+   * ``src/__support/macros/optimization.h`` - Portable macros for performance
+     optimization. e.g., ``LIBC_LIKELY``, ``LIBC_LOOP_NOUNROLL``
+
+
+LLVM-libc specific rules
+========================
+
+#. Avoid using any C++ feature that requires runtime support.
+
+   * Examples include exceptions, virtual functions, static/global constructors,
+     and anything to do with RTTI.
+   * For C++ library features that don't require runtime support, see
+     ``/src/__support/CPP/`` which may already provide an implementation.
+   * The reason is that the C++ runtime depends on libc, so any use of it would
+     create a circular dependency.
+
+#. Avoid calling public libc functions (even through the ``LIBC_NAMESPACE``) from
+   within LLVM-libc.
+
+   * Instead call the internal implementations for those functions.
+   * There are some specific exceptions, such as the allocation functions.
+   * The reason is for independence of entrypoints as well as hermeticity.
+      * Adding any one entrypoint should not require adding any other.
+      * Calling the public name of a libc function is likely to pull in the
+        system's libc.
+
+#. Avoid including public libc headers directly from within LLVM-libc.
+
+   * Instead use specific headers for types or macros needed from the ``hdr``
+     directory (called "Proxy Headers")
+   * Some headers are always provided by the compiler, and are allowed but
+     discouraged.
+     * e.g. ``stdint.h``
+   * The reason is to enforce hermeticity for fullbuild, and centralize the
+     inclusion of public headers in case the includes need extra conditions.
+
+   .. TODO: add a doc on proxy headers.
+
+#. Avoid including from ``/include`` directly.
+
+   * Instead use the Proxy Headers in the ``hdr`` directory.
+   * It is allowed, though discouraged, to use headers from ``/include`` for
+     types or macros that are LLVM-libc specific and only used in fullbuild.
+     Proxy headers are preferred.
+   * The reason is to avoid type conflicts with system headers. If a type is
+     defined in a system header, including our definition may cause a type
+     conflict. This can be a problem even in files that don't also include
+     system headers due to transitive includes.
+
+#. Avoid setting ``errno`` in internal functions.
+
+   * Instead return errors with ``cpp::ErrorOr``, ``cpp::optional``, or a custom
+     struct.
+   * The only time to set ``errno`` is in the public entrypoints, preferably
+     immediately before returning.
+   * This simplifies reusing code between entrypoints, for example printf reuses
+     the string to integer code for its format string parsing but handles errors
+     such as overflow very differently than strtol.
+
+#. Avoid writing code in assembly for performance.
+
+   * Instead write C++ code that the compiler can reason about and optimize.
+   * Builtins are allowed where relevant, but may need backup code.
+   * If optimal performance is required, it is recommended to improve the
+     compiler.
+   * This is by far the most controvertial rule, but also one of the most
+     important. Many people believe they can write faster code in assembly,
+     and they may be correct that the hand-optimized code shows faster in some
+     benchamarks, but this sort of micro-optimization is highly CPU specific,
+     and may not even be better in real world use. More importantly, assembly is
+     not available for the compiler to transform. Many large optimizations come
+     not from making one function faster, but from making the compiler able to
+     reason about the inputs and outputs of the function so it can hoist, lower,
+     or even eliminate the function entirely.
+   * The best example of this is the ctype functions, impelemnted in
+     ``src/__support/ctype_utils.h``. These are implemented entirely as switch
+     statements. This is not a clean or elegant implementation, but it makes it
+     trivial for the compiler to map inputs to outputs. If you look at the
+     disassembly of the entrypoints that call these functions, you can see that
+     the compiler can compose, inline, and generally optimize the switch
+     statements in ways not possible with assembly.
+   * Next, avoiding assembly improves portability. Most of the LLVM-libc
+     code runs just as well on platforms as different as RISC-V and GPUs. Since
+     it's in C++, it can be compiled with different options for different
+     tradeoffs. For x86_64 linux, maybe the user wants -O3, but for a RISC-V
+     baremetal target, the user wants -Os. By leaving the assembly to the
+     compiler, the user can choose what they would like to optimize for.
+   * Finally, humans are bad at writing assembly in general. The number of
+     options and tradeoffs is so large that creating a program (the compiler)
+     to perform the optimizations for us is much more efficient. There are
+     places where the compiler misses optimizations, to be sure, but many more
+     where it finds optimizations that humans would never imagine. By fixing
+     those problem and improving the compiler, it improves not just our code,
+     but everyone's code.
+
+#. Mark all functions in headers with ``LIBC_INLINE``
+
+   * This is to avoid ODR violations, but also to allow adding properties to
+     the function.
+     .. TODO: Fix this phrasing.
+
+#. Use ``LIBC_ASSERT`` for runtime assertions.
+
+   * There are some files where this is impossible due to ``libc_assert.h``
+     depending on that file. These just can't use assertions.
+     .. TODO: list which files can't use assertions. Also see if we can fix this.
+
+#. Avoid allocating memory where possible.
+
+   * Instead use statically allocated memory when possible, even if the size
+     must be large.
+   * Some functions require allocation, such as ``strdup``, these should
+     allocate as defined by the standard.
+
+
+#. Avoid calling the public name of libc functions from unit tests.
+
+   * Instead call the namespaced name of the function (e.g.
+     ``LIBC_NAMESPACE::sqrt`` instead of ``sqrt``).
+   * For tests of public header macros or integration tests, it may be necessary
+     to call the public name of the function. These should be in the
+     ``test/include`` directory or the ``test/integration`` directory.

>From eb4b34b3da89e9e02b02ae380514becd7b861a41 Mon Sep 17 00:00:00 2001
From: Michael Jones <michaelrj at google.com>
Date: Tue, 1 Apr 2025 16:23:52 -0700
Subject: [PATCH 2/3] more why explanation

---
 libc/docs/dev/code_style_why.rst | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/libc/docs/dev/code_style_why.rst b/libc/docs/dev/code_style_why.rst
index 431d1f93a7d40..758821376e690 100644
--- a/libc/docs/dev/code_style_why.rst
+++ b/libc/docs/dev/code_style_why.rst
@@ -177,15 +177,25 @@ LLVM-libc specific rules
      options and tradeoffs is so large that creating a program (the compiler)
      to perform the optimizations for us is much more efficient. There are
      places where the compiler misses optimizations, to be sure, but many more
-     where it finds optimizations that humans would never imagine. By fixing
-     those problem and improving the compiler, it improves not just our code,
+     where it finds optimizations that humans could never imagine. By fixing
+     those problems and improving the compiler, it improves not just our code,
      but everyone's code.
 
 #. Mark all functions in headers with ``LIBC_INLINE``
 
    * This is to avoid ODR violations, but also to allow adding properties to
      the function.
-     .. TODO: Fix this phrasing.
+   * Functions in headers effectively get "defined" by every file that includes
+     the header. This is a problem for C++'s One Definition Rule (ODR), which
+     says that only one copy of a function should exist. In practice, this means
+     link errors when linking two cpp files that include the same header. By
+     marking a function ``inline`` it tells the compiler to remove the
+     function's public definition and move it into the caller (inlining it).
+     This means the inlined function is gone by link time, so there's no
+     conflict.
+   * Normally marking functions in headers ``inline`` is sufficient, but there
+     are cases where we want to add attributes to the inline functions. The
+     macro lets us do that by redefining the ``LIBC_INLINE`` macro.
 
 #. Use ``LIBC_ASSERT`` for runtime assertions.
 

>From c5ef3dbdd60e2ff19f91435720bd9ec112e1354d Mon Sep 17 00:00:00 2001
From: Michael Jones <michaelrj at google.com>
Date: Thu, 3 Apr 2025 16:32:41 -0700
Subject: [PATCH 3/3] finished first draft of why. Need to add rest of
 code_style

---
 libc/docs/dev/code_style_why.rst | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/libc/docs/dev/code_style_why.rst b/libc/docs/dev/code_style_why.rst
index 758821376e690..5e7dcf41f8921 100644
--- a/libc/docs/dev/code_style_why.rst
+++ b/libc/docs/dev/code_style_why.rst
@@ -201,6 +201,8 @@ LLVM-libc specific rules
 
    * There are some files where this is impossible due to ``libc_assert.h``
      depending on that file. These just can't use assertions.
+   * The ``LIBC_ASSERT`` interface allow using assertions without calling the
+     public ``assert`` function.
      .. TODO: list which files can't use assertions. Also see if we can fix this.
 
 #. Avoid allocating memory where possible.
@@ -209,6 +211,12 @@ LLVM-libc specific rules
      must be large.
    * Some functions require allocation, such as ``strdup``, these should
      allocate as defined by the standard.
+   * This is more of an opinion than a hard rule. In general, libc functions can
+     be implemented without allocation. Doing so makes them behave more
+     predictably in unusual situations, such as on memory constrained systems or
+     during a program crash. For functions like printf this can be very
+     important, consider the importance of logging before a crashing program
+     exits.
 
 
 #. Avoid calling the public name of libc functions from unit tests.
@@ -218,3 +226,6 @@ LLVM-libc specific rules
    * For tests of public header macros or integration tests, it may be necessary
      to call the public name of the function. These should be in the
      ``test/include`` directory or the ``test/integration`` directory.
+   * This ensures that the function being tested is always the one provided by
+     LLVM-libc. It's less useful for the test to confirm that the system's libc
+     is functional.



More information about the libc-commits mailing list