[libc-commits] [libc] [libc] Rewrite code style docs (PR #133544)
Michael Jones via libc-commits
libc-commits at lists.llvm.org
Thu Apr 3 16:33:09 PDT 2025
https://github.com/michaelrj-google updated https://github.com/llvm/llvm-project/pull/133544
>From 4c6c32b8aec7c4b0c038c26ad34633cc92bc4a88 Mon Sep 17 00:00:00 2001
From: Michael Jones <michaelrj at google.com>
Date: Fri, 28 Mar 2025 17:01:33 -0700
Subject: [PATCH 1/3] [libc] Rewrite code style docs
TODO: commit message
---
libc/docs/dev/code_style_what.rst | 135 +++++++++++++++++++
libc/docs/dev/code_style_why.rst | 210 ++++++++++++++++++++++++++++++
2 files changed, 345 insertions(+)
create mode 100644 libc/docs/dev/code_style_what.rst
create mode 100644 libc/docs/dev/code_style_why.rst
diff --git a/libc/docs/dev/code_style_what.rst b/libc/docs/dev/code_style_what.rst
new file mode 100644
index 0000000000000..d53c71406cae1
--- /dev/null
+++ b/libc/docs/dev/code_style_what.rst
@@ -0,0 +1,135 @@
+.. _code_style_what:
+
+========================================
+The LLVM-libc code style reference guide
+========================================
+
+This is intended as a quick reference guide to writing code within LLVM-libc.
+For the reasoning behind the rules, read :ref:`_code_style_why`.
+
+.. contents::
+ :depth: 2
+ :local:
+
+Overall style
+=============
+
+The LLVM-libc code style is based on the LLVM code style, with some
+LLVM-libc specific additions. See `the coding standards of the LLVM project
+<https://llvm.org/docs/CodingStandards.html>`_. for the LLVM code style. If
+there is a conflict between the LLVM code style and the LLVM-libc code style,
+the LLVM-libc code style takes precedence inside the LLVM-libc codebase.
+
+Naming style
+------------
+
+#. **Non-const variables** - Lowercase ``snake_case`` style.
+ .. TODO: Should we add the rule of "member variables end with an underscore?"
+#. **const and constexpr variables** - Capitalized ``SNAKE_CASE``.
+#. **Functions and methods** - ``snake_case``.
+#. **Internal class/struct names** - ``CaptilizedCamelCase``.
+#. **Public names** - Follow the style as prescribed by the standards.
+
+Macro style
+-----------
+
+#. Build flags - start with ``LIBC_COPT_``
+ * e.g. ``LIBC_COPT_PRINTF_DISABLE_FLOAT``
+#. Code defined flags - Defined in ``src/__support/macros`` and start with ``LIBC_``
+
+ * ``src/__support/macros/properties/`` - defines flags based on the build properties.
+
+ * ``architectures.h`` - Target architecture properties.
+ e.g., ``LIBC_TARGET_ARCH_IS_ARM``.
+ * ``compiler.h`` - Host compiler properties.
+ e.g., ``LIBC_COMPILER_IS_CLANG``.
+ * ``cpu_features.h`` - Target cpu feature availability.
+ e.g., ``LIBC_TARGET_CPU_HAS_AVX2``.
+ * ``types.h`` - Type properties and availability.
+ e.g., ``LIBC_TYPES_HAS_FLOAT128``.
+ * ``os.h`` - Target os properties.
+ e.g., ``LIBC_TARGET_OS_IS_LINUX``.
+
+ * ``src/__support/macros/config.h`` - Important compiler and platform
+ features. Such macros can be used to produce portable code by
+ parameterizing compilation based on the presence or lack of a given
+ feature. e.g., ``LIBC_HAS_FEATURE``
+ * ``src/__support/macros/attributes.h`` - Attributes for functions, types,
+ and variables. e.g., ``LIBC_UNUSED``
+ * ``src/__support/macros/optimization.h`` - Portable macros for performance
+ optimization. e.g., ``LIBC_LIKELY``, ``LIBC_LOOP_NOUNROLL``
+
+LLVM-libc specific rules
+========================
+
+#. Avoid using any C++ feature that requires runtime support.
+
+ * Examples include exceptions, virtual functions, static/global constructors,
+ and anything to do with RTTI.
+ * For C++ library features that don't require runtime support, see
+ ``/src/__support/CPP/`` which may already provide an implementation.
+
+#. Avoid calling public libc functions (even through the ``LIBC_NAMESPACE``) from
+ within LLVM-libc.
+
+ * Instead call the internal implementations for those functions.
+ * There are some specific exceptions, such as the allocation functions.
+
+#. Avoid including public libc headers directly from within LLVM-libc.
+
+ * Instead use specific headers for types or macros needed from the ``hdr``
+ directory (called "Proxy Headers")
+ * Some headers are always provided by the compiler, and are allowed but
+ discouraged.
+ * e.g. ``stdint.h``
+
+ .. TODO: add a doc on proxy headers.
+
+#. Avoid including from ``/include`` directly.
+
+ * Instead use the Proxy Headers in the ``hdr`` directory.
+ * It is allowed, though discouraged, to use headers from ``/include`` for
+ types or macros that are LLVM-libc specific and only used in fullbuild.
+ Proxy headers are preferred.
+
+#. Avoid setting ``errno`` in internal functions.
+
+ * Instead return errors with ``cpp::ErrorOr``, ``cpp::optional``, or a custom
+ struct.
+ * The only time to set ``errno`` is in the public entrypoints, preferably
+ immediately before returning.
+
+#. Avoid writing code in assembly for performance.
+
+ * Instead write C++ code that the compiler can reason about and optimize.
+ * Builtins are allowed where relevant, but may need backup code.
+ * If optimal performance is required, it is recommended to improve the
+ compiler.
+
+#. Mark all functions in headers with ``LIBC_INLINE``
+
+ * This is to avoid ODR violations, but also to allow adding properties to
+ the function.
+ .. TODO: Fix this phrasing.
+
+#. Use ``LIBC_ASSERT`` for runtime assertions.
+
+ * There are some files where this is impossible due to ``libc_assert.h``
+ depending on that file. These just can't use assertions.
+ .. TODO: list which files can't use assertions. Also see if we can fix this.
+
+#. Avoid allocating memory where possible.
+
+ * Instead use statically allocated memory when possible, even if the size
+ must be large.
+ * Some functions require allocation, such as ``strdup``, these should
+ allocate as defined by the standard.
+
+
+#. Avoid calling the public name of libc functions from unit tests.
+
+ * Instead call the namespaced name of the function (e.g.
+ ``LIBC_NAMESPACE::sqrt`` instead of ``sqrt``).
+ * For tests of public header macros or integration tests, it may be necessary
+ to call the public name of the function. These should be in the
+ ``test/include`` directory or the ``test/integration`` directory.
diff --git a/libc/docs/dev/code_style_why.rst b/libc/docs/dev/code_style_why.rst
new file mode 100644
index 0000000000000..431d1f93a7d40
--- /dev/null
+++ b/libc/docs/dev/code_style_why.rst
@@ -0,0 +1,210 @@
+.. _code_style_why:
+
+==================================
+The LLVM-libc code style explainer
+==================================
+
+This is the reasoning behind the rules in :ref:`_code_style_what`. It's intended
+to help you understand why the rules exist and what they are trying to achieve.
+
+.. contents::
+ :depth: 2
+ :local:
+
+Overall style
+=============
+
+The LLVM-libc code style is based on the LLVM code style, with some
+LLVM-libc specific additions. See `the coding standards of the LLVM project
+<https://llvm.org/docs/CodingStandards.html>`_. for the LLVM code style. If
+there is a conflict between the LLVM code style and the LLVM-libc code style,
+the LLVM-libc code style takes precedence inside the LLVM-libc codebase.
+
+Why not just follow the LLVM code style?
+----------------------------------------
+
+The LLVM code style is ideal for a modern C++ codebase that can take advantage
+of the C++ library features. LLVM-libc is also written in C++, but it cannot
+depend on any other library, nor can it use any C++ features that require
+runtime support. Finally, there are many function names that are provided
+by the standard but don't match any code style. In the end, the library code
+must somehow bridge the gap between the inconsistent standard and the LLVM
+code style. The result is somewhat inconsistent in its own way, but the goal
+is to make the code consistent enough to be readable.
+
+
+Naming style
+------------
+
+Most of this was down to personal preference when writing the clang-tidy files.
+
+#. **Non-const variables** - This includes function arguments, struct and
+ class data members, non-const globals and local variables. They all use the
+ ``snake_case`` style.
+ .. TODO: Should we add the rule of "member variables end with an underscore?"
+#. **const and constexpr variables** - They use the capitalized
+ ``SNAKE_CASE`` irrespective of whether they are local or global.
+#. **Functions and methods** - They use the ``snake_case`` style like the
+ non-const variables.
+#. **Internal class/struct names** - These are types which are internal to the
+ libc implementation. They use the ``CaptilizedCamelCase`` style.
+#. **Public names** - These are the names as prescribed by the standards and
+ will follow the style as prescribed by the standards.
+
+Macro Style
+-----------
+
+There are two main styles for macros:
+
+#. **Build defined** macros are generated by `CMake` or `Bazel` and are passed
+ down to the compiler with the ``-D`` command line flag. They start with the
+ ``LIBC_COPT_`` prefix. They are used to tune the behavior of the libc.
+ * e.g. ``LIBC_COPT_PRINTF_DISABLE_FLOAT``
+ * These should be based on the ``LIBC_CONF_`` variables in ``config/config.json``.
+
+#. **Code defined** macros are defined within the ``src/__support/macros``
+ folder. They all start with the ``LIBC_`` prefix.
+
+ * ``src/__support/macros/properties/`` - Build related properties like
+ target architecture or enabled CPU features defined by introspecting
+ compiler defined preprocessor definitions.
+
+ * ``architectures.h`` - Target architecture properties.
+ e.g., ``LIBC_TARGET_ARCH_IS_ARM``.
+ * ``compiler.h`` - Host compiler properties.
+ e.g., ``LIBC_COMPILER_IS_CLANG``.
+ * ``cpu_features.h`` - Target cpu feature availability.
+ e.g., ``LIBC_TARGET_CPU_HAS_AVX2``.
+ * ``types.h`` - Type properties and availability.
+ e.g., ``LIBC_TYPES_HAS_FLOAT128``.
+ * ``os.h`` - Target os properties.
+ e.g., ``LIBC_TARGET_OS_IS_LINUX``.
+
+ * ``src/__support/macros/config.h`` - Important compiler and platform
+ features. Such macros can be used to produce portable code by
+ parameterizing compilation based on the presence or lack of a given
+ feature. e.g., ``LIBC_HAS_FEATURE``
+ * ``src/__support/macros/attributes.h`` - Attributes for functions, types,
+ and variables. e.g., ``LIBC_UNUSED``
+ * ``src/__support/macros/optimization.h`` - Portable macros for performance
+ optimization. e.g., ``LIBC_LIKELY``, ``LIBC_LOOP_NOUNROLL``
+
+
+LLVM-libc specific rules
+========================
+
+#. Avoid using any C++ feature that requires runtime support.
+
+ * Examples include exceptions, virtual functions, static/global constructors,
+ and anything to do with RTTI.
+ * For C++ library features that don't require runtime support, see
+ ``/src/__support/CPP/`` which may already provide an implementation.
+ * The reason is that the C++ runtime depends on libc, so any use of it would
+ create a circular dependency.
+
+#. Avoid calling public libc functions (even through the ``LIBC_NAMESPACE``) from
+ within LLVM-libc.
+
+ * Instead call the internal implementations for those functions.
+ * There are some specific exceptions, such as the allocation functions.
+ * The reason is for independence of entrypoints as well as hermeticity.
+ * Adding any one entrypoint should not require adding any other.
+ * Calling the public name of a libc function is likely to pull in the
+ system's libc.
+
+#. Avoid including public libc headers directly from within LLVM-libc.
+
+ * Instead use specific headers for types or macros needed from the ``hdr``
+ directory (called "Proxy Headers")
+ * Some headers are always provided by the compiler, and are allowed but
+ discouraged.
+ * e.g. ``stdint.h``
+ * The reason is to enforce hermeticity for fullbuild, and centralize the
+ inclusion of public headers in case the includes need extra conditions.
+
+ .. TODO: add a doc on proxy headers.
+
+#. Avoid including from ``/include`` directly.
+
+ * Instead use the Proxy Headers in the ``hdr`` directory.
+ * It is allowed, though discouraged, to use headers from ``/include`` for
+ types or macros that are LLVM-libc specific and only used in fullbuild.
+ Proxy headers are preferred.
+ * The reason is to avoid type conflicts with system headers. If a type is
+ defined in a system header, including our definition may cause a type
+ conflict. This can be a problem even in files that don't also include
+ system headers due to transitive includes.
+
+#. Avoid setting ``errno`` in internal functions.
+
+ * Instead return errors with ``cpp::ErrorOr``, ``cpp::optional``, or a custom
+ struct.
+ * The only time to set ``errno`` is in the public entrypoints, preferably
+ immediately before returning.
+ * This simplifies reusing code between entrypoints, for example printf reuses
+ the string to integer code for its format string parsing but handles errors
+ such as overflow very differently than strtol.
+
+#. Avoid writing code in assembly for performance.
+
+ * Instead write C++ code that the compiler can reason about and optimize.
+ * Builtins are allowed where relevant, but may need backup code.
+ * If optimal performance is required, it is recommended to improve the
+ compiler.
+ * This is by far the most controvertial rule, but also one of the most
+ important. Many people believe they can write faster code in assembly,
+ and they may be correct that the hand-optimized code shows faster in some
+ benchamarks, but this sort of micro-optimization is highly CPU specific,
+ and may not even be better in real world use. More importantly, assembly is
+ not available for the compiler to transform. Many large optimizations come
+ not from making one function faster, but from making the compiler able to
+ reason about the inputs and outputs of the function so it can hoist, lower,
+ or even eliminate the function entirely.
+ * The best example of this is the ctype functions, impelemnted in
+ ``src/__support/ctype_utils.h``. These are implemented entirely as switch
+ statements. This is not a clean or elegant implementation, but it makes it
+ trivial for the compiler to map inputs to outputs. If you look at the
+ disassembly of the entrypoints that call these functions, you can see that
+ the compiler can compose, inline, and generally optimize the switch
+ statements in ways not possible with assembly.
+ * Next, avoiding assembly improves portability. Most of the LLVM-libc
+ code runs just as well on platforms as different as RISC-V and GPUs. Since
+ it's in C++, it can be compiled with different options for different
+ tradeoffs. For x86_64 linux, maybe the user wants -O3, but for a RISC-V
+ baremetal target, the user wants -Os. By leaving the assembly to the
+ compiler, the user can choose what they would like to optimize for.
+ * Finally, humans are bad at writing assembly in general. The number of
+ options and tradeoffs is so large that creating a program (the compiler)
+ to perform the optimizations for us is much more efficient. There are
+ places where the compiler misses optimizations, to be sure, but many more
+ where it finds optimizations that humans would never imagine. By fixing
+ those problem and improving the compiler, it improves not just our code,
+ but everyone's code.
+
+#. Mark all functions in headers with ``LIBC_INLINE``
+
+ * This is to avoid ODR violations, but also to allow adding properties to
+ the function.
+ .. TODO: Fix this phrasing.
+
+#. Use ``LIBC_ASSERT`` for runtime assertions.
+
+ * There are some files where this is impossible due to ``libc_assert.h``
+ depending on that file. These just can't use assertions.
+ .. TODO: list which files can't use assertions. Also see if we can fix this.
+
+#. Avoid allocating memory where possible.
+
+ * Instead use statically allocated memory when possible, even if the size
+ must be large.
+ * Some functions require allocation, such as ``strdup``, these should
+ allocate as defined by the standard.
+
+
+#. Avoid calling the public name of libc functions from unit tests.
+
+ * Instead call the namespaced name of the function (e.g.
+ ``LIBC_NAMESPACE::sqrt`` instead of ``sqrt``).
+ * For tests of public header macros or integration tests, it may be necessary
+ to call the public name of the function. These should be in the
+ ``test/include`` directory or the ``test/integration`` directory.
>From eb4b34b3da89e9e02b02ae380514becd7b861a41 Mon Sep 17 00:00:00 2001
From: Michael Jones <michaelrj at google.com>
Date: Tue, 1 Apr 2025 16:23:52 -0700
Subject: [PATCH 2/3] more why explanation
---
libc/docs/dev/code_style_why.rst | 16 +++++++++++++---
1 file changed, 13 insertions(+), 3 deletions(-)
diff --git a/libc/docs/dev/code_style_why.rst b/libc/docs/dev/code_style_why.rst
index 431d1f93a7d40..758821376e690 100644
--- a/libc/docs/dev/code_style_why.rst
+++ b/libc/docs/dev/code_style_why.rst
@@ -177,15 +177,25 @@ LLVM-libc specific rules
options and tradeoffs is so large that creating a program (the compiler)
to perform the optimizations for us is much more efficient. There are
places where the compiler misses optimizations, to be sure, but many more
- where it finds optimizations that humans would never imagine. By fixing
- those problem and improving the compiler, it improves not just our code,
+ where it finds optimizations that humans could never imagine. By fixing
+ those problems and improving the compiler, it improves not just our code,
but everyone's code.
#. Mark all functions in headers with ``LIBC_INLINE``
* This is to avoid ODR violations, but also to allow adding properties to
the function.
- .. TODO: Fix this phrasing.
+ * Functions in headers effectively get "defined" by every file that includes
+ the header. This is a problem for C++'s One Definition Rule (ODR), which
+ says that only one copy of a function should exist. In practice, this means
+ link errors when linking two cpp files that include the same header. By
+ marking a function ``inline`` it tells the compiler to remove the
+ function's public definition and move it into the caller (inlining it).
+ This means the inlined function is gone by link time, so there's no
+ conflict.
+ * Normally marking functions in headers ``inline`` is sufficient, but there
+ are cases where we want to add attributes to the inline functions. The
+ macro lets us do that by redefining the ``LIBC_INLINE`` macro.
#. Use ``LIBC_ASSERT`` for runtime assertions.
>From c5ef3dbdd60e2ff19f91435720bd9ec112e1354d Mon Sep 17 00:00:00 2001
From: Michael Jones <michaelrj at google.com>
Date: Thu, 3 Apr 2025 16:32:41 -0700
Subject: [PATCH 3/3] finished first draft of why. Need to add rest of
code_style
---
libc/docs/dev/code_style_why.rst | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/libc/docs/dev/code_style_why.rst b/libc/docs/dev/code_style_why.rst
index 758821376e690..5e7dcf41f8921 100644
--- a/libc/docs/dev/code_style_why.rst
+++ b/libc/docs/dev/code_style_why.rst
@@ -201,6 +201,8 @@ LLVM-libc specific rules
* There are some files where this is impossible due to ``libc_assert.h``
depending on that file. These just can't use assertions.
+ * The ``LIBC_ASSERT`` interface allow using assertions without calling the
+ public ``assert`` function.
.. TODO: list which files can't use assertions. Also see if we can fix this.
#. Avoid allocating memory where possible.
@@ -209,6 +211,12 @@ LLVM-libc specific rules
must be large.
* Some functions require allocation, such as ``strdup``, these should
allocate as defined by the standard.
+ * This is more of an opinion than a hard rule. In general, libc functions can
+ be implemented without allocation. Doing so makes them behave more
+ predictably in unusual situations, such as on memory constrained systems or
+ during a program crash. For functions like printf this can be very
+ important, consider the importance of logging before a crashing program
+ exits.
#. Avoid calling the public name of libc functions from unit tests.
@@ -218,3 +226,6 @@ LLVM-libc specific rules
* For tests of public header macros or integration tests, it may be necessary
to call the public name of the function. These should be in the
``test/include`` directory or the ``test/integration`` directory.
+ * This ensures that the function being tested is always the one provided by
+ LLVM-libc. It's less useful for the test to confirm that the system's libc
+ is functional.
More information about the libc-commits
mailing list