[llvm] [BinaryFormat] Add "SFrame" structures and constants (PR #147264)

Pavel Labath via llvm-commits llvm-commits at lists.llvm.org
Tue Jul 8 02:08:43 PDT 2025


================
@@ -0,0 +1,98 @@
+//===- SFrameTest.cpp -----------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "llvm/BinaryFormat/SFrame.h"
+#include "gtest/gtest.h"
+
+using namespace llvm;
+using namespace llvm::sframe;
+
+namespace {
+// Test structure sizes and triviality.
+static_assert(std::is_trivial_v<sframe_preamble>);
+static_assert(sizeof(sframe_preamble) == 4);
----------------
labath wrote:

COFF and ELF parsers indeed use endian-specific integers, but they don't define these types inside the BinaryFormat library. Instead, they redefine those types inside the *Object* library (include/llvm/Object/{COFF,ELFTypes}.h). The types in BinaryFormat don't have asserts on their size, but it does look like they (intentionally or not) follow the on-disk layout. So this PR is sort of consistent with that.

That said, my plan was not to use these structures by `reinterpret_cast`ing the mmapped data. I was planning to use the DataExtractor (see dependent PR) to take care of the endianness when reading from the data (which can still be mmapped). I considered templatizing on the endianness, but I wanted to avoid that because of the additional wrapping needed when switching between generic and concrete code. I can do that, if desired, but this seemed like it was easier to do. This also means I technically don't need the structure layout to match the protocol, but I think it's a nice form of documentation at least.
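To illustrate the cursor-style reading described above, here is a minimal sketch in plain C++. It is not `llvm::DataExtractor` and not the code from the dependent PR: `ByteReader`, `Preamble`, `parsePreamble`, and the field names are all hypothetical stand-ins, and the 4-byte layout (a 16-bit magic followed by two 8-bit fields) is an assumption consistent with the `sizeof(sframe_preamble) == 4` assert in the diff.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical host-side copy of the preamble. Field names are assumptions.
struct Preamble {
  uint16_t Magic;
  uint8_t Version;
  uint8_t Flags;
};
static_assert(sizeof(Preamble) == 4);

// Minimal cursor over a byte buffer. The buffer's endianness is a runtime
// property of the reader, so the calling code needs no templates.
class ByteReader {
public:
  ByteReader(const uint8_t *Data, size_t Size, bool IsLittleEndian)
      : Data(Data), Size(Size), IsLittleEndian(IsLittleEndian) {}

  uint8_t readU8() {
    assert(Offset + 1 <= Size && "read past end of buffer");
    return Data[Offset++];
  }

  uint16_t readU16() {
    assert(Offset + 2 <= Size && "read past end of buffer");
    uint16_t B0 = Data[Offset];
    uint16_t B1 = Data[Offset + 1];
    Offset += 2;
    // Assemble the value byte by byte, so host endianness never matters.
    return IsLittleEndian ? uint16_t(B0 | (B1 << 8))
                          : uint16_t((B0 << 8) | B1);
  }

private:
  const uint8_t *Data;
  size_t Size;
  bool IsLittleEndian;
  size_t Offset = 0;
};

// Reads the fields one at a time. The in-memory struct layout does not have
// to match the on-disk layout for this to work.
inline Preamble parsePreamble(ByteReader &R) {
  Preamble P;
  P.Magic = R.readU16();
  P.Version = R.readU8();
  P.Flags = R.readU8();
  return P;
}
```

Because each field is read individually, the struct here serves only as a return type and as documentation, which is the point made above about the layout not strictly needing to match.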

Another reason is that the initial version of this file is based on @Sterling-Augustine's prototype (a fact I neglected to mention). Looking at the generation code, I see that it also has no need for the layout or endianness of this structure, but that's not entirely the case in the linker (which needs to produce and consume the format). The code, at least in its current form, does look like it could benefit from endian-specific integers, though I think we could find precedents for using DataExtractors as well (e.g., the debug_names linker, whose functionality is actually quite similar to this).

Circling back, I think the main question is which mechanism we use for parsing. The options I see are:
1. endian-specific integers plus reinterpret_cast (used e.g. in ELF and COFF)
2. DataExtractor (used mainly in DWARF, which includes eh_frame)
3. memcpy + swapByteOrder (used in MachO and some other stuff)

I'd prefer the second option, but I'm happy to go with anything, particularly if it means the parser can be reused in the linker.
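For comparison, options 1 and 3 can be sketched in plain C++ as follows. This is only an illustration under stated assumptions: `ULittle16`, `PreambleLE`, `PreambleHost`, and the helper functions are hypothetical names, not LLVM's `support::ulittle16_t` or `sys::swapByteOrder`, and the preamble field names are assumed.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// --- Option 1: endian-specific integers plus reinterpret_cast ---
// A 16-bit little-endian integer stored as raw bytes (alignment 1), decoded
// on every access. Hypothetical stand-in for an endian-aware integer type.
struct ULittle16 {
  uint8_t Bytes[2];
  operator uint16_t() const { return uint16_t(Bytes[0] | (Bytes[1] << 8)); }
};

// On-disk view of a little-endian preamble. A big-endian file would need a
// second type (or a template parameter), which is the wrapping cost
// mentioned above.
struct PreambleLE {
  ULittle16 Magic;
  uint8_t Version;
  uint8_t Flags;
};
static_assert(sizeof(PreambleLE) == 4);
static_assert(alignof(PreambleLE) == 1);

// Zero-copy: the layout of PreambleLE must match the file format exactly.
inline const PreambleLE *viewPreamble(const uint8_t *Data) {
  assert(Data && "null buffer");
  return reinterpret_cast<const PreambleLE *>(Data);
}

// --- Option 3: memcpy plus byte swapping ---
struct PreambleHost {
  uint16_t Magic;
  uint8_t Version;
  uint8_t Flags;
};
static_assert(sizeof(PreambleHost) == 4);

inline uint16_t byteSwap16(uint16_t V) {
  return uint16_t((V << 8) | (V >> 8));
}

inline bool hostIsLittleEndian() {
  const uint16_t Probe = 1;
  uint8_t First;
  std::memcpy(&First, &Probe, 1);
  return First == 1;
}

// Copies the raw bytes into a host struct, then fixes up each multi-byte
// field if the data's byte order differs from the host's.
inline PreambleHost copyPreamble(const uint8_t *Data, bool DataIsLittleEndian) {
  assert(Data && "null buffer");
  PreambleHost P;
  std::memcpy(&P, Data, sizeof(P));
  if (DataIsLittleEndian != hostIsLittleEndian())
    P.Magic = byteSwap16(P.Magic);
  return P;
}
```

Both variants depend on the struct layout matching the on-disk format, which is where size asserts like the ones in this test file would be doing real work rather than serving as documentation.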

Assuming we go with the first option, the second question is whether we do the ELF thing of defining the structures in both BinaryFormat and Object libraries, or we just have a single version (where?).

https://github.com/llvm/llvm-project/pull/147264