[llvm] [BOLT][RISCV] Handle CIE's produced by GNU as (PR #69578)

Job Noorman via llvm-commits llvm-commits at lists.llvm.org
Thu Oct 19 02:37:10 PDT 2023


https://github.com/mtvec created https://github.com/llvm/llvm-project/pull/69578

On RISC-V, GNU as produces the following initial instruction in CIE's:

```
DW_CFA_def_cfa_register: r2
```

While I believe it is technically illegal to use this instruction without first using a `DW_CFA_def_cfa` (since the offset is undefined), both `readelf` and `llvm-dwarfdump` accept this and implicitly set the offset to 0.

In BOLT, however, this triggers an assert (in `CFISnapshot::advanceTo`) as it (correctly) believes the offset is not set. This patch fixes this by setting the offset to 0 whenever executing `DW_CFA_def_cfa_register` while the offset is undefined.

Note that this is probably the simplest workaround but it has a downside: while emitting CFI start, we check if the initial instructions are contained within `MCAsmInfo::getInitialFrameState` and omit them if they are. This will not be true for GNU CIE's (since they differ from LLVM's) which causes an unnecessary `DW_CFA_def_cfa_register` to be emitted.

While technically correct, it would probably be better to replace the GNU CIE with the one used by LLVM to avoid this situation. This would solve the problem this patch solves while also preventing unnecessary CFI instructions. However, this is a bit trickier to implement correctly so I propose to keep this for a later time.

Note on testing: the test creates a simple function with three basic blocks and forces the CFI state of the last one to be different from the others using an arbitrary CFI instruction. Then,
`--reorder-blocks=reverse` is used to force `CFISnapshot::advanceTo` to be called. This causes an assert on the current main branch.

>From 450bb12d8492245e25b4216f91bef6090afd49f2 Mon Sep 17 00:00:00 2001
From: Job Noorman <jnoorman at igalia.com>
Date: Thu, 19 Oct 2023 11:31:49 +0200
Subject: [PATCH] [BOLT][RISCV] Handle CIE's produced by GNU as

On RISC-V, GNU as produces the following initial instruction in CIE's:

```
DW_CFA_def_cfa_register: r2
```

While I believe it is technically illegal to use this instruction
without first using a `DW_CFA_def_cfa` (since the offset is undefined),
both `readelf` and `llvm-dwarfdump` accept this and implicitly set the
offset to 0.

In BOLT, however, this triggers an assert (in `CFISnapshot::advanceTo`)
as it (correctly) believes the offset is not set. This patch fixes this
by setting the offset to 0 whenever executing `DW_CFA_def_cfa_register`
while the offset is undefined.

Note that this is probably the simplest workaround but it has a
downside: while emitting CFI start, we check if the initial instructions
are contained within `MCAsmInfo::getInitialFrameState` and omit them if
they are. This will not be true for GNU CIE's (since they differ from
LLVM's) which causes an unnecessary `DW_CFA_def_cfa_register` to be
emitted.

While technically correct, it would probably be better to replace the
GNU CIE with the one used by LLVM to avoid this situation. This would
solve the problem this patch solves while also preventing unnecessary
CFI instructions. However, this is a bit trickier to implement correctly
so I propose to keep this for a later time.

Note on testing: the test creates a simple function with three basic
blocks and forces the CFI state of the last one to be different from the
others using an arbitrary CFI instruction. Then,
`--reorder-block=reverse` is used to force `CFISnapshot::advanceTo` to
be called. This causes an assert on the current main branch.
---
 bolt/lib/Core/BinaryFunction.cpp    |   7 ++
 bolt/test/RISCV/Inputs/cie-gnu.yaml | 109 ++++++++++++++++++++++++++++
 bolt/test/RISCV/cie-gnu.test        |  17 +++++
 3 files changed, 133 insertions(+)
 create mode 100644 bolt/test/RISCV/Inputs/cie-gnu.yaml
 create mode 100644 bolt/test/RISCV/cie-gnu.test

diff --git a/bolt/lib/Core/BinaryFunction.cpp b/bolt/lib/Core/BinaryFunction.cpp
index c196ab41e2a4451..65ebb8599ff75c4 100644
--- a/bolt/lib/Core/BinaryFunction.cpp
+++ b/bolt/lib/Core/BinaryFunction.cpp
@@ -2462,6 +2462,13 @@ struct CFISnapshot {
     case MCCFIInstruction::OpDefCfaRegister:
       CFAReg = Instr.getRegister();
       CFARule = UNKNOWN;
+
+      // This shouldn't happen according to the spec but GNU binutils on RISC-V
+      // emits a DW_CFA_def_cfa_register in CIE's which leaves the offset
+      // unspecified. Both readelf and llvm-dwarfdump interpret the offset as 0
+      // in this case so let's do the same.
+      if (CFAOffset == UNKNOWN)
+        CFAOffset = 0;
       break;
     case MCCFIInstruction::OpDefCfaOffset:
       CFAOffset = Instr.getOffset();
diff --git a/bolt/test/RISCV/Inputs/cie-gnu.yaml b/bolt/test/RISCV/Inputs/cie-gnu.yaml
new file mode 100644
index 000000000000000..7e05636ef86ab2a
--- /dev/null
+++ b/bolt/test/RISCV/Inputs/cie-gnu.yaml
@@ -0,0 +1,109 @@
+## Compiled and stripped-down version of:
+## (riscv64-linux-gnu-gcc -nostdlib -static -Wl,-q cie-gnu.s)
+#     .text
+#     .globl _start
+#     .type _start, @function
+# _start:
+#     .cfi_startproc
+#     beq a0, a1, 1f
+#     ret
+# 1:
+#     .cfi_undefined t0 # Arbitrary cfi command to force a new state
+#     ret
+#     .cfi_endproc
+#     .size _start, .-_start
+
+--- !ELF
+FileHeader:
+  Class:           ELFCLASS64
+  Data:            ELFDATA2LSB
+  Type:            ET_EXEC
+  Machine:         EM_RISCV
+  Flags:           [ EF_RISCV_RVC, EF_RISCV_FLOAT_ABI_DOUBLE ]
+  Entry:           0x10144
+ProgramHeaders:
+  - Type:            PT_LOAD
+    Flags:           [ PF_X, PF_R ]
+    FirstSec:        .text
+    LastSec:         .eh_frame
+    VAddr:           0x10000
+    Align:           0x1000
+    Offset:          0x0
+Sections:
+  - Name:            .text
+    Type:            SHT_PROGBITS
+    Flags:           [ SHF_ALLOC, SHF_EXECINSTR ]
+    Address:         0x10144
+    AddressAlign:    0x2
+    Offset:          0x144
+    Content:         6303B50082808280
+  - Name:            .eh_frame
+    Type:            SHT_PROGBITS
+    Flags:           [ SHF_ALLOC ]
+    Address:         0x10150
+    AddressAlign:    0x8
+    Content:         1000000000000000037A5200017C01011B0D02001000000018000000D8FFFFFF0800000000460705
+  - Name:            .rela.text
+    Type:            SHT_RELA
+    Flags:           [ SHF_INFO_LINK ]
+    Link:            .symtab
+    AddressAlign:    0x8
+    Info:            .text
+    Relocations:
+      - Offset:          0x10144
+        Symbol:          ".L1\x021"
+        Type:            R_RISCV_BRANCH
+  - Name:            .rela.eh_frame
+    Type:            SHT_RELA
+    Flags:           [ SHF_INFO_LINK ]
+    Link:            .symtab
+    AddressAlign:    0x8
+    Info:            .eh_frame
+    Relocations:
+      - Offset:          0x1016C
+        Symbol:          '.L0 '
+        Type:            R_RISCV_32_PCREL
+      - Offset:          0x10170
+        Symbol:          '.L0  (1)'
+        Type:            R_RISCV_ADD32
+      - Offset:          0x10170
+        Symbol:          '.L0 '
+        Type:            R_RISCV_SUB32
+      - Offset:          0x10175
+        Symbol:          '.L0  (2)'
+        Type:            R_RISCV_SET6
+      - Offset:          0x10175
+        Symbol:          '.L0 '
+        Type:            R_RISCV_SUB6
+  - Type:            SectionHeaderTable
+    Sections:
+      - Name:            .text
+      - Name:            .rela.text
+      - Name:            .eh_frame
+      - Name:            .rela.eh_frame
+      - Name:            .symtab
+      - Name:            .strtab
+      - Name:            .shstrtab
+Symbols:
+  - Name:            '$x'
+    Section:         .text
+    Value:           0x10144
+  - Name:            ".L1\x021"
+    Section:         .text
+    Value:           0x1014A
+  - Name:            '.L0 '
+    Section:         .text
+    Value:           0x10144
+  - Name:            '.L0  (1)'
+    Section:         .text
+    Value:           0x1014C
+  - Name:            '.L0  (2)'
+    Section:         .text
+    Value:           0x1014A
+  - Name:            _start
+    Type:            STT_FUNC
+    Section:         .text
+    Binding:         STB_GLOBAL
+    Value:           0x10144
+    Size:            0x8
+...
diff --git a/bolt/test/RISCV/cie-gnu.test b/bolt/test/RISCV/cie-gnu.test
new file mode 100644
index 000000000000000..deea8027ef3d608
--- /dev/null
+++ b/bolt/test/RISCV/cie-gnu.test
@@ -0,0 +1,17 @@
+# Test that BOLT can handle CIE's produced by GNU as. On RISC-V, GNU as produces
+# the following initial instruction:
+# DW_CFA_def_cfa_register: r2
+# While I believe it is technically incorrect to use this instruction without
+# first using a DW_CFA_def_cfa (since the offset is unspecified), both readelf
+# and llvm-dwarfdump accept this and implicitly set the offset to 0.
+# In BOLT, this used to trigger an assert, however, since it (correctly)
+# believed the offset was not set. This test checks we can handle this
+# situation.
+
+RUN: yaml2obj -o %t %p/Inputs/cie-gnu.yaml
+RUN: llvm-bolt -o %t.bolt %t --reorder-blocks=reverse
+RUN: llvm-dwarfdump --debug-frame %t.bolt | FileCheck %s
+
+CHECK: 0x400000: CFA=X2
+CHECK: 0x400004: CFA=X2: X5=undefined
+CHECK: 0x400006: CFA=X2



More information about the llvm-commits mailing list