<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/97937>97937</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [BOLT] Optimizing immediate relocation with addends produces incorrect assembly
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            BOLT
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          Lawqup
      </td>
    </tr>
</table>

<pre>
    I ran into an issue when running BOLT on a program that accesses a field of a global struct whose address is provided by the linker script. In the optimized binary the addend is dropped, so reads of that field become reads of the struct at offset 0.

## Versions of things

I reproduced this on both x86 and aarch64.

```sh
$ llvm-bolt --version
LLVM (http://llvm.org/):
  LLVM version 19.0.0git
 Optimized build with assertions.
BOLT revision bb6a4850553dd4140a5bd63187ec1b14d0b731f9
```

```sh
$ gcc --version 
gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-17)
```

## Repro
This is a minimal repro on x86.

The C source, `print_global.c`:
```c
#include <stdio.h>
#include <stdint.h>

struct build_id_note {
  char pad[16];
  uint8_t hash[20];
};

extern const struct build_id_note build_id_note;

void print_build_id()
{
        int x;

        for (x = 0; x < 20; x++) {
                printf("%02hhx", build_id_note.hash[x]);
        }
        printf("\n");
}

int main() {
        print_build_id();
        return 0;
}
```

I took the default linker script printed by the `-Wl,-verbose` gcc flag, saved it as `build_id.ld`, and defined `build_id_note` at an absolute address right below the first `PROVIDE`, like so:

```ld
/* first bit of default linker script removed */
SECTIONS
{
  /* Read-only sections, merged into text segment: */
  PROVIDE (__executable_start = SEGMENT_START("text-segment", 0x400000)); . = SEGMENT_START("text-segment", 0x400000) + SIZEOF_HEADERS;
  .note.gnu.build-id (0x400200):
  {
    build_id_note = ABSOLUTE(.);
    *(.note.gnu.build-id)
  }
  /* rest of default linker script removed */
}
```

This is compiled with:
```sh
gcc -o print_global print_global.c -Wl,-T,build_id.ld -Wl,--emit-relocs -Wl,--build-id=sha1
```

Then run BOLT on it:
```sh
$ llvm-bolt ./print_global -o print_global.bolted --funcs print_build_id         
BOLT-INFO: Target architecture: x86_64
BOLT-INFO: BOLT version: bb6a4850553dd4140a5bd63187ec1b14d0b731f9
BOLT-INFO: first alloc address is 0x200000
BOLT-INFO: creating new program header table at address 0x800000, offset 0x600000
BOLT-INFO: enabling relocation mode
BOLT-INFO: enabling -align-macro-fusion=all since no profile was specified
BOLT-INFO: enabling lite mode
BOLT-INFO: 0 out of 12 functions in the binary (0.0%) have non-empty execution profile
BOLT-INFO: setting _end to 0xa001cc
BOLT-INFO: patched build-id (flipped last bit)
```

The output of the non-bolted binary is:
```sh
$ ./print_global
74007820799e8dc7eb6ff742fe419c5866842b5b
```

However, the bolted binary prints the wrong result:
```sh
$ ./print_global.bolted 
040000001400000003000000474e550074007820
```
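
A side note on the wrong output: it appears to be the first 20 bytes of the `.note.gnu.build-id` section, that is, the 16-byte note header followed by the first 4 bytes of the hash. The sketch below decodes those bytes, assuming the standard ELF note layout (little-endian `namesz`, `descsz`, `type`, then the owner name). The `note_header` struct and the helper names are made up for illustration and are not part of the repro.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumed layout of an ELF note header with a 4-byte owner name. */
struct note_header {
  uint32_t namesz;
  uint32_t descsz;
  uint32_t type;
  char name[4];
};

/* The 20 bytes printed by ./print_global.bolted, copied from the output above. */
const uint8_t bolted_output[20] = {
  0x04, 0x00, 0x00, 0x00, 0x14, 0x00, 0x00, 0x00, 0x03, 0x00,
  0x00, 0x00, 0x47, 0x4e, 0x55, 0x00, 0x74, 0x00, 0x78, 0x20,
};

/* Read a little-endian 32-bit word (the repro target is x86_64). */
uint32_t read_le32(const uint8_t *p)
{
  return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
         ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Interpret the first 16 bytes as a note header. */
struct note_header decode_note_header(const uint8_t *bytes)
{
  struct note_header h;

  assert(bytes != NULL);
  h.namesz = read_le32(bytes);
  h.descsz = read_le32(bytes + 4);
  h.type = read_le32(bytes + 8);
  memcpy(h.name, bytes + 12, sizeof(h.name));
  return h;
}

/* Print the decoded header fields and the 4 bytes that follow the header. */
void print_note_header(const uint8_t *bytes)
{
  struct note_header h = decode_note_header(bytes);

  printf("namesz=%u descsz=%u type=%u name=%.4s\n",
         (unsigned)h.namesz, (unsigned)h.descsz, (unsigned)h.type, h.name);
  printf("bytes after header: %02x%02x%02x%02x\n",
         bytes[16], bytes[17], bytes[18], bytes[19]);
}
```

Calling `print_note_header(bolted_output)` decodes a 4-byte owner name `GNU`, a 20-byte descriptor and note type 3, followed by `74007820`, which is how the correct hash begins. That is consistent with the loop reading from the start of the note instead of from `hash`.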

The build ID note itself is still intact in the bolted binary (only the last bit differs, because BOLT patches it):
```sh
$ readelf -n print_global.bolted 
Displaying notes found in: .note.gnu.build-id
 Owner                 Data size       Description
  GNU 0x00000014       NT_GNU_BUILD_ID (unique build ID bitstring)
    Build ID: 74007820799e8dc7eb6ff742fe419c5866842b5a
```

So this is a case where BOLT leads to incorrect runtime behavior. Inspecting the assembly shows the root cause: the immediate address for `build_id_note.hash` is missing the addend, so the loop prints starting from the `pad` field.

Non-bolted asm:
```sh
$ gdb -batch -ex "disassemble print_build_id" print_global                
Dump of assembler code for function print_build_id:
   0x00000000004004e7 <+0>:     push   %rbp
   0x00000000004004e8 <+1>:     mov    %rsp,%rbp
   0x00000000004004eb <+4>:     sub    $0x10,%rsp
   0x00000000004004ef <+8>:     movl   $0x0,-0x4(%rbp)
   0x00000000004004f6 <+15>:    jmp    0x40051c <print_build_id+53>
   0x00000000004004f8 <+17>:    mov    -0x4(%rbp),%eax
   0x00000000004004fb <+20>:    cltq
   0x00000000004004fd <+22>:    movzbl 0x400210(%rax),%eax <---------- Access of build_id_note.hash
   0x0000000000400504 <+29>:    movzbl %al,%eax
   0x0000000000400507 <+32>:    mov    %eax,%esi
   0x0000000000400509 <+34>:    mov    $0x4005d0,%edi
   0x000000000040050e <+39>:    mov    $0x0,%eax
   0x0000000000400513 <+44>:    callq  0x400410 <printf@plt>
   0x0000000000400518 <+49>:    addl   $0x1,-0x4(%rbp)
   0x000000000040051c <+53>:    cmpl   $0x13,-0x4(%rbp)
   0x0000000000400520 <+57>:    jle    0x4004f8 <print_build_id+17>
   0x0000000000400522 <+59>:    mov    $0xa,%edi
   0x0000000000400527 <+64>:    callq  0x400400 <putchar@plt>
   0x000000000040052c <+69>:    nop
   0x000000000040052d <+70>:    leaveq
   0x000000000040052e <+71>:    retq
End of assembler dump.

```

Bolted asm:
```sh
$ gdb -batch -ex "disassemble print_build_id" print_global.bolted 
Dump of assembler code for function print_build_id:
   0x0000000000a00000 <+0>:     push   %rbp
   0x0000000000a00001 <+1>:     mov    %rsp,%rbp
   0x0000000000a00004 <+4>:     sub    $0x10,%rsp
   0x0000000000a00008 <+8>:     movl   $0x0,-0x4(%rbp)
   0x0000000000a0000f <+15>:    jmp    0xa00035 <print_build_id+53>
   0x0000000000a00011 <+17>:    mov    -0x4(%rbp),%eax
   0x0000000000a00014 <+20>:    cltq
   0x0000000000a00016 <+22>:    movzbl 0x400200(%rax),%eax <---------- Uh oh, should be 0x400210
   0x0000000000a0001d <+29>:    movzbl %al,%eax
   0x0000000000a00020 <+32>:    mov    %eax,%esi
   0x0000000000a00022 <+34>:    mov    $0x4005d0,%edi
   0x0000000000a00027 <+39>:    mov    $0x0,%eax
   0x0000000000a0002c <+44>:    callq  0x400410 <printf@plt>
   0x0000000000a00031 <+49>:    addl   $0x1,-0x4(%rbp)
   0x0000000000a00035 <+53>:    cmpl   $0x13,-0x4(%rbp)
   0x0000000000a00039 <+57>:    jle    0xa00011 <print_build_id+17>
   0x0000000000a0003b <+59>:    mov    $0xa,%edi
   0x0000000000a00040 <+64>:    callq  0x400400 <putchar@plt>
   0x0000000000a00045 <+69>:    leaveq
   0x0000000000a00046 <+70>:    retq
End of assembler dump.

```
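
For reference, the expected displacement can be derived from the struct layout. The sketch below assumes the relocation target has the form symbol + addend, with the addend equal to the offset of `hash` within `struct build_id_note`. `BUILD_ID_NOTE_ADDR` and `expected_hash_addr` are names invented for this illustration; the address is the one set in `build_id.ld`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same layout as in print_global.c. */
struct build_id_note {
  char pad[16];
  uint8_t hash[20];
};

/* Address assigned to build_id_note by the linker script in the repro. */
#define BUILD_ID_NOTE_ADDR 0x400200UL

/*
 * Expected immediate in the movzbl instruction:
 * symbol address plus the addend (offset of the hash field).
 */
unsigned long expected_hash_addr(void)
{
  return BUILD_ID_NOTE_ADDR + offsetof(struct build_id_note, hash);
}
```

With `offsetof(struct build_id_note, hash)` equal to 16 (0x10), `expected_hash_addr()` gives 0x400210, which matches the non-bolted disassembly. The bolted binary uses 0x400200, i.e. the symbol address with an addend of 0.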
</pre>
kN6WK7o-bSCTjdK419hN_zy3CaUNFwKYBtzVmNl97SR2Ytd1N9sjFooNTKqzhadduOK6FuzdZag0qGEjG3eQs3Q50yO6M8tbhQrGrxUzDDT_G_v_se0O-1MTwKcv34Dseud16768rD99-bZefHv8vFo_riwpm4p_b3qfPK4sL7VRtvwOmhQsuttW139IH3YlJM_Sni-7w0jGNMJbgQrbAircgdVI4FUmlcLM2GO54SVCigV75VJN4bGyFcNlozsZa41lKt5tiN8QNKK7rKTcZ0-tZCqwbA8zzLhrvCwx58zgvjC6OXw0-LWzsx3gNJS8PcOPjuOmaLTd2IUf3MxjV22ULPtZ0h5TYtKezEej3pdDYjNdXufaNk9hktrqAhPcgUdpznVnPp4M2_S4lY9eHTGbsrZO6kEUZDJH54m-kI5xB7NRzzGXlISEmNhzi0cXxJ7lbCVsdOFmoUil9WWxWSfmd2L2VcpX6ER17dHlEcYJQtohhAME3aQtQkh2Pukg9BU1Nh3IrAMp5avoAaz8hOxCN306TQYpMkbaxL1B0UGfP8sa3NqQkMjP7IpxxBZRsD8En4Pd-yk5wFo_nSjmbEW2uwzVO4ySA1QmzPc9M05F8l6EHu3-dypao6h1slOC7YZKWLHJ_gV37rcwS7oziXZB34iE_ebzk809GjExNvkEoKdmcKx9xzEr2CJoflmJeY8RHhiyp5hdkXc0w_wKCvYox6YceHY9dpEf9GQfsD1jQnzv2BX6ZM-ujReSWpgDrU7QelKFA3VYnu-Z719m_glW1mG1PO40K-sDVvCBNIoo6eEGfP9T4D6Nuow4SaM2Py7C0h52fiaM7McRpD2X4ksBIG0AGpMVTI0jcAaw91s8iEElL5aqiPa5mAzSVyB7xe9wWagnXjKoswoHOX9f5cfNIG_K-vzvt8OLi_9C-xrNUz_RtoZuYe591LTcD1c_bFxO1B81rqtN64x8-OG2dQIxGzWtrpp8vHE5tM21xmVXBNEHG5cV8v1x4-oc9bHGxdpx9iONy4nEP2pc5B81rm8FyMIOmbqQjcghxUPfu6hu_hN9ywLsa-C_7FsOg_5s33IoyU_0LQeQ_aq-5Yjo_5q-dSD1L-lbDm5-rW8dEuIDfcvBpj_Rt5hrAr-wbznA6EzfOm5DJyLxmc71b5vQTX4b5PNgzm7w1k8oCWgSJfFNcZtFcZRklAY0DvL5PA2DgAYzDKKA-gwD_4bfUkJDkpCEhGHsx9MI50E4T0kWZiSJZtQLCZaMi2n_uPHGPSS-nSfzILkRLEWh3fNlSu3B1T0tWd2o2_anyWarvZAIro0-ABhuhHsm7QSiVf9M0j1l2x9GBz-ztc8o3UFTH35cOZyM-4PvTaPEbWFMrfcPSbfcFE06zWTZPTHtPia1kn9iZjz64OzRHn1oTXq9pf8fAAD__z2Djnw">