[llvm] [RISCV][docs] GP Relaxation and Small Data Limit (PR #108592)

Sam Elliott via llvm-commits llvm-commits at lists.llvm.org
Mon Sep 16 03:25:20 PDT 2024


https://github.com/lenary updated https://github.com/llvm/llvm-project/pull/108592

From e4e8b163f920d1b962a910b5690e3afacee00ccd Mon Sep 17 00:00:00 2001
From: Sam Elliott <quic_aelliott at quicinc.com>
Date: Fri, 13 Sep 2024 08:49:35 -0700
Subject: [PATCH 1/4] [RISCV][docs] GP Relaxation and Small Data Limit

As discussed in this week's RISC-V sync-up, this adds documentation
about these options and how they work.
---
 llvm/docs/RISCVUsage.rst | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/llvm/docs/RISCVUsage.rst b/llvm/docs/RISCVUsage.rst
index a15af9adfa945a..7ee1f2e10982a5 100644
--- a/llvm/docs/RISCVUsage.rst
+++ b/llvm/docs/RISCVUsage.rst
@@ -431,3 +431,26 @@ line.  This currently applies to the following extensions:
 * ``Zvksg``
 * ``Zvksh``
 * ``Zvkt``
+
+Global Pointer (GP) Relaxation and the Small Data Limit
+=======================================================
+
+Some of the RISC-V psABIs reserve ``gp`` (``x3``) for use as a "Global Pointer", to make generating data addresses more efficient.
+
+To use this functionality, you need to:
+* not be using the ``gp`` register for any other uses -- some platforms use it for other things;
+* compile your objects with Clang's ``-mrelax`` option, to enable relaxation annotations on relocatable objects; and
+* be compiling for an executable (not a shared library); and
+* use LLD's ``--relax-gp`` option.
+
+LLD will relax (rewrite) any code sequences that materialize an address within 2048 bytes of this ``__global_pointer$`` (which will be defined if it does not already exist) to instead generate the address using ``gp`` and the correct (signed) 12-bit immediate. This usually saves at least one instruction compared to materialising a full 32-bit address value.
+
+There can only be one ``__global_pointer$`` in a process (as ``gp`` is not changed when calling into a function in a shared library), so this optimisation is only done for executables, and not for shared libraries. Startup code is expected to put the value of ``__global_pointer$`` (from the executable) into ``gp`` before any user code is run.
+
+Arguably, the most efficient use for this addressing mode is for smaller global variables, as larger global variables are likely to need many more loads or stores when they are being accessed anyway.
+
+Therefore the compiler can do so, by placing smaller global variables into sections with with names starting ``.sdata`` or ``.sbss`` (matching sections with names starting ``.data`` and ``.bss`` respectively). LLD knows these sections should be laid out closer to the ``__global_pointer$`` symbol and adjacent to the ``.data`` section.
+
+Clang's ``-msmall-data-limit=`` option controls what the threshold size is (in bytes) for a global variable to be considered small. ``-msmall-data-limit=0`` disables the use of sections starting ``.sdata`` and ``.sbss``. The ``-msmall-data-limit=`` option will not move global variables that have an explicit data section, and will keep globals separate if using ``-fdata-sections``.
+
+Data suggests that these options can produce significant improvements across a range of benchmarks.

From e46724f13e2443f14e820fa247e0c12d4647d274 Mon Sep 17 00:00:00 2001
From: Sam Elliott <quic_aelliott at quicinc.com>
Date: Fri, 13 Sep 2024 10:35:54 -0700
Subject: [PATCH 2/4] fixup! [RISCV][docs] GP Relaxation and Small Data Limit

---
 llvm/docs/RISCVUsage.rst | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/llvm/docs/RISCVUsage.rst b/llvm/docs/RISCVUsage.rst
index 7ee1f2e10982a5..64e279e92c98f0 100644
--- a/llvm/docs/RISCVUsage.rst
+++ b/llvm/docs/RISCVUsage.rst
@@ -435,22 +435,23 @@ line.  This currently applies to the following extensions:
 Global Pointer (GP) Relaxation and the Small Data Limit
 =======================================================
 
-Some of the RISC-V psABIs reserve ``gp`` (``x3``) for use as a "Global Pointer", to make generating data addresses more efficient.
+Some of the RISC-V psABI variants reserve ``gp`` (``x3``) for use as a "Global Pointer", to make generating data addresses more efficient.
 
-To use this functionality, you need to:
-* not be using the ``gp`` register for any other uses -- some platforms use it for other things;
-* compile your objects with Clang's ``-mrelax`` option, to enable relaxation annotations on relocatable objects; and
-* be compiling for an executable (not a shared library); and
-* use LLD's ``--relax-gp`` option.
+To use this functionality, you need to be doing all of the following:
+* Use the ``medlow`` (aka ``small``) code model;
+* Not use the ``gp`` register for any other uses (some platforms use it for other things);
+* Compile your objects with Clang's ``-mrelax`` option, to enable relaxation annotations on relocatable objects;
+* Compile for a position-dependent static executable (not a shared library, and ``-fno-PIC`` / ``-fno-pic`` / ``-fno-pie``); and
+* Use LLD's ``--relax-gp`` option.
 
-LLD will relax (rewrite) any code sequences that materialize an address within 2048 bytes of this ``__global_pointer$`` (which will be defined if it does not already exist) to instead generate the address using ``gp`` and the correct (signed) 12-bit immediate. This usually saves at least one instruction compared to materialising a full 32-bit address value.
+LLD will relax (rewrite) any code sequences that materialize an address within 2048 bytes of ``__global_pointer$`` (which will be defined if it does not already exist) to instead generate the address using ``gp`` and the correct (signed) 12-bit immediate. This usually saves at least one instruction compared to materialising a full 32-bit address value.
 
-There can only be one ``__global_pointer$`` in a process (as ``gp`` is not changed when calling into a function in a shared library), so this optimisation is only done for executables, and not for shared libraries. Startup code is expected to put the value of ``__global_pointer$`` (from the executable) into ``gp`` before any user code is run.
+There can only be one ``gp`` value in a process (as ``gp`` is not changed when calling into a function in a shared library), so the symbol is is only defined and this relaxation is only done for executables, and not for shared libraries. The linker expects executable startup code to put the value of ``__global_pointer$`` (from the executable) into ``gp`` before any user code is run.
 
 Arguably, the most efficient use for this addressing mode is for smaller global variables, as larger global variables are likely to need many more loads or stores when they are being accessed anyway.
 
-Therefore the compiler can do so, by placing smaller global variables into sections with with names starting ``.sdata`` or ``.sbss`` (matching sections with names starting ``.data`` and ``.bss`` respectively). LLD knows these sections should be laid out closer to the ``__global_pointer$`` symbol and adjacent to the ``.data`` section.
+Therefore the compiler can place smaller global variables into sections with with names starting ``.sdata`` or ``.sbss`` (matching sections with names starting ``.data`` and ``.bss`` respectively). LLD knows to define the ``global_pointer$`` symbol close to these sections, and to lay these sections out adjacent to the ``.data`` section.
 
-Clang's ``-msmall-data-limit=`` option controls what the threshold size is (in bytes) for a global variable to be considered small. ``-msmall-data-limit=0`` disables the use of sections starting ``.sdata`` and ``.sbss``. The ``-msmall-data-limit=`` option will not move global variables that have an explicit data section, and will keep globals separate if using ``-fdata-sections``.
+Clang's ``-msmall-data-limit=`` option controls what the threshold size is (in bytes) for a global variable to be considered small. ``-msmall-data-limit=0`` disables the use of sections starting ``.sdata`` and ``.sbss``. The ``-msmall-data-limit=`` option will not move global variables that have an explicit data section, and will keep globals in separate sections if you are using ``-fdata-sections``.
 
 Data suggests that these options can produce significant improvements across a range of benchmarks.

From 4e391aad71d3db9db071d047bfe5721d0dccdcb0 Mon Sep 17 00:00:00 2001
From: Sam Elliott <quic_aelliott at quicinc.com>
Date: Fri, 13 Sep 2024 11:51:39 -0700
Subject: [PATCH 3/4] fixup! [RISCV][docs] GP Relaxation and Small Data Limit

---
 llvm/docs/RISCVUsage.rst | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/llvm/docs/RISCVUsage.rst b/llvm/docs/RISCVUsage.rst
index 64e279e92c98f0..b74860969f1225 100644
--- a/llvm/docs/RISCVUsage.rst
+++ b/llvm/docs/RISCVUsage.rst
@@ -439,12 +439,12 @@ Some of the RISC-V psABI variants reserve ``gp`` (``x3``) for use as a "Global P
 
 To use this functionality, you need to be doing all of the following:
 * Use the ``medlow`` (aka ``small``) code model;
-* Not use the ``gp`` register for any other uses (some platforms use it for other things);
+* Not use the ``gp`` register for any other uses (some platforms use it for the shadow stack and others as a temporary -- as denoted by the ``Tag_RISCV_x3_reg_usage`` build attribute);
 * Compile your objects with Clang's ``-mrelax`` option, to enable relaxation annotations on relocatable objects;
 * Compile for a position-dependent static executable (not a shared library, and ``-fno-PIC`` / ``-fno-pic`` / ``-fno-pie``); and
 * Use LLD's ``--relax-gp`` option.
 
-LLD will relax (rewrite) any code sequences that materialize an address within 2048 bytes of ``__global_pointer$`` (which will be defined if it does not already exist) to instead generate the address using ``gp`` and the correct (signed) 12-bit immediate. This usually saves at least one instruction compared to materialising a full 32-bit address value.
+LLD will relax (rewrite) any code sequences that materialize an address within 2048 bytes of ``__global_pointer$`` (which will be defined if it is used and does not already exist) to instead generate the address using ``gp`` and the correct (signed) 12-bit immediate. This usually saves at least one instruction compared to materialising a full 32-bit address value.
 
 There can only be one ``gp`` value in a process (as ``gp`` is not changed when calling into a function in a shared library), so the symbol is is only defined and this relaxation is only done for executables, and not for shared libraries. The linker expects executable startup code to put the value of ``__global_pointer$`` (from the executable) into ``gp`` before any user code is run.
 
@@ -454,4 +454,6 @@ Therefore the compiler can place smaller global variables into sections with wit
 
 Clang's ``-msmall-data-limit=`` option controls what the threshold size is (in bytes) for a global variable to be considered small. ``-msmall-data-limit=0`` disables the use of sections starting ``.sdata`` and ``.sbss``. The ``-msmall-data-limit=`` option will not move global variables that have an explicit data section, and will keep globals in separate sections if you are using ``-fdata-sections``.
 
+The small data limit threshold is also used to separate small constants into sections with names starting ``.srodata``. LLD does not place these with the ``.sdata`` and ``.sbss`` sections as ``.srodata`` sections are read only and the other two are writable. Instead the ``.srodata`` sections are placed adjacent to ``.rodata``.
+
 Data suggests that these options can produce significant improvements across a range of benchmarks.

From cbed01edd89fa70cfd353a7c9b98e9ef42319af7 Mon Sep 17 00:00:00 2001
From: Sam Elliott <quic_aelliott at quicinc.com>
Date: Mon, 16 Sep 2024 03:25:05 -0700
Subject: [PATCH 4/4] fixup! [RISCV][docs] GP Relaxation and Small Data Limit

---
 llvm/docs/RISCVUsage.rst | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/llvm/docs/RISCVUsage.rst b/llvm/docs/RISCVUsage.rst
index b74860969f1225..4fb2cc46f26de8 100644
--- a/llvm/docs/RISCVUsage.rst
+++ b/llvm/docs/RISCVUsage.rst
@@ -448,12 +448,12 @@ LLD will relax (rewrite) any code sequences that materialize an address within 2
 
 There can only be one ``gp`` value in a process (as ``gp`` is not changed when calling into a function in a shared library), so the symbol is is only defined and this relaxation is only done for executables, and not for shared libraries. The linker expects executable startup code to put the value of ``__global_pointer$`` (from the executable) into ``gp`` before any user code is run.
 
-Arguably, the most efficient use for this addressing mode is for smaller global variables, as larger global variables are likely to need many more loads or stores when they are being accessed anyway.
+Arguably, the most efficient use for this addressing mode is for smaller global variables: larger global variables likely need many more loads or stores when they are accessed, so the cost of materializing the upper bits can be shared across those accesses.
 
-Therefore the compiler can place smaller global variables into sections with with names starting ``.sdata`` or ``.sbss`` (matching sections with names starting ``.data`` and ``.bss`` respectively). LLD knows to define the ``global_pointer$`` symbol close to these sections, and to lay these sections out adjacent to the ``.data`` section.
+Therefore the compiler can place smaller global variables into sections with names starting with ``.sdata`` or ``.sbss`` (matching sections with names starting with ``.data`` and ``.bss`` respectively). LLD knows to define the ``__global_pointer$`` symbol close to these sections, and to lay these sections out adjacent to the ``.data`` section.
 
 Clang's ``-msmall-data-limit=`` option controls what the threshold size is (in bytes) for a global variable to be considered small. ``-msmall-data-limit=0`` disables the use of sections starting ``.sdata`` and ``.sbss``. The ``-msmall-data-limit=`` option will not move global variables that have an explicit data section, and will keep globals in separate sections if you are using ``-fdata-sections``.
 
-The small data limit threshold is also used to separate small constants into sections with names starting ``.srodata``. LLD does not place these with the ``.sdata`` and ``.sbss`` sections as ``.srodata`` sections are read only and the other two are writable. Instead the ``.srodata`` sections are placed adjacent to ``.rodata``.
+The small data limit threshold is also used to separate small constants into sections with names starting with ``.srodata``. LLD does not place these with the ``.sdata`` and ``.sbss`` sections, as ``.srodata`` sections are read-only and the other two are writable. Instead, the ``.srodata`` sections are placed adjacent to ``.rodata``.
 
 Data suggests that these options can produce significant improvements across a range of benchmarks.

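For readers of the patch, the phrase "within 2048 bytes of ``__global_pointer$``" can be made concrete with a small sketch. A signed 12-bit immediate covers offsets from -2048 to 2047, so an address exactly 2048 bytes above the ``gp`` value is out of reach. The Python below only models that range check. The function names and example addresses are hypothetical, and this is not LLD's implementation, which handles more than this one comparison.

```python
# Sketch only: models the signed 12-bit reach of a gp-relative access.
# The function names and the example addresses are hypothetical.

IMM12_MIN = -2048  # lowest value of a signed 12-bit immediate
IMM12_MAX = 2047   # highest value of a signed 12-bit immediate


def gp_offset(symbol_addr: int, gp_value: int) -> int:
    """Return the byte offset of a symbol from the gp value."""
    return symbol_addr - gp_value


def reachable_from_gp(symbol_addr: int, gp_value: int) -> bool:
    """Return True if the offset fits in a signed 12-bit immediate."""
    return IMM12_MIN <= gp_offset(symbol_addr, gp_value) <= IMM12_MAX


if __name__ == "__main__":
    gp = 0x12800  # hypothetical value of __global_pointer$
    for addr in (0x12000, 0x12FFF, 0x13000):
        print(hex(addr), gp_offset(addr, gp), reachable_from_gp(addr, gp))
```

With the hypothetical ``gp`` value above, the first two addresses sit at offsets -2048 and 2047 and are reachable, while the third sits at offset 2048 and is not.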

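As a rough illustration of the small data limit described in the patch, the Python below models which section-name prefix a global would get. It is a simplified sketch, not the compiler's logic: the function name, its parameters, and the exact boundary comparison (a nonzero size less than or equal to the limit) are assumptions made for illustration.

```python
# Simplified model, not the compiler's implementation. The function name,
# its parameters, and the exact size comparison are assumptions.
from typing import Optional


def small_section_prefix(size: int, limit: int, *, is_constant: bool,
                         zero_initialized: bool,
                         explicit_section: Optional[str] = None) -> Optional[str]:
    """Return the small-data section prefix for a global, or None if the
    global stays in its regular (or explicitly requested) section."""
    if explicit_section is not None:
        return None  # globals with an explicit section are not moved
    if limit == 0:
        return None  # a limit of 0 disables the small sections
    if not (0 < size <= limit):
        return None  # assumed rule: only nonzero sizes up to the limit count
    if is_constant:
        return ".srodata"  # small constants, placed next to .rodata
    return ".sbss" if zero_initialized else ".sdata"


if __name__ == "__main__":
    print(small_section_prefix(4, 8, is_constant=False, zero_initialized=False))
    print(small_section_prefix(16, 8, is_constant=False, zero_initialized=False))
```

Under these assumptions, a 4-byte initialized variable with a limit of 8 maps to ``.sdata``, and a 16-byte variable with the same limit is left where it was.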