[PATCH] D72194: [MC][ELF] Ensure that mergeable globals with an explicit section are assigned to SHF_MERGE sections with compatible entsizes

ben via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Mar 30 19:08:48 PDT 2020


bd1976llvm marked 30 inline comments as done.
bd1976llvm added inline comments.


================
Comment at: llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp:723
+  if (getContext().getAsmInfo()->useIntegratedAssembler()) {
+    // Mergeable symbols should always be placed into mergeable sections due to
+    // the UniqueID code above.
----------------
MaskRay wrote:
> The two assert may not be that useful. They can probably be deleted.
These mirror the assert for associated symbols. If these are not useful then that can probably go as well?


================
Comment at: llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp:743
+        (Section->getEntrySize() != getEntrySizeForKind(Kind)))
+      report_fatal_error(
+          "Symbol '" + GO->getName() + "' from module '" +
----------------
MaskRay wrote:
> nickdesaulniers wrote:
> > If we have a way of producing a non-fatal warning, that may be preferred to crashing the compiler, which encourages users to send us a bug report. In this case, we don't care about their buggy code with impossible to solve constraints, and the text you provide here should be enough information for them to solve their code.
> > 
> > Maybe a warning, but proceed with codegen? The warning can encourage them that referencing those variables is undefined behavior.
> This code path is possible, which means we should use MCContext::reportError rather than report_fatal_error. In a bugfree world, report_fatal_error should not ever be called.
report_fatal_error is used in similar circumstances in this file (which is why I chose it). See for example:

```
TargetLoweringObjectFileMachO::getExplicitSectionGlobal(
...
  if (!ErrorCode.empty()) {
    // If invalid, report the error with report_fatal_error.
    report_fatal_error("Global variable '" + GO->getName() +
                       "' has an invalid section specifier '" +
                       GO->getSection() + "': " + ErrorCode + ".");

```

In this file there are no current uses of reportError everything uses report_fatal_error so I wonder if what report_fatal_error -> MCContext::reportError  is a more general refactor that should be done in a separate change?

I tried using MCContext::reportError but it didn't produce any output because llc does not seem to setup a SrcMgr correctly to allow for warning and errors. I have been unable to determine where to modify llc to setup a SrcMgr correctly today. I will look at this problem again tomorrow.


================
Comment at: llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp:743
+        (Section->getEntrySize() != getEntrySizeForKind(Kind)))
+      report_fatal_error(
+          "Symbol '" + GO->getName() + "' from module '" +
----------------
bd1976llvm wrote:
> MaskRay wrote:
> > nickdesaulniers wrote:
> > > If we have a way of producing a non-fatal warning, that may be preferred to crashing the compiler, which encourages users to send us a bug report. In this case, we don't care about their buggy code with impossible to solve constraints, and the text you provide here should be enough information for them to solve their code.
> > > 
> > > Maybe a warning, but proceed with codegen? The warning can encourage them that referencing those variables is undefined behavior.
> > This code path is possible, which means we should use MCContext::reportError rather than report_fatal_error. In a bugfree world, report_fatal_error should not ever be called.
> report_fatal_error is used in similar circumstances in this file (which is why I chose it). See for example:
> 
> ```
> TargetLoweringObjectFileMachO::getExplicitSectionGlobal(
> ...
>   if (!ErrorCode.empty()) {
>     // If invalid, report the error with report_fatal_error.
>     report_fatal_error("Global variable '" + GO->getName() +
>                        "' has an invalid section specifier '" +
>                        GO->getSection() + "': " + ErrorCode + ".");
> 
> ```
> 
> In this file there are no current uses of reportError everything uses report_fatal_error so I wonder if what report_fatal_error -> MCContext::reportError  is a more general refactor that should be done in a separate change?
> 
> I tried using MCContext::reportError but it didn't produce any output because llc does not seem to setup a SrcMgr correctly to allow for warning and errors. I have been unable to determine where to modify llc to setup a SrcMgr correctly today. I will look at this problem again tomorrow.
I think a warning is appropriate as the codegen is correct if the linker does not optimise the mergeable sections. Unfortunately, I can't get a warning to be emitted because llc does not seem to setup a SrcMgr correctly to allow for warning and errors. I have been unable to determine where to modify llc to setup a SrcMgr correctly. I will look at this problem again tomorrow.


================
Comment at: llvm/test/CodeGen/X86/explict-section-mergeable.ll:1
+; RUN: llc < %s -mtriple=x86_64-linux-gnu -unique-section-names=0 -data-sections 2>&1 \
+; RUN:     | FileCheck %s
----------------
MaskRay wrote:
> MaskRay wrote:
> > Nit: `-linux-gnu` can be deleted.
> I just realized the file name made a typo. It should be `explicit`
Thanks! I'll make this change at the last moment because I think that a rename might mess up the phabricator comments?


================
Comment at: llvm/test/CodeGen/X86/explict-section-mergeable.ll:46
+;; Assign a compatible mergeable global to the previous section.
+ at explicit_basic_2 = unnamed_addr constant [2 x i16] [i16 1, i16 1], section ".explicit_basic"
+
----------------
MaskRay wrote:
> Changing 2 x i16 to 1 x i32 can enhance the test.
I think that it is more obvious to a reader that 2 x i16 -> .rodata.cst4 and 2 x i32 -> .rodata.cst8, no?


================
Comment at: llvm/test/CodeGen/X86/explict-section-mergeable.ll:62
+;; Assign a symbol with an incompatible entsize (string vs non-string) to a section with the same name.
+ at explicit_basic_5 = unnamed_addr constant [2 x i32] [i32 1, i32 0], section ".explicit_basic"
+;; Assign a compatible mergeable global to the previous section.
----------------
MaskRay wrote:
> explicit_basic_5 does not test string vs non-string.
explicit_basic_5 and explicit_basic_6 are both treated as strings which is why they get put into ".section .explicit_basic,"aMS", at progbits,4,unique,5" rather than ".section .explicit_basic,"aM", at progbits,8,unique,4".

This was part of the original internal bug report that led me to look at this problem originally.

The reader of the test should be able to see from the testing of the implicit section assignment (before any testing of explicitly assigned globals) how each symbol is treated by MC. For example near the top of the test we have:

```
;; Test implicit section assignment for symbols to ensure that the symbols
;; have the expected properties.
; CHECK: .section .rodata,"a", at progbits,unique,2
; CHECK: implicit_nonmergeable:
; CHECK: .section .rodata.cst4,"aM", at progbits,4
; CHECK: implicit_rodata_cst4:
; CHECK: .section .rodata.cst8,"aM", at progbits,8
; CHECK: implicit_rodata_cst8:
; CHECK: .section .rodata.str4.4,"aMS", at progbits,4
; CHECK: implicit_rodata_str4_4:

@implicit_nonmergeable  =              constant [2 x i16] [i16 1, i16 1]
@implicit_rodata_cst4   = unnamed_addr constant [2 x i16] [i16 1, i16 1]
@implicit_rodata_cst8   = unnamed_addr constant [2 x i32] [i32 1, i32 1]
@implicit_rodata_str4_4 = unnamed_addr constant [2 x i32] [i32 1, i32 0]
```

The test is careful to stick to symbol definitions that are as similar as possible, for readability.


================
Comment at: llvm/test/CodeGen/X86/explict-section-mergeable.ll:64
+;; Assign a compatible mergeable global to the previous section.
+ at explicit_basic_6 = unnamed_addr constant [2 x i32] [i32 1, i32 0], section ".explicit_basic"
+
----------------
MaskRay wrote:
> Change to 1 x i64 to enhance the test.
It won't be considered as a string if I make this change.


================
Comment at: llvm/test/CodeGen/X86/explict-section-mergeable.ll:70
+;; Assign a symbol with an incompatible entsize (non-mergeable) to a mergeable section created explicitly.
+ at explicit_basic_7 = constant [2 x i16] [i16 1, i16 1], section ".explicit_basic"
+
----------------
MaskRay wrote:
> This can be deleted.
> 
> If you want to test a non-unnamed_addr constant, this should be 2 x i32 and the comment should be fixed.
This is testing that a non-mergeable constant is assigned to the "generic" .explicit_basic section when sections with that name have already been created by the explicit assignment of mergeable symbols to that section name. "constant [2 x i16] [i16 1, i16 1]" is established to be non-mergeable by the test for the implicit assignment of @implicit_nonmergeable earlier. I think it aids readability that the definition of non-mergeable global @explicit_basic_7 is very similar to the definition of @explicit_basic_1. No?


================
Comment at: llvm/test/CodeGen/X86/explict-section-mergeable.ll:108
+;; Non-allocatable "default" sections can have allocatable mergeable symbols with compatible entry sizes assigned to them.
+ at explicit_default_3 = unnamed_addr constant [2 x i8] [i8 1, i8 0], section ".debug_str"
+
----------------
MaskRay wrote:
> This test does not seem correct.
> 
> .debug_str is a pre-allocated non-SHF_ALLOC section. A constant (SHF_ALLOC) cannot reuse the section.
This is the current behaviour. You can also do this sort of thing using a linker script. This patch is only fixing entsize with explicit section assignment. Initially, I was going to handle *any* incompatibilities between the symbol properties and the properties of the section that the symbol is assigned to to fix https://bugs.llvm.org/show_bug.cgi?id=43457. However, I decided that if you can use a linker script to force incompatible symbols together then explicit section assignment can allow you to do it as well.


================
Comment at: llvm/test/CodeGen/X86/explict-section-mergeable.ll:114
+;; Non-allocatable "default" sections cannot have allocatable mergeable symbols with incompatible (non-mergeable) entry sizes assigned to them.
+ at explicit_default_4 = constant [2 x i16] [i16 1, i16 1], section ".debug_str"
+
----------------
MaskRay wrote:
> This non-unnamed_addr test does not add coverage.
This test is here as a guard in case the implementation of the "default" sections changes in the future. Currently, it doesn't add any coverage over the similar test for an "implicit" section - but it might in the future.


================
Comment at: llvm/test/CodeGen/X86/explict-section-mergeable.ll:266
+ at explicit_asm_1 = unnamed_addr constant [2 x i16] [i16 1, i16 0], section ".asm_mergeable1"
+ at explicit_asm_2 = unnamed_addr constant [2 x i16] [i16 1, i16 0], section ".asm_nonmergeable1"
+
----------------
MaskRay wrote:
> Changing to 1 x i32 can enhance the test.
These need to be strings. See other comments.


================
Comment at: llvm/test/CodeGen/X86/explict-section-mergeable.ll:273
+;; Assign compatible and incompatible globals to the same sections.
+ at explicit_asm_5 = unnamed_addr constant [2 x i16] [i16 1, i16 0], section ".asm_mergeable2"
+ at explicit_asm_6 = unnamed_addr constant [2 x i16] [i16 1, i16 0], section ".asm_nonmergeable2"
----------------
MaskRay wrote:
> explicit_asm_5 and explicit_asm_6 can be deleted. As a replacement, you can add another constant following explicit_asm_2 which is compatible with explicit_asm_[12]
Done. Also, these test cases didn't match the comments so I updated them.


================
Comment at: llvm/test/CodeGen/X86/explict-section-mergeable.ll:280
+;; --implicit-check-not to reduce the number of checks in this file.
+; CHECK: .section ".note.GNU-stack","", at progbits
+
----------------
MaskRay wrote:
> .note.GNU-stack is unrelated to the test.
I don't see how to avoid generating it. I need a check for it as I am using  --implicit-check-not=.section.


================
Comment at: llvm/test/CodeGen/X86/explict-section-mergeable.ll:284
+
+; RUN: not --crash llc < %s -mtriple=x86_64-linux-gnu --no-integrated-as 2>&1 \
+; RUN:     | FileCheck %s --check-prefix=NO-I-ASM
----------------
MaskRay wrote:
> Delete `not --crash`. llc --no-integrated-as should not crash.
This seems to be the common pattern to test for report_fatal_error, see e.g. llvm/test/CodeGen/X86/macho-comdat.ll; llc does seem to crash if errors are emitted.


================
Comment at: llvm/test/CodeGen/X86/explict-section-mergeable.ll:291
+;; Emit error if an incompatible symbol is explicitly placed into a mergeable section.
+; NO-I-ASM: LLVM ERROR: Symbol 'explicit_default_1' from module '<stdin>' required a section with entry-size=0 but was placed in section '.rodata.cst16' with entry-size=16: Explicit assignment by pragma or attribute of incompatible symbol to this section?
----------------
nickdesaulniers wrote:
> Running the unit tests with this diff version Diff 253031, this added test case fails.  Please triple check by running:
> 
> ```
> $ llvm-lit -vv llvm/test/CodeGen/X86/explict-section-mergeable.ll
> ```
> 
> It seems like there's something tricky going on with output redirection, as
> ```
> $ llc < llvm/test/CodeGen/X86/explict-section-mergeable.ll -mtriple=x86_64-linux-gnu --no-integrated-as 2>&1| less
> ```
> produces no output related to checks, but
> ```
> $ llc < llvm/test/CodeGen/X86/explict-section-mergeable.ll -mtriple=x86_64-linux-gnu --no-integrated-as 2>&1
> ```
> does.
Thanks for this report. The test passes for me on windows. I will investigate further tomorrow. If we move to emitting a warning or non-fatal error then this problem may be bypasssed.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D72194/new/

https://reviews.llvm.org/D72194





More information about the llvm-commits mailing list