[PATCH] D106585: Fix clang debug info irgen of i128 enums
    David Blaikie via Phabricator via llvm-commits 
    llvm-commits at lists.llvm.org
       
    Sat Jul 24 11:27:09 PDT 2021
    
    
  
dblaikie accepted this revision.
dblaikie added a comment.
Preserving existing behavior sounds OK - maybe some comment about that it might be good to remove so the next person who looks at it knows there's something not-quite-fully-reasoned here (& the comment about GCC's representation choice still seems off - GCC does use the signedness of the enumerators, not only the enumeration).
================
Comment at: clang/lib/CodeGen/CGDebugInfo.cpp:3117
+    llvm::APSInt Value = Enum->getInitVal();
+    Value.setIsSigned(IsSigned);
+    Enumerators.push_back(DBuilder.createEnumerator(Enum->getName(), Value));
----------------
rnk wrote:
> MaskRay wrote:
> > dblaikie wrote:
> > > rnk wrote:
> > > > rnk wrote:
> > > > > dblaikie wrote:
> > > > > > rnk wrote:
> > > > > > > dblaikie wrote:
> > > > > > > > Is the value already signed appropriately?
> > > > > > > > 
> > > > > > > > Removing this line of code (and the `bool IsSigned` variable, so it doesn't break `-Werror=unused-variable`) doesn't cause any tests to fail, that I can see.
> > > > > > > I'm afraid it might not be NFC. I took the cautious approach of trying to leave things exactly as they were. Enums in C can have surprisingly different behavior, IIRC.
> > > > > > I'd rather not leave in haunted-graveyard-y code like that.
> > > > > > 
> > > > > > Could we assert that Value.getIsSigned == IsSigned? Then if there's a case where it isn't we'll have a test case/justification for the code?
> > > > > I convinced myself that the signed-ness of the APSInt in the EnumConstantDecl always matches by looking at this code here:
> > > > > https://github.com/llvm/llvm-project/blob/aee8457b8d4123d087c45aef95d14f24934fed53/clang/lib/Sema/SemaDecl.cpp#L18379
> > > > > 
> > > > > So far as I can tell, we always run that code, no matter whether LangOpts.CPlusPlus is set or not. That's enough for me.
> > > > Oh, I was wrong, your suggested change *does* break clang/test/CodeGen/enum2.c. My caution was warranted. :)
> > > Hrm, still not quite clear on what's going on. Seems like GCC actually does things differently - it does produce DWARF using the signedness of the literal to dictate the signedness of the enumerator, which seems inconsistent with the comment ("This matches the DWARF produced by GCC for C enums with positive enumerators" - well it's consistent with that statement (for positive enumerators) but inconsistent with what happens if you mix sign of enumerators - then GCC uses different encodings, but Clang doesn't (regardless of the representational difference - with isSigned true or false))
> > > 
> > > Looks like LLVM, for DWARF at least, uses the signedness of the enumeration's type ( https://github.com/llvm-mirror/llvm/blob/2c4ca6832fa6b306ee6a7010bfb80a3f2596f824/lib/CodeGen/AsmPrinter/DwarfUnit.cpp#L1406-L1427 ). (the BTF backend seems to respect the individual signedness)
> > > 
> > > So... mixed feelings? For DWARF it looks like the signed-ness representation doesn't matter/is ignored and only the signedness of the enumeration is used. GCC respects the signedness of different expressions - but I don't know that that's a good choice either, seems like it should represent the value of the enumerator proper, not of the expression used to initialize the enumerator.
> > > 
> > > And also - what's the deal with the APInt encoding? It looks like something in the IR printing correctly encodes the signedness of the APInt value:
> > > ```
> > > $ clang-tot enum.c -g -c -emit-llvm -S -o - | grep Enumerator
> > > !6 = !DIEnumerator(name: "A", value: 0)
> > > !7 = !DIEnumerator(name: "B", value: -1)
> > > ```
> > > (this is the output without the per-enumerator signedness initialization)
> > > So how's that working & why do we carry the extra signedness in a boolean as well? 
> > > 
> > > Sorry, feel free to ignore all this if it's not worth the rabbit-hole. I'm generally OK with this preserving the existing behavior - but it does raise a bunch of questions for me.
> > > 
> > > 
> > > Seems like GCC actually does things differently
> > 
> > Can you elaborate a bit with an example?
> > 
> > I played with this a bit and GCC does appear to behave similar to clang with this patch.
> > 
> > Because of disallowed C++11 narrowing, I cannot place an signed enumerator in a enumeration type with an unsigned underlying type:
> > 
> > GCC
> > error: enumerator value ‘-3’ is outside the range of underlying type ‘long unsigned int’
> > (clang can suppress this with -Wno-c++11-narrowing/-Wno-narrowing)
> > 
> > And also - what's the deal with the APInt encoding? It looks like something in the IR printing correctly encodes the signedness of the APInt value:
> 
> I believe LLVM IR integers are always interpreted as signed, but really it's just a bit pattern. It's an APInt. The signedness is stored as a separate boolean in `DIEnumerator`.
> 
> > I played with this a bit and GCC does appear to behave similar to clang with this patch.
> 
> I think we were discussing behavior in C. Clang in C++ always implicitly casts the enumerator constant to the enum type, but in C it seems it doesn't. In C, the actual enumerators have a type of int, and the enum has it's own type. It doesn't make sense to me, but that's as far as I got before I said forget it, just do the NFC thing.
Re: @Maskray - Here's the GCC per-enumerator difference I was observing/referring to:
```
$ gcc-tot enum.c -g && llvm-dwarfdump-tot a.out -debug-info -v | grep -A2 "_enumerator\|enumeration"
0x0000002d:   DW_TAG_enumeration_type [2] *
                DW_AT_name [DW_FORM_string]     ("X")
                DW_AT_encoding [DW_FORM_data1]  (DW_ATE_signed)
--
0x0000003d:     DW_TAG_enumerator [3]  
                  DW_AT_name [DW_FORM_string]   ("A")
                  DW_AT_const_value [DW_FORM_data1]     (0x00)
--
0x00000041:     DW_TAG_enumerator [4]  
                  DW_AT_name [DW_FORM_string]   ("B")
                  DW_AT_const_value [DW_FORM_sdata]     (-1)
blaikie at blaikie-linux2:~/dev/scratch$ g++-tot enum.c -g && llvm-dwarfdump-tot a.out -debug-info -v | grep -A2 "_enumerator\|enumeration"
0x0000002d:   DW_TAG_enumeration_type [2] *
                DW_AT_name [DW_FORM_string]     ("X")
                DW_AT_encoding [DW_FORM_data1]  (DW_ATE_signed)
--
0x0000003d:     DW_TAG_enumerator [3]  
                  DW_AT_name [DW_FORM_string]   ("A")
                  DW_AT_const_value [DW_FORM_data1]     (0x00)
--
0x00000041:     DW_TAG_enumerator [4]  
                  DW_AT_name [DW_FORM_string]   ("B")
                  DW_AT_const_value [DW_FORM_sdata]     (-1)
blaikie at blaikie-linux2:~/dev/scratch$ cat enum.c
enum X { A, B = -1};
int main() { enum X a; }
```
Re: @rnk yeah - I'm still confused, but preserving existing behavior seems sufficient. (& 
================
Comment at: llvm/lib/IR/DIBuilder.cpp:255
+  assert(!Name.empty() && "Unable to create enumerator without name");
+  return DIEnumerator::get(VMContext, APInt(Value), Value.isUnsigned(), Name);
+}
----------------
rnk wrote:
> mizvekov wrote:
> > Do I get it right, and this is not the first place that I noticed this, but the terminology here is a bit unconventional with regards to "Signed" vs "Negative"?
> > 
> > It looks like around the debug info code, an APSInt will be a signed positive value for representing a negative number, and an unsigned one to represent a positive value.
> To my knowledge, we try not to store data in a sign-and-magnitude representation, it's always twos-complement. Meaning, APInt always stores a sequence of bits, typically interpreted as an unsigned positive integer. With outside information about the signedness of that data, you can answer the question of whether it is negative or not. APSInt adds the extra "signed" field.
Yeah. -the bit that I don't understand is that the textual IR seems to represent the value with signedness regardless of the "IsUnsigned" field of the DIEnumerator. And the DWARF emission code doesn't use the signedness of the enumerator at all - looks like it's a remnant of earlier implementations, perhaps.
Repository:
  rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106585/new/
https://reviews.llvm.org/D106585
    
    
More information about the llvm-commits
mailing list