[llvm] Allow 128-bit discriminants in DWARF variants (PR #125578)

Tue Feb 4 11:57:26 PST 2025

dwblaikie wrote:

> > I think that's probably still the right path - what unrelated test changes did you discover down this path?
> 
> If `addConstantValue` calls the new `addInt`, then different forms can be emitted in some tests. This is because `addUInt` and `addSInt` use `DIEInteger::BestForm`, while the `uint64_t` overload of `addConstantValue` has this:
> 
> ```
>   // FIXME: This is a bit conservative/simple - it emits negative values always
>   // sign extended to 64 bits rather than minimizing the number of bytes.
>   addUInt(Die, dwarf::DW_AT_const_value,
>           Unsigned ? dwarf::DW_FORM_udata : dwarf::DW_FORM_sdata, Val);
> ```

Ah, thanks for walking me through it... Yeah, thinking about alternatives (ways to pass in the form, etc) they all seem awkward. Perhaps only refactoring out the block code would be workable? 
```
  void DwarfUnit::addConstantValue(DIE &Die, const APInt &Val, bool Unsigned) {
    unsigned CIBitWidth = Val.getBitWidth();
    if (CIBitWidth <= 64) {
      addConstantValue(Die, Unsigned,
                       Unsigned ? Val.getZExtValue() : Val.getSExtValue());
      return;
    }

    addIntAsBlock(Val);
  }
  void DwarfUnit::addIntAsBlock(DIE &Die, const APInt &Val) {
    DIEBlock *Block = new (DIEValueAllocator) DIEBlock;

    // Get the raw data form of the large APInt.
    const uint64_t *Ptr64 = Val.getRawData();

    int NumBytes = Val.getBitWidth() / 8; // 8 bits per byte.
    bool LittleEndian = Asm->getDataLayout().isLittleEndian();

    // Output the constant to DWARF one byte at a time.
    for (int i = 0; i < NumBytes; i++) {
      uint8_t c;
      if (LittleEndian)
        c = Ptr64[i / 8] >> (8 * (i & 7));
      else
        c = Ptr64[(NumBytes - 1 - i) / 8] >> (8 * ((NumBytes - 1 - i) & 7));
      addUInt(*Block, dwarf::DW_FORM_data1, c);
    }

    addBlock(Die, dwarf::DW_AT_const_value, Block);
  }
```
& reuse that from the new `addInt`?

> I tend to think this FIXME can be removed, since DWARF recommends what LLVM is doing here. See 7.5.5 Classes and Forms subheading "constant":
> 
> > If one of the DW_FORM_dataforms is used to represent a signed or unsigned
> > integer, it can be hard for a consumer to discover the context necessary to
> > determine which interpretation is intended. Producers are therefore strongly
> > encouraged to use DW_FORM_sdata or DW_FORM_udata for signed and
> > unsigned integers respectively, rather than DW_FORM_data.

Mixed feelings about that - it'd impact a lot of pretty simple form uses, like DW_AT_decl_file/line/column, and probably other places where it's obviously unsigned and may benefit from fixed-size DIEs (though I'm incrceasingly suspicious of the fixed-size DIE value, I think with several LEB forms added in DWARFv5 there aren't that many fixed-size DIEs anyway)

> 
> > Potentially doing this in two steps, even - one, an API change to `DwarfUnit::addConstantValue` take the `DW_AT_` to use as a parameter. That change shouldn't cause any change to LLVM's behavior.
> > Then, after that, we could reuse the function from this new location.
> 
> My first take on the patch did this, but in the end I chose to add a new method because `addConstantValue` by its name seemed like it is intended to be specific to `DW_AT_const_value`.

Yeah, I wouldn't feel too badly about generalizing that, possibly renaming - but given the above issues about forms, we'll leave that for another time.

> There's also `addConstantFPValue`, where the aforementioned DWARF section seems to imply that `DW_FORM_udata` should not be used -- rather `DW_FORM_data8` -- see the second paragraph. I feel this probably doesn't matter to readers in practice.

Yeah, seems harder to argue that signedness is valuable when it's just bits to be converted into an fp value anyway.

> By "two steps" do you mean opening two separate pull requests? I'm new to LLVM and my impression is that patch series aren't really done, but I wanted to confirm.
> 
> Anyway I'm happy to proceed whatever way you like, just let me know, thanks.

Think one patch'll do. There are some ways to do stacked PRs in LLVM, but I haven't dabbled with them - I think I had in mind either separate (stacked) PRs, or separate commits in one PR, just to make it easier to isolate and review the changes, even if they would get committed as a single merged commit. We do prefer isolated changes, but small amounts of refactoring in a functional commit are fine.

Oh, out of curiosity: What's the motivation for this support?

https://github.com/llvm/llvm-project/pull/125578