[Lldb-commits] [PATCH] D90840: [lldb/DWARF] Fix sizes of DW_OP_const[1248][us] and DW_OP_litN results

Mon Nov 9 05:53:13 PST 2020

labath added inline comments.

================
Comment at: lldb/source/Expression/DWARFExpression.cpp:945
+  auto to_generic = [addr_size = opcodes.GetAddressByteSize()](auto v) {
+    bool is_signed = std::is_signed<decltype(v)>::value;
+    return Scalar(
----------------
labath wrote:
> dblaikie wrote:
> > labath wrote:
> > > This is probably not conforming either -- dwarf says the generic type is of "unspecified signedness", but I think it means a single global value -- not choosing signedness on a per-operand basis.
> > > 
> > > I've kept that for now, since that was the original behavior, though I didn't notice any issues with this bit removed.
> > Fair, would be good to mark it as at least suspect/uncertain with a comment or such. (or if there's some inverse that would assert if this ever did become relevant, that assertion would be great - but I guess there probably isn't anything we can assert to check that this doesn't matter in practice)
> > 
> > Re: non-conforming: I think the presence of constu and consts encodings that build "generic type"d values probably means the generic type itself may have no particular notion of signedness, but the values that populate it do and I guess intend the value to be interpreted contextually or to carry the sign information? I think it's probably good to carry the sign information from the operands, at least. Maybe with some truncation/sign extension. It's probably pretty close to right... 
> The DWARF standard (particularly version 4, which does not speak about signedness at all), could definitely clearer, but I think that if the committee wanted to convey the notion that a value carries the sign information with it, they would have phrased it differently. To me, the current weasel wording seems to be intended to handle the fact that different architectures have different notions about the signedness of pointers. It this world the exact DW_OP operation used does not affect the signedness of the result, but only the fact whether sign- or zero-extension takes place while converting to the "generic" type.
> 
> This is interpretation seems to match gcc behavior:
> - gcc will happily use DW_OP_const1u to describe a (signed) int constant, even if that constant would fit into a DW_OP_const1s
> - it will also use `DW_OP_const1s -1` as a shorter form of `DW_OP_const4u 0xffffffff` when describing `(unsigned)-1`.
> 
> In fact, these findings (and the fact that clang does not use the fixed-size operands at all) have convinced me to just fix the signedness of the generic type.
After more experimentation I realized this is a rabbit hole. The signedness of the "generic type" is defined on a per-operand basis, but per-operation:
```
If the type of the operands is the generic type, except as otherwise specified, the
arithmetic operations perform addressing arithmetic, that is, unsigned arithmetic
that is performed modulo one plus the largest representable address.
```
The only operation which explicitly uses signed arithmetic is division, but that's like one of the few operations where this distinction matters. Interestingly the modulo operation does _not_ use signed arithmetic.

Even more interestingly, GCC seems to get all of these nuances right -- it writes `a smod b` by `a - (a sdiv b)*b`, and uses multiplicative inverses or DW_OP_(GNU_)convert to produce unsigned division. I don't know what llvm does, as I have not been able to coax it to produce these kinds of expressions, but it seems it has a lot of work to do, both on producer and consumer sides.

For now, I'm just going back to the use-signedness-of-operand implementation, as that is what we've done before.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D90840/new/

https://reviews.llvm.org/D90840