[PATCH] D84234: [DWARFYAML] Implement the .debug_loclists section.

Wed Jul 22 02:43:48 PDT 2020

jhenderson added inline comments.

================
Comment at: llvm/include/llvm/ObjectYAML/DWARFYAML.h:201-202
+  std::vector<yaml::Hex64> Values;
+  Optional<yaml::Hex64> DescriptorsLength;
+  std::vector<DWARFOperation> Descriptors;
+};
----------------
I think "Descriptions" is the canonical term in the spec, so we should probably stick with that, i.e. use `DescriptionsLength` and `Descriptions`.

================
Comment at: llvm/include/llvm/ObjectYAML/DWARFYAML.h:190
+  dwarf::LocationAtom Operator;
+  std::vector<yaml::Hex64> Values;
+};
----------------
labath wrote:
> Higuoxing wrote:
> > Higuoxing wrote:
> > > There are some operators that take a size and a block of that size. I'm not sure if it's ok to use a `std::vector<>` here.
> > I think we can add a new field called `BlockSize` which is used to overwrite the size of blocks.
> > 
> > ```
> > struct DWARFOperation {
> >   dwarf::LocationAtom Operator;
> >   Optional<yaml::Hex64> BlockSize;
> >   std::vector<yaml::Hex64> Values;
> > };
> > ```
> > 
> > e.g., `DW_OP_implicit_value` takes two operands, the first operand is an ULEB128 size, the second operand is a block of that size. We can use `std::vector<yaml::Hex64>` to hold a series of bytes, and the size of this block can be overwritten by `BlockSize`.
> Supporting DW_OP_entry_value is going to be amusing, as that takes an entire nested dwarf expression as an operand. But I think it would be fine even if the nested expression is simply printed in hex (in fact, I would be fine if initially _all_ dwarf expressions are simply printed in hex).
I guess longer-term, we could add another field to a DWARFOperation, that is itself an Optional<DWARFOperation>, to allow for a kind of linked-list approach to the expression. In the YAML, it might look like:

```
- Entries:
    - Operator:          DW_LLE_startx_endx
      Values:            [ 0x1234, 0x4321 ]
      DescriptorsLength: 0x4321
      Descriptors:
        - Operator: DW_OP_entry_value
          Expression:
            - Operator: DW_OP_entry_value
              Expression:
                - Operator: DW_OP_entry_value
                ...
```

An operation would only be allowed to have one of Values or Expression in that case. But I think you're right, there's no need to do that now. The behaviour in this patch would be enough for simple cases.

@Higuoxing, I think your idea with an optional `BlockSize` makes sense to me.

================
Comment at: llvm/lib/ObjectYAML/DWARFEmitter.cpp:446-448
+writeDWARFExpression(raw_ostream &OS,
+                     const DWARFYAML::DWARFOperation &Operation,
+                     uint8_t AddrSize, bool IsLittleEndian) {
----------------
One of the things yaml2obj is useful for is for creating broken ELFs. I see you're adding further checks here to make sure the `Values` array is the right size for the type. But what if you want to have an incorrect number of values for a given operation?

There are similar issues elsewhere, I'm sure, so I'm not saying this specific patch needs changing at this point, but I do think you need to consider how to solve this case. For example, it might mean being more specific with the type of value by using distinct fields for different types. For example, instead of `Values: [0x1234, 0x4321]` you might want something like:
```
Values:
  - Type: SLEB128
    Value: 0x1234
  - Type: Address
    Value: 0x4321
```
or even:
```
Values:
  - SLEB128: 0x1234
  - Address: 0x4321
```
What do you think?

================
Comment at: llvm/lib/ObjectYAML/DWARFEmitter.cpp:567
+
+  auto CheckOperands = [&](uint64_t ExpectedOperands) -> Error {
+    if (Entry.Values.size() != ExpectedOperands) {
----------------
These lambdas are very similar to those elsewhere. Maybe you should pull them out into a function.

================
Comment at: llvm/lib/ObjectYAML/DWARFEmitter.cpp:621
+    break;
+  case dwarf::DW_LLE_base_addressx:
+    if (Error Err = CheckOperands(1))
----------------
Similar comments apply here to my earlier thoughts regarding checking operands.

================
Comment at: llvm/test/tools/yaml2obj/ELF/DWARF/debug-loclists.yaml:1
+## Test that yaml2obj emits a .debug_loclists section when requested.
+
----------------
I'll look at the testing more tomorrow.

================
Comment at: llvm/test/tools/yaml2obj/ELF/DWARF/debug-loclists.yaml:22-92
+# DWARF32-LE-NEXT:   0000: 3D000000 05000800 03000000 0C000000  |=...............|
+##                         ^-------                             unit_length (4-byte)
+##                                  ^---                        version (2-byte)
+##                                      ^-                      address_size (1-byte)
+##                                        ^-                    segment_selector_size (1-byte)
+##                                           ^-------           offset_entry_count (4-byte)
+##                                                    ^-------  offsets[0] (4-byte)
----------------
Higuoxing wrote:
> labath wrote:
> > Higuoxing wrote:
> > > labath wrote:
> > > > Would it be better to verify this structurally by checking llvm-dwarfdump output ?
> > > Hi @labath, I've tested the generated sections using llvm-dwarfdump on my local machine. We usually check the hex dump of sections to "bootstrap" yaml2obj. You can take a look at this thread :-) https://reviews.llvm.org/D82367#2134629
> > I see. That makes sense. Thanks for the pointer.
> I'm wondering if it's helpful to put the output of `llvm-dwarfdump` in comments. e.g.,
> 
> ```
> ## $ llvm-dwarfdump --debug-loclists -v %t.o
> ## .debug_loclists contents:
> ## 0x00000000: locations list header: length = 0x00000023, version = 0x0005, addr_size = 0x08, seg_size = 0x00, offset_entry_count = 0x00000002
> ## offsets: [
> ## 0x00000010 => 0x00000024
> ## 0x0000001a => 0x0000002e
> ## ]
> ## 0x00000024: 
> ##             DW_LLE_startx_endx     (0x0000000000001234, 0x0000000000004321): DW_OP_consts +4660
> ##             DW_LLE_end_of_list     ()
> ```
I don't have a strong opinion on this either way. One of the problems is that this will rot easily if you make changes/expand the test, or if llvm-dwarfdump changes output slightly. I don't know if it gives any additional benefit beyond the comments.

One possible thought I had is that you could, in addition to the hex testing, add llvm-dwarfdump checks too. The hex check ensures the content is correct, whilst the llvm-dwarfdump check shows a) llvm-dwarfdump is correct, and b) provides a more readable way of showing yaml2obj is correct. If there's a failure in the llvm-dwarfdump check, you'd need to carefully inspect the hex to see if the hex check was wrong (i.e. yaml2obj has a bug) or there's a bug in llvm-dwarfdump. This would probably need some sort of commenting to explain the process. Maybe a readme.txt in the test directory would be enough, I don't know.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D84234/new/

https://reviews.llvm.org/D84234