[PATCH] D131589: [llvm-objdump] Handle multiple syms at same addr in disassembly.

Fri Aug 19 03:35:07 PDT 2022

jhenderson added a comment.

In D131589#3734616 <https://reviews.llvm.org/D131589#3734616>, @simon_tatham wrote:

> In D131589#3728128 <https://reviews.llvm.org/D131589#3728128>, @jhenderson wrote:
>
>> I think one bit of testing you still need is showing whether the demangled or raw name is used for the priority ordering of symbol names.
>
> I'm not sure what you mean by that. Priority order of symbol //names?// The sorting order in `Symbols` is set up before anything gets demangled, so it will be based on the raw name, but that isn't changed by this patch.

Your new test is about multiple symbols at the same location, and you specifically call out the alphatical sorting in a comment. That then immediately raises the question about whether demangled or mangled names are used. It's not the end of the world, since you rightly point out that this aspect hasn't changed, but I think it would still be useful to check (assuming of course it isn't already tested, anyway).

================
Comment at: llvm/test/tools/llvm-objdump/ELF/ARM/disassemble-all.s:1
+// Regression test for a bug in which --disassemble-all had the side effect
+// of stopping mapping symbols from being checked in code sections, so that
----------------
Perhaps worth adding "mapping-symbols" to the test name, e.g. `disassemble-all-mapping-symbols.s`, since it's specifically the interaction of the two that's interesting.

================
Comment at: llvm/test/tools/llvm-objdump/ELF/data-vs-code-priority.test:18
+## rather than dumped as data, and that in each case, the code symbol
+## is displayed before
+
----------------
"is displayed before" sounds incomplete. Before what? Do you mean "is displayed first"? Also, missing full stop at end of sentence.

================
Comment at: llvm/test/tools/llvm-objdump/ELF/data-vs-code-priority.test:25-32
+# CHECK-NEXT:  <B2function>:
+# CHECK-NEXT: movw r0, #2
+# CHECK-EMPTY:
+# CHECK-NEXT:  <A3notype>:
+# CHECK-NEXT: movw r0, #3
+# CHECK-EMPTY:
+# CHECK-NEXT:  <B4notype>:
----------------
Nit: the whitespace for indentation of these lines is inconsistent with the first block above. Please fix.

================
Comment at: llvm/test/tools/llvm-objdump/multiple-symbols.test:33
+# DEFAULT-NOT: >:
+# DEFAULT:       00000000 <bbbb>:
+# DEFAULT-NEXT:         0: e0800080      add     r0, r0, r0, lsl #1
----------------
simon_tatham wrote:
> jhenderson wrote:
> > Rather than half a dozen different CHECK patterns, which are mostly duplicates, I'd consider using multiple check prefixes in each test case to enable/disable the relevant parts. For example, you'd have one prefix for each of the symbols, and then another prefix for each of the disassembly blocks. You could use an `--implicit-check-not` to the FileCheck commands to ensure other stuff isn't printed incorrectly, instead of the `-NOT` style too, but up to you.
> > 
> > Rough example:
> > ```
> > # AMAP: 00000000 <$a.0>:
> > # AAAA: 00000000 <aaaa>:
> > # BBBB: 00000000 <bbbb>:
> > # CODE1: <first functions code>
> > # CODE1-NEXT: ...
> > ...
> > # TMAP: ....
> > ```
> I had a try at this, but I'm afraid I couldn't see how to make it test the things I want tested.
> 
> The problem is that the `-NEXT` suffix doesn't apply between different FileCheck prefixes. If I write, for example,
> ```
> COMMON: some header
> FOO-NEXT: line involving foo
> BAR-NEXT: line involving bar
> ```
> then I'd like `--check-prefixes=COMMON,BAR` to enforce that the bar line shows up immediately after the header line, and there isn't an intervening line of any kind. But in fact the `BAR-NEXT` check provokes an error message from FileCheck that there should have been a previous `BAR` check for it to be next to. It apparently means "must be on the next line from the previous check with the same prefix", not "... with any currently enabled prefix".
> 
> So I think if I converted this test into your suggested style, I'd lose the ability to have `NEXT` checks at all, so I'd have to have a pair of prefixes for each piece of output, denoting "this line is / is not expected to appear in the file" ...
> ```
> # AMAP:     00000000 <$a.0>:
> # AMAP-NOT: 00000000 <$a.0>:
> # AAAA:     00000000 <aaaa>:
> # AAAA-NOT: 00000000 <aaaa>:
> ```
> and then each RUN line would have to have an absolutely enormous collection of check-prefixes specifying every single line it both did //and didn't// want.
> 
> I can give that a try if you really want me to, but are you //sure// it's clearer? The effect from my point of view is that all the details of what makes one test run different from another are now way off to the right and smushed into a long undistinguished list of keywords, and you have to cross-refer to the checks anyway to make sense of them, instead of laid out in a table of "here is what this test expects to see".
> The problem is that the -NEXT suffix doesn't apply between different FileCheck prefixes.

I'm 95% certain that this is incorrect, as I just tested it out locally with the following test passing fine for me:
```
# RUN: echo foo > %t.txt
# RUN: echo baz >> %t.txt
# RUN: FileCheck %s --input-file=%t.txt --check-prefixes=FOO,BAZ

# FOO: foo
# BAR-NEXT: bar
# BAZ-NEXT: baz
```

Did you perhaps accidentally omit the `COMMON` from one of your FileCheck prefix sets? The only rule for -NEXT/-EMPTY commands is that there has to be one regular check (across all prefix sets) before the first -NEXT/-EMPTY.

================
Comment at: llvm/test/tools/llvm-objdump/multiple-symbols.test:101-103
+## Similarly for the other two functions. This time we must check that the
+## aaaa/bbbb block of code was not disassembled _before_ the output we're
+## expecting.
----------------
simon_tatham wrote:
> jhenderson wrote:
> > Bearing in mind that --disassemble-symbol should already have testing elsewhere, what does this second block of code + function symbols give us that the first block alone doesn't?
> It took me a day to remember what I'd been thinking here myself, so I agree it's unclear! The intention of having both an Arm and a Thumb function was to ensure that each one is disassembled in the right one of those states, because the mapping symbols that indicate the changeover are still reliably recognised, regardless of which subset of symbols is being //displayed//.
Perhaps worth additional comments then to explain this.

================
Comment at: llvm/tools/llvm-objdump/llvm-objdump.cpp:1518
+      // Distinguish ELF data from code symbols, which will be used later on to
+      // decide whether to 'disassemble' this chunk as at data declaration via
+      // dumpELFData(), or whether to treat it as code.
----------------

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D131589/new/

https://reviews.llvm.org/D131589