[PATCH] D131589: [llvm-objdump] Handle multiple syms at same addr in disassembly.

Fri Aug 19 02:35:07 PDT 2022

simon_tatham marked 10 inline comments as done.
simon_tatham added a comment.

In D131589#3728128 <https://reviews.llvm.org/D131589#3728128>, @jhenderson wrote:

> I think one bit of testing you still need is showing whether the demangled or raw name is used for the priority ordering of symbol names.

I'm not sure what you mean by that. Priority order of symbol //names?// The sorting order in `Symbols` is set up before anything gets demangled, so it will be based on the raw name, but that isn't changed by this patch.

================
Comment at: llvm/test/tools/llvm-objdump/multiple-symbols.test:21
+
+## The test input file contains an Arm and a Thumb function, each with two
+## function-type symbols defined at its entry point. Also, because it's Arm,
----------------
jhenderson wrote:
> simon_tatham wrote:
> > jhenderson wrote:
> > > Nit: here and elsewhere, I think the canonical spelling is ARM.
> > I do have to keep remembering that LLVM's canonical spelling isn't the same as Arm's canonical spelling. My fingers have a very strong habit of typing it the way //we// spell it, for obvious reasons!
> I mean, I'm going off what wikipedia (and several other websites) tells me is the spelling :-) Strange that Arm's official spelling according to the company is different! I'd be happy to go back to what you had before then!
There was a change of preference at some point in the past, and it's entirely possible that not everyone has caught up. But in recent years Arm's preferred spelling of its own name is "Arm".

(Obviously identifiers in source code have to match the existing spelling and all be consistent, but comments can be up to date!)

================
Comment at: llvm/test/tools/llvm-objdump/multiple-symbols.test:33
+# DEFAULT-NOT: >:
+# DEFAULT:       00000000 <bbbb>:
+# DEFAULT-NEXT:         0: e0800080      add     r0, r0, r0, lsl #1
----------------
jhenderson wrote:
> Rather than half a dozen different CHECK patterns, which are mostly duplicates, I'd consider using multiple check prefixes in each test case to enable/disable the relevant parts. For example, you'd have one prefix for each of the symbols, and then another prefix for each of the disassembly blocks. You could use an `--implicit-check-not` to the FileCheck commands to ensure other stuff isn't printed incorrectly, instead of the `-NOT` style too, but up to you.
> 
> Rough example:
> ```
> # AMAP: 00000000 <$a.0>:
> # AAAA: 00000000 <aaaa>:
> # BBBB: 00000000 <bbbb>:
> # CODE1: <first functions code>
> # CODE1-NEXT: ...
> ...
> # TMAP: ....
> ```
I had a try at this, but I'm afraid I couldn't see how to make it test the things I want tested.

The problem is that the `-NEXT` suffix doesn't apply between different FileCheck prefixes. If I write, for example,
```
COMMON: some header
FOO-NEXT: line involving foo
BAR-NEXT: line involving bar
```
then I'd like `--check-prefixes=COMMON,BAR` to enforce that the bar line shows up immediately after the header line, and there isn't an intervening line of any kind. But in fact the `BAR-NEXT` check provokes an error message from FileCheck that there should have been a previous `BAR` check for it to be next to. It apparently means "must be on the next line from the previous check with the same prefix", not "... with any currently enabled prefix".

So I think if I converted this test into your suggested style, I'd lose the ability to have `NEXT` checks at all, so I'd have to have a pair of prefixes for each piece of output, denoting "this line is / is not expected to appear in the file" ...
```
# AMAP:     00000000 <$a.0>:
# AMAP-NOT: 00000000 <$a.0>:
# AAAA:     00000000 <aaaa>:
# AAAA-NOT: 00000000 <aaaa>:
```
and then each RUN line would have to have an absolutely enormous collection of check-prefixes specifying every single line it both did //and didn't// want.

I can give that a try if you really want me to, but are you //sure// it's clearer? The effect from my point of view is that all the details of what makes one test run different from another are now way off to the right and smushed into a long undistinguished list of keywords, and you have to cross-refer to the checks anyway to make sense of them, instead of laid out in a table of "here is what this test expects to see".

================
Comment at: llvm/test/tools/llvm-objdump/multiple-symbols.test:101-103
+## Similarly for the other two functions. This time we must check that the
+## aaaa/bbbb block of code was not disassembled _before_ the output we're
+## expecting.
----------------
jhenderson wrote:
> Bearing in mind that --disassemble-symbol should already have testing elsewhere, what does this second block of code + function symbols give us that the first block alone doesn't?
It took me a day to remember what I'd been thinking here myself, so I agree it's unclear! The intention of having both an Arm and a Thumb function was to ensure that each one is disassembled in the right one of those states, because the mapping symbols that indicate the changeover are still reliably recognised, regardless of which subset of symbols is being //displayed//.

================
Comment at: llvm/tools/llvm-objdump/llvm-objdump.cpp:1583

-      bool CheckARMELFData = hasMappingSymbols(Obj) &&
-                             Symbols[SI].Type != ELF::STT_OBJECT &&
----------------
simon_tatham wrote:
> jhenderson wrote:
> > The change of this logic should correspond to some sort of test case, I think, but I don't think I see anything?
> It turned out that I had trouble thinking of something that would have //changed// as a result of removing this section!
> 
> The intention of the old code here is to avoid checking mapping symbols if we're starting disassembly at an `STT_OBJECT` symbol. But `STT_OBJECT` symbols are handled by the previous if statement by going to `dumpELFData` and then terminating this loop iteration, so it's difficult for one to get as far as here in the first place.
> 
> If there is any case that could have got here at all without being eaten by the previous test, it must be a confusing edge case of some kind and I haven't put my finger on it yet.
Aha! There //is// an edge case affected by this change. If you set `--disassemble-all` to force disassembly of data sections, then the previous code would have had the side effect of ignoring mapping symbols in //code// sections, so you'd get Thumb code mistakenly disassembled as Arm.

The new criterion of "use mapping symbols if they're there" stops that failure from happening. I'll add a regression test for it.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D131589/new/

https://reviews.llvm.org/D131589