[PATCH] D70720: [llvm-objdump] Display locations of variables alongside disassembly

Oliver Stannard (Linaro) via Phabricator via llvm-commits <llvm-commits at lists.llvm.org>
Thu Jan 30 03:28:40 PST 2020


ostannard marked 6 inline comments as done.
ostannard added inline comments.


================
Comment at: llvm/docs/CommandGuide/llvm-objdump.rst:132-135
+.. option:: --debug-vars-unicode=<true|false>
+
+  If true, use unicode characters when displaying source-level variables. If
+  false, use only ASCII characters. Defaults to true.
----------------
jhenderson wrote:
> This isn't quite the interface I'd use. I'd actually recommend making ASCII the default, since it will work in all situations, rather than requiring Windows users to specify an extra switch. If there's a good reason not to, I'd change this to --debug-vars-ascii, so that it's the normal practice of "use a switch to enable a behaviour" i.e. `--debug-vars-unicode` is all that is needed (and there's no need to do the slightly weird syntax of `--debug-vars-unicode=false`).
> 
> A third option (possibly in conjunction with the first) would be to change the switch name to --debug-vars-style, and allow it to take an enum of ASCII/Unicode (or more specifically UTF-8, probably). However, I don't think it's likely that we'll ever need more than two modes, so this is probably overkill.
> 
> My personal preference would be the first of these options, as I am primarily a Windows developer, as are all of the developers I work with/make tools for.
I've tested this on Windows 10, and the Unicode output displays correctly (in both cmd.exe and PowerShell). Is there some other version or configuration of Windows in which this won't work? If ASCII is only needed for older versions of Windows, I'd like to keep the default as Unicode, because I find it much more legible than the ASCII version. There's also the option of making ASCII the default on Windows and Unicode elsewhere, but I'd rather avoid that if possible.

I do like the idea of switching the spelling to `--debug-vars-ascii` though.
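
For illustration, a switch spelled that way could be declared with LLVM's cl::opt machinery roughly as follows. This is only a sketch: the variable name and description text are assumptions, not what the patch actually adds.

  #include "llvm/Support/CommandLine.h"
  using namespace llvm;

  // Hypothetical declaration: ASCII output is opt-in, Unicode stays the
  // default, matching the behaviour discussed above.
  static cl::opt<bool> DbgVariablesASCII(
      "debug-vars-ascii",
      cl::desc("Use only ASCII characters when displaying the locations "
               "of source-level variables (default: Unicode)"),
      cl::init(false));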


================
Comment at: llvm/test/tools/llvm-objdump/ARM/debug-vars-dwarf4-sections.s:57
+.Lfunc_begin0:
+	.file	1 "/work" "llvm/src/llvm/test/tools/llvm-objdump/ARM/Inputs/debug.c"
+	.loc	1 1 0
----------------
jhenderson wrote:
> ostannard wrote:
> > jhenderson wrote:
> > > jhenderson wrote:
> > > > ostannard wrote:
> > > > > jhenderson wrote:
> > > > > > Is this string important to the test? If it is, I don't think it'll work on others' machines.
> > > > > > 
> > > > > > You may need to use sed to replace the input string at test time. There are examples in the source-interleave tests in the X86 folder of llvm-objdump.
> > > > > I'm using sed to replace this string in debug-vars-dwarf4.s, where I'm testing the interaction with source and line number interleaving. I didn't think it would be worth testing that in every file, because the interactions are all in the formatting of the output, which is the same in each file.
> > > > Ping? Have you addressed this in this test?
> > > Right, so just to clear things up for my own sanity: --debug-vars is not directly related to source interleaving in any way, right? It uses only the debug information in the object file, right?
> > Added a note to the comment at the top of the file.
> I'm not seeing any changes in the comments since the previous version?
Oops, missed a git-add.


================
Comment at: llvm/tools/llvm-objdump/llvm-objdump.cpp:611
+        OS << "<unknown op " << dwarf::OperationEncodingString(Opcode) << " ("
+           << (int)Opcode << ")>";
+        return;
----------------
jhenderson wrote:
> This seems unrelated?
What do you mean? There are DWARF opcodes not handled by this function yet; we have to do something if we hit one, and printing the opcode will make adding the missing cases easier.
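
For context, the fallback the quoted diff adds has roughly this shape; this is a simplified sketch of a DWARF expression printer, and printOperation and the surrounding switch are assumed for illustration, not the patch's actual function:

  #include "llvm/BinaryFormat/Dwarf.h"
  #include "llvm/Support/raw_ostream.h"
  using namespace llvm;

  // Sketch: print one DWARF expression operation, falling back to a
  // diagnostic form for opcodes the printer does not handle yet.
  static void printOperation(raw_ostream &OS, uint8_t Opcode) {
    switch (Opcode) {
    // ... cases for the opcodes that are already handled ...
    default:
      // Surfacing the opcode name and numeric value makes it easy to see
      // which cases are still missing when they show up in real binaries.
      OS << "<unknown op " << dwarf::OperationEncodingString(Opcode) << " ("
         << (int)Opcode << ")>";
      break;
    }
  }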


================
Comment at: llvm/tools/llvm-objdump/llvm-objdump.cpp:839
+  };
+  const char *getLineChar(LineChar C) const {
+    switch (C) {
----------------
jhenderson wrote:
> Have you considered something like the following approach instead of this function (which you have to call every time you need to add a single character)?
> 
> 1) Keep the enum, but assign the enum values all specific values from 0.
> 2) Create two constant char arrays, one with the Unicode version, and the other with the ASCII version of the characters.
> 3) Set some variable early (e.g. `DbgVariableChars`) to the value of one of the arrays, dependent on the style being used.
> 4) Use the chars by referencing the `DbgVariableChars` array, indexed by the enum, i.e. something like `DbgVariableChars[RangeStart]`.
> 
> I'm not sure if it's simpler in practice or not, but it is at least worth considering.
The Unicode characters are each multiple bytes in UTF-8, so the indexes wouldn't match up between the two char arrays. We could have two arrays of string literals, with one character in each literal, but I don't think that has any significant advantage over the switch statement.
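
For reference, the string-literal variant mentioned here would look roughly like this. It is a sketch only: the enumerator set and most of the glyphs are placeholders, not the ones the patch uses (only LabelVert's "│"/"|" pairing is taken from the quoted diff):

  // Sketch of the two-array alternative: because the Unicode box-drawing
  // glyphs are multi-byte in UTF-8, the tables hold C strings, not chars.
  enum LineChar { LabelVert = 0, LabelCornerNew, RangeMid, NumLineChars };

  static const char *const UnicodeLineChars[NumLineChars] = {"│", "┌", "┃"};
  static const char *const ASCIILineChars[NumLineChars] = {"|", "/", "|"};

  // Selected once, based on the command-line option, then indexed by enum:
  // const char *const *LineChars =
  //     DbgVariablesUnicode ? UnicodeLineChars : ASCIILineChars;
  // Usage: OS << LineChars[LabelVert];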


================
Comment at: llvm/tools/llvm-objdump/llvm-objdump.cpp:848
+    case LineChar::LabelVert:
+      return DbgVariablesUnicode ? "│" : "|";
+    case LineChar::LabelCornerNew:
----------------
jhenderson wrote:
> Is this code tested anywhere? I couldn't find it when searching.
This character only gets used when multiple live ranges start at the same assembly instruction. The example in M2 (more complex than the current tests) uses it twice: the first use is triggered by a compiler bug, so I don't want to include it in a test, and the second requires printing multiple locations for parts of structs. I expect this to be tested properly when the struct location printing is added.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70720/new/

https://reviews.llvm.org/D70720




