[PATCH] D78702: [RFC][RISCV][MC/Objdump] Extend llvm-objdump output to support more instruction patterns

Fangrui Song via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Apr 23 10:16:11 PDT 2020


MaskRay added a comment.

Thanks for the patch. Several RISC targets need more than one instruction to materialize an address. Making MCInstrAnalysis is moving toward the correct direction.

> To add such functionality, I have extended MCInstrAnalysis with a new evaluateInst function, which provides the same functionality as evaluateBranch, but has the ability to store information across multiple instructions allowing these patterns to be picked up. The default implementation does the same check llvm-objdump does before calling evaluateInst, so is a NFC change for other targets, but for RISC-V does a more detailed analysis. This allows evaluateInst to be safe to call on all instructions and thus is called for each instruction.

Looks good. This can benefit PC-relative instructions on other targets as well. For example, we can symbolize the target addresses `movq    4101(%rip), %rax` for x86. I do have a plan to add a similar interface but you beat me to it.

> The MCInstrAnalysis object is no longer const, should this analysis belong in this class (which given the evaluateBranch call, I think makes sense)

Making the MCInstrAnalysis instance mutable is required to make it stateful.

> There is a resetAnalysis hook which provides an additional signal that control flow has changed, I think as a "new symbol/file" indicator this should be fine.

Another idea is to just construct a fresh MCInstrAnalysis instance for a new object file. This is required by ARM which has two MCInstrAnalysis subclasses for ARM-state and Thumb-state. Arguably its MCInstrAnalysis should be stateful too to take into account of data mapping symbols. In the future we may need MCSubtargetInfo. Reconstructing a new MCInstrAnalysis instance does not seem like a large cost we want to avoid.

> I've added a address printout after the instruction just for RISC-V; this matches GNU objdump's behaviour, but does not necessarily have to land with the rest of the patch.

Maybe something similar to my previous patches on improving the disassembly output? (D76580 <https://reviews.llvm.org/D76580>/D76591 <https://reviews.llvm.org/D76591>/D77853 <https://reviews.llvm.org/D77853>)
Ideally that patch should land before this one.

> I do an explicit check for __global_pointer$ and pass this through to MCInstrAnalysis since it has no concept of anything more than MCInsts, there may be scope for making this more generic.

Please change the example to PC relative addresses first. Honestly I think GP is an ugly part of the ABI. This was confirmed here https://groups.google.com/a/groups.riscv.org/forum/#!searchin/sw-dev/__global_pointer$24%7Csort:date/sw-dev/ZjYUJswknQ4/bhFnlWc8BQAJ expand "quoted text"

If we can place

  if (MIA &&
      (Obj->getArch() == Triple::riscv32 ||
       Obj->getArch() == Triple::riscv64) &&
      Name == "__global_pointer$")

in a more suitable target specific place, I will not necessarily block to that change. Note, we can just do the non-controversial PC relative addresses first. It is sufficient to demonstrate the benefits.

> Since I haven't written tests yet, I can give an example of before/after with this patch:

Consider using PC-relative instructions and exhibiting `diff -u` output.

> jalr    ra, -94(ra) #10074 <main>

I'd prefer we just print the target address. GNU objdump always prints the target address. I migrated some targets to print target addresses (see the diffs I linked above). If you feel we need a command line option to print an immediate instead, we can add it, but IMHO that customized output should not be the default. (I asked some people and many feel that the immediate is not useful)

Recently on the binutils mailing list, Alan Modra is proposing some options to control objdump -d output https://sourceware.org/pipermail/binutils/2020-April/110669.html If you have ideas, please speak up.



================
Comment at: llvm/include/llvm/MC/MCInstrAnalysis.h:165
+  /// case Arch is used to indicate the Architecture of the incoming object.
+  virtual void resetAnalysis(bool NewObject = false,
+                             Triple::ArchType Arch = Triple::UnknownArch) {}
----------------
We can just construct a new MCInstrAnalysis instance. See my main comment.


================
Comment at: llvm/include/llvm/MC/MCInstrAnalysis.h:171
+  /// NewObject=true
+  virtual void setGPForAnalysis(uint64_t Addr) {}
+
----------------
Sigh, GP is an ugly part of the ABI. See my main comment.


================
Comment at: llvm/tools/llvm-objdump/llvm-objdump.cpp:1580
+            // analysis is complete
+            if (Obj->getArch() == Triple::riscv32 ||
+                Obj->getArch() == Triple::riscv64)
----------------
I know that other `getArch` calls exist in this file, but for new code we should avoid them.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78702/new/

https://reviews.llvm.org/D78702





More information about the llvm-commits mailing list