[Lldb-commits] [PATCH] D66249: [JIT][Breakpoint] Add "BreakpointInjectedSite" and FCB Trampoline

Fri Aug 23 00:14:22 PDT 2019

labath added a comment.

A bunch more comments from me. So far, I've only tried to highlight the most obvious problems or style issues.
 It's not clear to me whether this is still prototype code or not... If it isn't, I'll have a bunch more.. :)

================
Comment at: lldb/include/lldb/Breakpoint/BreakpointSite.h:54-56
+   static bool classof(const BreakpointSite *bp_site) {
+     return bp_site->getKind() == eKindBreakpointSite;
+   }
----------------
mib wrote:
> labath wrote:
> > This is weird. So, in OO terminology, `BreakpointInjectedSite` "is-a" `BreakpointSite`, but if you ask `llvm::isa<BreakpointSite>(injected_site)`, it will return false?
> No, it returns **true**.
Then does `llvm::isa<BreakpointInjectedSite>(injected_site)` return false? They can't both return true, given the current implementations (but they should, if the dynamic type of `injected_site` really is `BreakpointInjectedSite`...

================
Comment at: lldb/include/lldb/Expression/ExpressionVariable.h:210
+public:
+  typedef AdaptedIterable<collection, lldb::ExpressionVariableSP,
+                          vector_adapter>
----------------
mib wrote:
> labath wrote:
> > What is the purpose of this `AdaptedIterable`, and why is it better than just returning  say `ArrayRef<lldb::ExpressionVariableSP` ?
> Given a class with a container, `AdaptedIterable` will generate the proper iterator accessor for this class.
> 
> In this case, it's pretty convenient to do:
> ```
> for (auto var : expr_vars)
> { }
> ```
> 
> instead of:
> ```
> for (auto var : expr_vars.GetContainer())
> { }
> ```
> 
> It doesn't require to have a getter on the container to iterate over it.
> 
> Most of LLDB container classes are built upon `std` containers (vector, map ...).
> I could change it to `ArrayRef<lldb::ExpressionVariableSP>`, but I don't see why using it would be better, + it might break things (e.g. ArrayRef doesn't implement some of the `std` containers method like `erase()` or `clear()`).
You are adding this function in this very patch. It can't "break" anything if nobody is using it. If you want to expose the full feature set of the underlying container, then why wrap it in this thing, why not return a reference to the container itself?

Though I am not sure if you actually want to do that -- having somebody from the outside tinker with the contents of an internal container seems like it would break encapsulation in most cases.

If you look at the llvm classes, you'll find that they usually just do `iterator_range<???::iterator> variables() { return make_iterator_range(m_variables.begin(), m_variables.end()); }` in situations like these...

================
Comment at: lldb/include/lldb/Target/ABI.h:163
+
+  virtual llvm::ArrayRef<uint8_t> GetJumpOpcode() { return 0; }
+
----------------
I don't think this does what you think it does... it returns an ArrayRef containing a single element -- zero. (And a dangling ArrayRef even...)

================
Comment at: lldb/include/lldb/Target/ABI.h:165
+
+  virtual size_t GetJumpSize() { return 0; }
+
----------------
The point of my earlier suggestion to use ArrayRef is that you don't need to have a separate `XXXSize` function. `ArrayRef<uint8_t>` will encode both the size and the contents.

That said, I'm not convinced that this is a good api anyway. I think it would be extremely hard to make anything on top of these functions that would *not* be target specific (e.g. how would you handle the fact that on arm the jump target is expressed as a relative offset to the address of the jump instruction **+ 8**?).

I think it would be better to just have the ABI plugin just return the fully constructed trampoline jump sequence (say by giving it the final address of the sequence, and the address it should jump to) instead of somebody trying to create a target-independent assembler on top of it.

================
Comment at: lldb/source/Breakpoint/BreakpointInjectedSite.cpp:341
+          "\n"
+          "   typedef unsigned int            uint32_t;\n"
+          "   typedef unsigned long long      uint64_t ;\n"
----------------
aprantl wrote:
> This seems awfully specific. Shouldn't this be target-dependent, and is there no pre-existing table in LLDB that we could derive this from?
This entire function is target specific. I would struggle to find a single line in this expression that could be reused on linux for instance (and let's not even talk about windows..).
I still believe that the simplest approach here would be to allocate memory for this structure on the stack, side-stepping the whole "how to allocate memory in a target-independent manner" discussion completely.

================
Comment at: lldb/source/Breakpoint/BreakpointInjectedSite.cpp:421
+    case DW_OP_addr: {
+      int64_t operand = op.getRawOperand(0);
+      expr += "   src_addr = " + std::to_string(operand) +
----------------
Note that this will generally not be the correct address for PIC (so, pretty much everything nowadays). You also need consider the actual address that the object file got loaded at, not just the thing that's written in the file itself.

================
Comment at: lldb/source/Plugins/ABI/SysV-x86_64/ABISysV_x86_64.h:12-58
+#define X64_JMP_OPCODE    (0xE9)
+#define X64_JMP_SIZE      (5)
+#define X64_CALL_OPCODE   (0xE8)
+#define X64_CALL_SIZE     (5)
+
+#define X64_PUSH_OPCODE   (0x50)
+#define X64_POP_OPCODE    (0x58)
----------------
Using defines for constants is very c-like (and they definitely shouldn't come before the #includes).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D66249/new/

https://reviews.llvm.org/D66249