[Lldb-commits] [PATCH] D24610: LLDB Arm Watchpoints: Use single hardware watchpoint slot to watch multiple bytes where possible

Mon Sep 19 13:04:04 PDT 2016

omjavaid added a comment.

Answers to comments. I will upload a updated patch after corrections and updates.


================
Comment at: packages/Python/lldbsuite/test/functionalities/watchpoint/multi_watchpoint_slots/main.c:23
@@ +22,3 @@
+    {
+        printf("About to write byteArray[%d] ...\n", i); // About to write byteArray
+
----------------
labath wrote:
> zturner wrote:
> > What's up with all the double spaced source code?  Is this intentional?
> Indeed. The spaces seem superfluous.
Agreed. Will be removed in next iteration.

================
Comment at: source/Plugins/Process/Linux/NativeRegisterContextLinux_arm.cpp:513-521
@@ -513,1 +512,11 @@
+
+  // Find out how many bytes we need to watch after 4-byte alignment boundary.
+  uint8_t watch_size = (addr & 0x03) + size;
+
+  // We cannot watch zero or more than 4 bytes after 4-byte alignment boundary.
+  if (size == 0 || watch_size > 4)
+    return LLDB_INVALID_INDEX32;
+
+  // Strip away last two bits of address for byte/half-word/word selection.
+  addr &= ~((lldb::addr_t)3);
 
----------------
labath wrote:
> zturner wrote:
> > This block of code is a bit confusing to me.  Is this equivalent to:
> > 
> > ```
> > lldb::addr_t start = llvm::alignDown(addr, 4);
> > lldb::addr_t end = addr + size;
> > if (start == end || (end-start)>4)
> >   return LLDB_INVALID_INDEX32;
> > ```
> I am not sure this is much clearer, especially, as we will later need a separate varaible for `end-start` anyway.
> 
> +1 for `llvm::alignDown` though.
There is significant performance difference when we choose addr = addr & (~0x03); over llvm::alignDown

It will eventually effect responsiveness if we keep increasing code size like this. 

Even if we use -Os then alignDown is squeezed down to 8-9 instructions. I havnt tried clang though.

Instructions needed for addr = addr & (~0x03):
  4008bd:	48 8b 45 e8          	mov    -0x18(%rbp),%rax
  4008c1:	48 83 e0 fc          	and    $0xfffffffffffffffc,%rax
  4008c5:	48 89 45 e8          	mov    %rax,-0x18(%rbp)

Call penalty for llvm::alignDown
400918:	ba 00 00 00 00       	mov    $0x0,%edx
  40091d:	48 89 ce             	mov    %rcx,%rsi
  400920:	48 89 c7             	mov    %rax,%rdi
  400923:	e8 ae 00 00 00       	callq  4009d6 <_Z9alignDownmmm>
  400928:	49 89 c4             	mov    %rax,%r12
  40092b:	48 8b 5d d8          	mov    -0x28(%rbp),%rbx

Disassembly for llvm::alignDown
00000000004009d6 <_Z9alignDownmmm>:
  4009d6:	55                   	push   %rbp
  4009d7:	48 89 e5             	mov    %rsp,%rbp
  4009da:	48 89 7d f8          	mov    %rdi,-0x8(%rbp)
  4009de:	48 89 75 f0          	mov    %rsi,-0x10(%rbp)
  4009e2:	48 89 55 e8          	mov    %rdx,-0x18(%rbp)
  4009e6:	48 8b 45 e8          	mov    -0x18(%rbp),%rax
  4009ea:	ba 00 00 00 00       	mov    $0x0,%edx
  4009ef:	48 f7 75 f0          	divq   -0x10(%rbp)
  4009f3:	48 89 55 e8          	mov    %rdx,-0x18(%rbp)
  4009f7:	48 8b 45 f8          	mov    -0x8(%rbp),%rax
  4009fb:	48 2b 45 e8          	sub    -0x18(%rbp),%rax
  4009ff:	ba 00 00 00 00       	mov    $0x0,%edx
  400a04:	48 f7 75 f0          	divq   -0x10(%rbp)
  400a08:	48 0f af 45 f0       	imul   -0x10(%rbp),%rax
  400a0d:	48 89 c2             	mov    %rax,%rdx
  400a10:	48 8b 45 e8          	mov    -0x18(%rbp),%rax
  400a14:	48 01 d0             	add    %rdx,%rax
  400a17:	5d                   	pop    %rbp
  400a18:	c3                   	retq   
  400a19:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)

Number of instructions generated for alignDown with gcc -Os

  400892:	48 8b 6c 24 08       	mov    0x8(%rsp),%rbp
  400897:	48 8b 4c 24 10       	mov    0x10(%rsp),%rcx
  40089c:	31 d2                	xor    %edx,%edx
  40089e:	be c2 0a 40 00       	mov    $0x400ac2,%esi
  4008a3:	bf a0 11 60 00       	mov    $0x6011a0,%edi
  4008a8:	48 89 e8             	mov    %rbp,%rax
  4008ab:	48 f7 f1             	div    %rcx
  4008ae:	48 0f af c1          	imul   %rcx,%rax
  4008b2:	48 89 c3             	mov    %rax,%rbx

================
Comment at: source/Plugins/Process/Linux/NativeRegisterContextLinux_arm.cpp:552
@@ -559,1 +551,3 @@
 
+  uint8_t current_watch_size, new_watch_size;
+  // Calculate overall size width to be watched by current hardware watchpoint slot.
----------------
labath wrote:
> Looks much better. Any reason for not using `NextPowerOf2` ? Among other things, it is self-documenting, so you do not need the comment above that.
so llvm::NextPowerOf2 doesnt serve the intended behaviour.

llvm::NextPowerOf2 returns 2 for 1, 4 for 2 or 3, and 8 for 4.

We just want to make sure that our new_size is a power of 2 if its already not which will only be the case if size turns out to be 3.


================
Comment at: source/Plugins/Process/Linux/NativeThreadLinux.cpp:202
@@ +201,3 @@
+  // Invalidate watchpoint index map to re-sync watchpoint registers.
+  m_watchpoint_index_map.clear();
+
----------------
labath wrote:
> If you add this, then the comment below becomes obsolete.
> 
> Seems like a pretty elegant solution to the incremental watchpoint update problem. I am wondering whether we need to do it on every resume though. I think it should be enough to do it when a watchpoint gets deleted (`NTL::RemoveWatchpoint`). Also, we should throw out the implementation of `NativeRegisterContextLinux_arm::ClearHardwareWatchpoint` -- it's no longer necessary, and it's not even correct anymore.
This can be improved for performance I intentionally didn't do it to minimize changes to the generic component. 


https://reviews.llvm.org/D24610