[Lldb-commits] [PATCH] D24610: LLDB Arm Watchpoints: Use single hardware watchpoint slot to watch multiple bytes where possible
Muhammad Omair Javaid via lldb-commits
lldb-commits at lists.llvm.org
Mon Sep 19 13:04:04 PDT 2016
omjavaid added a comment.
Answers to comments. I will upload a updated patch after corrections and updates.
================
Comment at: packages/Python/lldbsuite/test/functionalities/watchpoint/multi_watchpoint_slots/main.c:23
@@ +22,3 @@
+ {
+ printf("About to write byteArray[%d] ...\n", i); // About to write byteArray
+
----------------
labath wrote:
> zturner wrote:
> > What's up with all the double spaced source code? Is this intentional?
> Indeed. The spaces seem superfluous.
Agreed. Will be removed in next iteration.
================
Comment at: source/Plugins/Process/Linux/NativeRegisterContextLinux_arm.cpp:513-521
@@ -513,1 +512,11 @@
+
+ // Find out how many bytes we need to watch after 4-byte alignment boundary.
+ uint8_t watch_size = (addr & 0x03) + size;
+
+ // We cannot watch zero or more than 4 bytes after 4-byte alignment boundary.
+ if (size == 0 || watch_size > 4)
+ return LLDB_INVALID_INDEX32;
+
+ // Strip away last two bits of address for byte/half-word/word selection.
+ addr &= ~((lldb::addr_t)3);
----------------
labath wrote:
> zturner wrote:
> > This block of code is a bit confusing to me. Is this equivalent to:
> >
> > ```
> > lldb::addr_t start = llvm::alignDown(addr, 4);
> > lldb::addr_t end = addr + size;
> > if (start == end || (end-start)>4)
> > return LLDB_INVALID_INDEX32;
> > ```
> I am not sure this is much clearer, especially, as we will later need a separate varaible for `end-start` anyway.
>
> +1 for `llvm::alignDown` though.
There is significant performance difference when we choose addr = addr & (~0x03); over llvm::alignDown
It will eventually effect responsiveness if we keep increasing code size like this.
Even if we use -Os then alignDown is squeezed down to 8-9 instructions. I havnt tried clang though.
Instructions needed for addr = addr & (~0x03):
4008bd: 48 8b 45 e8 mov -0x18(%rbp),%rax
4008c1: 48 83 e0 fc and $0xfffffffffffffffc,%rax
4008c5: 48 89 45 e8 mov %rax,-0x18(%rbp)
Call penalty for llvm::alignDown
400918: ba 00 00 00 00 mov $0x0,%edx
40091d: 48 89 ce mov %rcx,%rsi
400920: 48 89 c7 mov %rax,%rdi
400923: e8 ae 00 00 00 callq 4009d6 <_Z9alignDownmmm>
400928: 49 89 c4 mov %rax,%r12
40092b: 48 8b 5d d8 mov -0x28(%rbp),%rbx
Disassembly for llvm::alignDown
00000000004009d6 <_Z9alignDownmmm>:
4009d6: 55 push %rbp
4009d7: 48 89 e5 mov %rsp,%rbp
4009da: 48 89 7d f8 mov %rdi,-0x8(%rbp)
4009de: 48 89 75 f0 mov %rsi,-0x10(%rbp)
4009e2: 48 89 55 e8 mov %rdx,-0x18(%rbp)
4009e6: 48 8b 45 e8 mov -0x18(%rbp),%rax
4009ea: ba 00 00 00 00 mov $0x0,%edx
4009ef: 48 f7 75 f0 divq -0x10(%rbp)
4009f3: 48 89 55 e8 mov %rdx,-0x18(%rbp)
4009f7: 48 8b 45 f8 mov -0x8(%rbp),%rax
4009fb: 48 2b 45 e8 sub -0x18(%rbp),%rax
4009ff: ba 00 00 00 00 mov $0x0,%edx
400a04: 48 f7 75 f0 divq -0x10(%rbp)
400a08: 48 0f af 45 f0 imul -0x10(%rbp),%rax
400a0d: 48 89 c2 mov %rax,%rdx
400a10: 48 8b 45 e8 mov -0x18(%rbp),%rax
400a14: 48 01 d0 add %rdx,%rax
400a17: 5d pop %rbp
400a18: c3 retq
400a19: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
Number of instructions generated for alignDown with gcc -Os
400892: 48 8b 6c 24 08 mov 0x8(%rsp),%rbp
400897: 48 8b 4c 24 10 mov 0x10(%rsp),%rcx
40089c: 31 d2 xor %edx,%edx
40089e: be c2 0a 40 00 mov $0x400ac2,%esi
4008a3: bf a0 11 60 00 mov $0x6011a0,%edi
4008a8: 48 89 e8 mov %rbp,%rax
4008ab: 48 f7 f1 div %rcx
4008ae: 48 0f af c1 imul %rcx,%rax
4008b2: 48 89 c3 mov %rax,%rbx
================
Comment at: source/Plugins/Process/Linux/NativeRegisterContextLinux_arm.cpp:552
@@ -559,1 +551,3 @@
+ uint8_t current_watch_size, new_watch_size;
+ // Calculate overall size width to be watched by current hardware watchpoint slot.
----------------
labath wrote:
> Looks much better. Any reason for not using `NextPowerOf2` ? Among other things, it is self-documenting, so you do not need the comment above that.
so llvm::NextPowerOf2 doesnt serve the intended behaviour.
llvm::NextPowerOf2 returns 2 for 1, 4 for 2 or 3, and 8 for 4.
We just want to make sure that our new_size is a power of 2 if its already not which will only be the case if size turns out to be 3.
================
Comment at: source/Plugins/Process/Linux/NativeThreadLinux.cpp:202
@@ +201,3 @@
+ // Invalidate watchpoint index map to re-sync watchpoint registers.
+ m_watchpoint_index_map.clear();
+
----------------
labath wrote:
> If you add this, then the comment below becomes obsolete.
>
> Seems like a pretty elegant solution to the incremental watchpoint update problem. I am wondering whether we need to do it on every resume though. I think it should be enough to do it when a watchpoint gets deleted (`NTL::RemoveWatchpoint`). Also, we should throw out the implementation of `NativeRegisterContextLinux_arm::ClearHardwareWatchpoint` -- it's no longer necessary, and it's not even correct anymore.
This can be improved for performance I intentionally didn't do it to minimize changes to the generic component.
https://reviews.llvm.org/D24610
More information about the lldb-commits
mailing list