[PATCH] D23933: [XRay] ARM 32-bit no-Thumb support in compiler-rt

Mon Aug 29 13:41:51 PDT 2016

rSerge added a comment.

Please, see my responses inline. I'll upload the updated patch in a few minutes.


================
Comment at: lib/sanitizer_common/scripts/gen_dynamic_list.py:54
@@ -52,3 +53,3 @@
   # On PowerPC, nm prints function descriptors from .data section.
-  if os.uname()[4] in ["powerpc", "ppc64"]:
+  if platform.uname()[4] in ["powerpc", "ppc64"]:
     func_symbols += ['D']
----------------
dberris wrote:
> Is this required to make this change work? Or should this really happen as an isolated change?
Without this change, XRay for ARM doesn't get cross-compiled from Windows to ARM-Linux .

================
Comment at: lib/xray/xray_arm.cc:28-31
@@ +27,6 @@
+
+static const uint32_t cOpcPush_r0_lr = 0xE92D4001;
+static const uint32_t cOpcBlx_ip = 0xE12FFF3C;
+static const uint32_t cOpcPop_r0_lr = 0xE8BD4001;
+static const uint32_t cOpcB20 = 0xEA000005;
+
----------------
dberris wrote:
> The Coding Standards seem to require that variables be camel case starting with a capital letter.
> 
> http://llvm.org/docs/CodingStandards.html#name-types-functions-variables-and-enumerators-properly
Changing to an enum. Isn't it better to leave the register parameters of instructions separated by underscore, rather than making a name like `PO_PushR0Lr` ?

================
Comment at: lib/xray/xray_arm.cc:109
@@ +108,3 @@
+        reinterpret_cast<std::atomic<uint32_t> *>(FirstAddress), cOpcPush_r0_lr,
+        std::memory_order_release);
+  } else {
----------------
dberris wrote:
> On ARM, does `std::memory_order_release` turn into writes that have fences after to ensure they're visible? Or am I confusing ARM for an architecture that only has relaxed memory order semantics?
`std::memory_order_release` should do what the standard requires. on any compiler and CPU, unless there is a bug in them. Indeed, x86_64 is strongly ordered, so for the CPU `std::memory_order_relaxed` is always sufficient (but not for the compiler: it may reorder). However, ARM is weakly ordered, so at least `std::memory_order_release` is required here. From http://en.cppreference.com/w/cpp/atomic/memory_order : "All writes in the current thread are visible in other threads that acquire the same atomic variable..." . There is a problem that we cannot force the CPU on the other cores to perform an acquire operation fetching instructions. However, ARM CPU always fetches the instruction at `pc+8`, decodes the instruction at `pc+4` and executes the instruction at `pc` (program counter register). So as far as I know there is no reordering problem here. However, during unpatching we cannot fill the 6 tail instructions with NOPs (and I think, the same applies to x86/x86_64), because concurrent core may have already fetched the first instruction on the patch and therefore relies that the rest of instructions in the patch are correct.

================
Comment at: lib/xray/xray_interface.cc:30
@@ +29,3 @@
+#if defined(__x86_64__)
+  // FIXME: The actual length is 11 bytes. Why was length 12 passed to mprotect() ?
+  static const int16_t cSledLength = 12;
----------------
dberris wrote:
> Good question. I may have miscounted. We can fix that later, once this lands (or if you can change and test to make sure it doesn't break, I'm fine with it).
It may break, again, something to do with alignment, you may even need to increase mprotect length to 18 bytes. There may be a chance that when writing the last byte of 11-byte sled, the CPU may need to access a separate 64-bit word in memory. I don't know whether it will get permission denied in case only the first byte of this word is writeable and the other 7 bytes are write-protected.


https://reviews.llvm.org/D23933