[PATCH] D21612: [compiler-rt] [XRay] Basic initialization and flag definition for XRay runtime

Tue Aug 2 13:44:48 PDT 2016

rSerge added a comment.

Because "jmp +9" and "ret" instructions (as I understood) are not 2-byte aligned, I doublt atomicity of replacing them with 2-byte instructions of the runtime patches. Could you take a look?

================
Comment at: compiler-rt/trunk/lib/xray/xray_interface.cc:131-133
@@ +130,5 @@
+      *reinterpret_cast<uint32_t *>(Sled.Address + 7) = TrampolineOffset;
+      std::atomic_store_explicit(
+          reinterpret_cast<std::atomic<uint16_t> *>(Sled.Address), MovR10Seq,
+          std::memory_order_release);
+    }
----------------
Can it happen that the 2 machine code bytes of "jmp +9" instruction belong to different 8-byte blocks of memory? I think that in this case the operation will not be atomic: the CPU will have to write different 8-byte blocks of memory in different cycles. Please, consider aligning "jmp +9" instruction at least at a multiple of 2 bytes.

================
Comment at: compiler-rt/trunk/lib/xray/xray_interface.cc:167-169
@@ +166,5 @@
+      *reinterpret_cast<uint32_t *>(Sled.Address + 7) = TrampolineOffset;
+      std::atomic_store_explicit(
+          reinterpret_cast<std::atomic<uint16_t> *>(Sled.Address), MovR10Seq,
+          std::memory_order_release);
+    }
----------------
Can it happen that the 2 machine code bytes of "mov r10d, <function id>" instruction belong to different 8-byte blocks of memory? I think that in this case the operation will not be atomic: the CPU will have to write different 8-byte blocks of memory in different cycles. Please, consider aligning "RET" instruction at least at a multiple of 2 bytes. Otherwise the CPU executing the function on a different core may get the first byte of "mov r10d, <function id>" and the second byte (and consequtive bytes) of whatever was earlier behing the "RET" instruction.

Repository:
  rL LLVM

https://reviews.llvm.org/D21612