[PATCH] D21612: [compiler-rt] [XRay] Basic initialization and flag definition for XRay runtime

Tue Aug 2 16:56:34 PDT 2016

dberris added a comment.

Thanks Serge -- would you consider perhaps not sending comments onto this patch anymore? We can continue more of this on either one of the dependent changes or on the llvm-dev mailing list. I'd very much appreciate that.

PS. These sleds were implemented in https://reviews.llvm.org/D19904, you can see the changes in lowering at http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86MCInstLower.cpp?r1=275272&r2=275367&pathrev=275367&diff_format=h.

================
Comment at: compiler-rt/trunk/lib/xray/xray_interface.cc:131-133
@@ +130,5 @@
+      *reinterpret_cast<uint32_t *>(Sled.Address + 7) = TrampolineOffset;
+      std::atomic_store_explicit(
+          reinterpret_cast<std::atomic<uint16_t> *>(Sled.Address), MovR10Seq,
+          std::memory_order_release);
+    }
----------------
rSerge wrote:
> Can it happen that the 2 machine code bytes of "jmp +9" instruction belong to different 8-byte blocks of memory? I think that in this case the operation will not be atomic: the CPU will have to write different 8-byte blocks of memory in different cycles. Please, consider aligning "jmp +9" instruction at least at a multiple of 2 bytes.
We already align the function start sled to 2-byte alignment.

================
Comment at: compiler-rt/trunk/lib/xray/xray_interface.cc:167-169
@@ +166,5 @@
+      *reinterpret_cast<uint32_t *>(Sled.Address + 7) = TrampolineOffset;
+      std::atomic_store_explicit(
+          reinterpret_cast<std::atomic<uint16_t> *>(Sled.Address), MovR10Seq,
+          std::memory_order_release);
+    }
----------------
rSerge wrote:
> Can it happen that the 2 machine code bytes of "mov r10d, <function id>" instruction belong to different 8-byte blocks of memory? I think that in this case the operation will not be atomic: the CPU will have to write different 8-byte blocks of memory in different cycles. Please, consider aligning "RET" instruction at least at a multiple of 2 bytes. Otherwise the CPU executing the function on a different core may get the first byte of "mov r10d, <function id>" and the second byte (and consequtive bytes) of whatever was earlier behing the "RET" instruction.
This we don't do yet. It should be easy to do. I'll send a patch to address exactly this concern.

Repository:
  rL LLVM

https://reviews.llvm.org/D21612