[PATCH] D23933: [XRay] ARM 32-bit no-Thumb support in compiler-rt

Renato Golin via llvm-commits llvm-commits at lists.llvm.org
Tue Aug 30 14:37:27 PDT 2016


rengolin added a comment.

Thanks for the changes, some more comments...


================
Comment at: lib/xray/xray_arm.cc:39
@@ +38,3 @@
+}
+
+// Writes the following instructions:
----------------
This is a new file, it should use LLVM's policy.

================
Comment at: lib/xray/xray_arm.cc:51
@@ +50,3 @@
+  return Address + 1;
+}
+
----------------
Of course. Ignore me.

Though, this is the same as the one below, and you could merge them both by passing the register name and ORRing [reg << 12] with the instruction, and making sure reg < 15.

================
Comment at: lib/xray/xray_arm.cc:111
@@ +110,3 @@
+        uint32_t(PatchOpcodes::PO_B20), std::memory_order_release);
+  }
+  return true;
----------------
> Is there any evidence that ARM may fetch instructions out of order? If so, how to prevent this?

I'm not sure what you mean. Many Cortex-AR cores are OOO. That's their design, you can't change that. Or maybe you mean "out of order amongst threads", which is not what I'm talking about.

Since this is in C++, so I'm guessing the compiler will "do the right thing" (tm) with regards to memory barriers, and the core being OOO makes no difference here.

Probably just a nomenclature clash around "OOO" between ourselves... :)

================
Comment at: lib/xray/xray_inmemory_log.cc:188
@@ +187,3 @@
+#elif defined(__arm__)
+    CPUFrequency = NanosecondsPerSecond;
+#else
----------------
I still find this confusing... Is this 10^9 just a normalising factor, to get compatible numbers? If anything, this line needs a serious comment explaining why this is what it is.

Also, clock_gettime() will return a system wide, sequential and consistent number, while RDTSCP will return a counter that is internal to each CPU (and will be different across CPUs), thus prone to problems while context-switching.

Regardless, if you want CPU frequency, you can do exactly what you've done to x86.

================
Comment at: lib/xray/xray_trampoline_arm.S:7
@@ +6,3 @@
+    @ Word-aligned function entry point
+    .p2align 2
+    @ Let C/C++ see the symbol
----------------
Right, so it's not C/C++, it's AAPCS (the ARM Procedure Call Standard).

As long as you're not passing NEON vectors as arguments, Q registers are not used (see arm_neon.h), and d0-d7 should take care of all VFP registers.

================
Comment at: lib/xray/xray_trampoline_arm.S:34
@@ +33,3 @@
+    VPOP {d0-d7}
+    POP {r1-r3,pc}
+
----------------
A8.8.132 POP (ARM):

    "ARM deprecates the use of this instruction with both the LR and the PC in the list."

================
Comment at: lib/xray/xray_trampoline_arm.S:40
@@ +39,3 @@
+	.global __xray_FunctionExit
+	@ Assume that d1-d7 are not used for the return value.
+    @ Assume that "q" part of the floating-point registers is not used for the
----------------
Same again, if you're not using NEON vectors, this is fine.


https://reviews.llvm.org/D23933





More information about the llvm-commits mailing list