[PATCH] D23933: [XRay] ARM 32-bit no-Thumb support in compiler-rt
Renato Golin via llvm-commits
llvm-commits at lists.llvm.org
Tue Aug 30 14:37:27 PDT 2016
rengolin added a comment.
Thanks for the changes, some more comments...
================
Comment at: lib/xray/xray_arm.cc:39
@@ +38,3 @@
+}
+
+// Writes the following instructions:
----------------
This is a new file, it should use LLVM's policy.
================
Comment at: lib/xray/xray_arm.cc:51
@@ +50,3 @@
+ return Address + 1;
+}
+
----------------
Of course. Ignore me.
Though, this is the same as the one below, and you could merge them both by passing the register name and ORRing [reg << 12] with the instruction, and making sure reg < 15.
================
Comment at: lib/xray/xray_arm.cc:111
@@ +110,3 @@
+ uint32_t(PatchOpcodes::PO_B20), std::memory_order_release);
+ }
+ return true;
----------------
> Is there any evidence that ARM may fetch instructions out of order? If so, how to prevent this?
I'm not sure what you mean. Many Cortex-AR cores are OOO. That's their design, you can't change that. Or maybe you mean "out of order amongst threads", which is not what I'm talking about.
Since this is in C++, so I'm guessing the compiler will "do the right thing" (tm) with regards to memory barriers, and the core being OOO makes no difference here.
Probably just a nomenclature clash around "OOO" between ourselves... :)
================
Comment at: lib/xray/xray_inmemory_log.cc:188
@@ +187,3 @@
+#elif defined(__arm__)
+ CPUFrequency = NanosecondsPerSecond;
+#else
----------------
I still find this confusing... Is this 10^9 just a normalising factor, to get compatible numbers? If anything, this line needs a serious comment explaining why this is what it is.
Also, clock_gettime() will return a system wide, sequential and consistent number, while RDTSCP will return a counter that is internal to each CPU (and will be different across CPUs), thus prone to problems while context-switching.
Regardless, if you want CPU frequency, you can do exactly what you've done to x86.
================
Comment at: lib/xray/xray_trampoline_arm.S:7
@@ +6,3 @@
+ @ Word-aligned function entry point
+ .p2align 2
+ @ Let C/C++ see the symbol
----------------
Right, so it's not C/C++, it's AAPCS (the ARM Procedure Call Standard).
As long as you're not passing NEON vectors as arguments, Q registers are not used (see arm_neon.h), and d0-d7 should take care of all VFP registers.
================
Comment at: lib/xray/xray_trampoline_arm.S:34
@@ +33,3 @@
+ VPOP {d0-d7}
+ POP {r1-r3,pc}
+
----------------
A8.8.132 POP (ARM):
"ARM deprecates the use of this instruction with both the LR and the PC in the list."
================
Comment at: lib/xray/xray_trampoline_arm.S:40
@@ +39,3 @@
+ .global __xray_FunctionExit
+ @ Assume that d1-d7 are not used for the return value.
+ @ Assume that "q" part of the floating-point registers is not used for the
----------------
Same again, if you're not using NEON vectors, this is fine.
https://reviews.llvm.org/D23933
More information about the llvm-commits
mailing list