[libcxx-commits] [PATCH] D104987: Use GetSystemTimePreciseAsFileTime() if available
Adrian McCarthy via Phabricator via libcxx-commits
libcxx-commits at lists.llvm.org
Fri Jul 2 09:57:20 PDT 2021
amccarth added a comment.
tl;dr: I understand the desire to avoid even tiny synchronization costs when trying to read a high-precision timer, but I'm uncomfortable with the data race. I suggest initializing the static function pointer under the locking provided by the compiler.
My reasoning is below. If I got any of this wrong, I would reconsider my position.
First, we have a function-scope static `getSysTime_p` that's not explicitly initialized, so it get zero-initialized, which is safe to assume maps to a `nullptr` on any platform that runs Windows. With no explicit initialization on the definition, I don't think the compiler has to inject any synchronization to ensure the static is initialized exactly once during the first call. The zeros will be generated at compile time.
Next, with the manual initialization, there's obviously the chance that two (or more) threads could attempt it at the same time. Two threads could race on reading `getSysTime_p` and they could race again to write it. Officially, that's UB, right? I'm pretty sure that won't result in any tearing on x86 or x64, but I don't know about others. (Windows on ARM is a thing now.)
Since kernel32 is guaranteed to be (and to remain) loaded for the duration of any Windows-hosted process, the call to `GetProcAddress` should always give the same result, which mitigates the concerns expressed in the remarks for `GetProcAddress`.
Every thread that gets into the initialization block will try to write the same value, so maybe those races aren't a problem in real life? I cannot say definitively, and it would be very hard to write a test for. If you decide to rely on that, make sure to add a comment about it.
I feel it would be safer to guarantee the static function pointer is initialized exactly once.
The simplest option would be to initialize it where it's defined. I would move the code that finds the address to a separate function, say `GetGetSysTimeProcAddress`, and then simplify this code to:
static GetSysTime_t getSysTime_p = GetGetSysTimeProcAddress();
getSysTime_p(&ft);
Of course, there's a small cost on every call, since the program will have to check whether the static has been initialized, but I assume that's very small and about as optimal as possible for the platform. _Maybe_ you could do a tiny bit better manually placing your own fences, but that's outside my skill set, and it would be very difficult to test. I doubt that making the pointer an `std::atomic<T>` or using `::InterlockedXxx` functions would be more efficient than what the compiler generates. The double-checked lock pattern is also notoriously difficult to get right.
Another option would be to make the function pointer global, and initialize it once before any other threads have a chance to call. I don't know whether the library has such a hook. The cost of the lookup isn't huge, but it would be in the startup sequence. I don't have enough data to say whether that's an acceptable trade-off.
You could make the function pointer thread-local, so that every thread has there own copy that nobody races with, but I'm not sure that's any better than the other options. Imagine the pathologically case where every thread calls only once, and the program keeps spawning new threads.
Minor note: `GetModuleHandleW` might be a little faster than `GetModuleHandleA`. I believe the `-A` version will do the equivalent of a call to `MultiByteToWide`, which you could avoid by calling `-W` explicitly and adding an `L` prefix to the string literal. (That's just for the module name. The proc names are always "ANSI".)
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D104987/new/
https://reviews.llvm.org/D104987
More information about the libcxx-commits
mailing list