[flang-commits] [PATCH] D104019: [flang] Add initial implementation for CPU_TIME
Michael Kruse via Phabricator via flang-commits
flang-commits at lists.llvm.org
Sat Jun 19 18:19:23 PDT 2021
Meinersbur added a subscriber: naromero77.
Meinersbur added a comment.
In D104019#2826366 <https://reviews.llvm.org/D104019#2826366>, @rovka wrote:
> Hmm, that's probably better - if it never changes then we'll timeout eventually, right? I'll try to commit that today.
Correct. Change looks good, does not fail anymore.
Assuming that a look takes a minimum time to execute broke some famous program in the past. For instance, Windows 3.11 <https://www.osnews.com/story/131875/those-win9x-crashes-on-fast-machines/>, some old games <https://www.vogonswiki.com/index.php/List_of_CPU_speed_sensitive_games>, and every program compiled with Turbo Pascal <http://www.kennedysoftware.ie/patchcrt.htm>.
> But just for the record, are you sure that the loop is optimized away? Have you checked the assembly? I'm thinking there might be an issue with the resolution of the timer used on Windows - we might need a better implementation for it than the default (just like I added for POSIX systems).
It was just an assumption, I haven't checked the disassembly. Embedded (where `volatile` is important) is not really a domain for msvc so ignoring volatile looked like a reasonable explanation.
> It would be nice if our test could flag when the resolution of the timer isn't good enough on a given implementation, for some definition of "good enough". Naturally, one can expect that the definition of "good enough" will change in the future, and our test can change with it. Ideally, the test would be representative of the amount of work that we expect to be able to time. OTOH I don't know if this test is the place for that, or if we should just add some microbenchmarks to the test-suite.
Here is the `/Ox` disassembly of `LookBusy` by msvc:
00007FF79D37F770 xor eax,eax
00007FF79D37F772 nop dword ptr [rax]
00007FF79D37F776 nop word ptr [rax+rax]
x = i;
00007FF79D37F780 mov dword ptr [x (07FF79DB49F50h)],eax
00007FF79D37F786 inc eax
00007FF79D37F788 cmp eax,100h
00007FF79D37F78D jl LookBusy+10h (07FF79D37F780h)
That is, msvc actually did honor the volatile. The test actually also fails when compiling with gcc and running with WSL1 (Windows Subsystem for Linux), i.e. your clock resolution theory seems correct. Which makes the test platform-dependent.
IMHO a MicroBenchmark would be a better place (e.g. llvm-test-suite already has MicroBenchmarks, @naromero77 currently works on adding Fortran tests) than a regression test error. A too granular timer resolution could also mitigated by running the microbenchmark multiple times, as GoogleBenchmark does. If precision is important, `std::chrono::high_resolution_clock` would be the API to use, which maps to QueryPerformanceCounter <https://docs.microsoft.com/en-us/windows/win32/api/profileapi/nf-profileapi-queryperformancecounter> on Windows.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D104019/new/
https://reviews.llvm.org/D104019
More information about the flang-commits
mailing list