[flang-commits] [PATCH] D104019: [flang] Add initial implementation for CPU_TIME

Sat Jun 19 18:19:23 PDT 2021

Meinersbur added a subscriber: naromero77.
Meinersbur added a comment.

In D104019#2826366 <https://reviews.llvm.org/D104019#2826366>, @rovka wrote:

> Hmm, that's probably better - if it never changes then we'll timeout eventually, right? I'll try to commit that today.

Correct. Change looks good, does not fail anymore.

Assuming that a look takes a minimum time to execute broke some famous program in the past. For instance, Windows 3.11 <https://www.osnews.com/story/131875/those-win9x-crashes-on-fast-machines/>, some old games <https://www.vogonswiki.com/index.php/List_of_CPU_speed_sensitive_games>, and every program compiled with Turbo Pascal <http://www.kennedysoftware.ie/patchcrt.htm>.

> But just for the record, are you sure that the loop is optimized away? Have you checked the assembly? I'm thinking there might be an issue with the resolution of the timer used on Windows - we might need a better implementation for it than the default (just like I added for POSIX systems).

It was just an assumption, I haven't checked the disassembly. Embedded (where `volatile` is important) is not really a domain for msvc so ignoring volatile looked like a reasonable explanation.

> It would be nice if our test could flag when the resolution of the timer isn't good enough on a given implementation, for some definition of "good enough". Naturally, one can expect that the definition of "good enough" will change in the future, and our test can change with it. Ideally, the test would be representative of the amount of work that we expect to be able to time. OTOH I don't know if this test is the place for that, or if we should just add some microbenchmarks to the test-suite.

Here is the `/Ox` disassembly of `LookBusy` by msvc:

  00007FF79D37F770  xor         eax,eax  
  00007FF79D37F772  nop         dword ptr [rax]  
  00007FF79D37F776  nop         word ptr [rax+rax]  
      x = i;
  00007FF79D37F780  mov         dword ptr [x (07FF79DB49F50h)],eax  
  00007FF79D37F786  inc         eax  
  00007FF79D37F788  cmp         eax,100h  
  00007FF79D37F78D  jl          LookBusy+10h (07FF79D37F780h)  

That is, msvc actually did honor the volatile. The test actually also fails when compiling with gcc and running with WSL1 (Windows Subsystem for Linux), i.e. your clock resolution theory seems correct. Which makes the test platform-dependent.

IMHO a MicroBenchmark would be a better place (e.g. llvm-test-suite already has MicroBenchmarks, @naromero77 currently works on adding Fortran tests) than a regression test error. A too granular timer resolution could also mitigated by running the microbenchmark multiple times, as GoogleBenchmark does.  If precision is important, `std::chrono::high_resolution_clock` would be the API to use, which maps to QueryPerformanceCounter <https://docs.microsoft.com/en-us/windows/win32/api/profileapi/nf-profileapi-queryperformancecounter> on Windows.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D104019/new/

https://reviews.llvm.org/D104019