[flang-commits] [PATCH] D104019: [flang] Add initial implementation for CPU_TIME

Mon Jun 21 01:25:45 PDT 2021

rovka added a comment.

In D104019#2829152 <https://reviews.llvm.org/D104019#2829152>, @Meinersbur wrote:

> In D104019#2826366 <https://reviews.llvm.org/D104019#2826366>, @rovka wrote:
>
>> Hmm, that's probably better - if it never changes then we'll timeout eventually, right? I'll try to commit that today.
>
> Correct. Change looks good, does not fail anymore.
>
> Assuming that a look takes a minimum time to execute broke some famous program in the past. For instance, Windows 3.11 <https://www.osnews.com/story/131875/those-win9x-crashes-on-fast-machines/>, some old games <https://www.vogonswiki.com/index.php/List_of_CPU_speed_sensitive_games>, and every program compiled with Turbo Pascal <http://www.kennedysoftware.ie/patchcrt.htm>.
>
>> But just for the record, are you sure that the loop is optimized away? Have you checked the assembly? I'm thinking there might be an issue with the resolution of the timer used on Windows - we might need a better implementation for it than the default (just like I added for POSIX systems).
>
> It was just an assumption, I haven't checked the disassembly. Embedded (where `volatile` is important) is not really a domain for msvc so ignoring volatile looked like a reasonable explanation.
>
>> It would be nice if our test could flag when the resolution of the timer isn't good enough on a given implementation, for some definition of "good enough". Naturally, one can expect that the definition of "good enough" will change in the future, and our test can change with it. Ideally, the test would be representative of the amount of work that we expect to be able to time. OTOH I don't know if this test is the place for that, or if we should just add some microbenchmarks to the test-suite.
>
> Here is the `/Ox` disassembly of `LookBusy` by msvc:
>
>   00007FF79D37F770  xor         eax,eax  
>   00007FF79D37F772  nop         dword ptr [rax]  
>   00007FF79D37F776  nop         word ptr [rax+rax]  
>       x = i;
>   00007FF79D37F780  mov         dword ptr [x (07FF79DB49F50h)],eax  
>   00007FF79D37F786  inc         eax  
>   00007FF79D37F788  cmp         eax,100h  
>   00007FF79D37F78D  jl          LookBusy+10h (07FF79D37F780h)  
>
> That is, msvc actually did honor the volatile. The test actually also fails when compiling with gcc and running with WSL1 (Windows Subsystem for Linux), i.e. your clock resolution theory seems correct. Which makes the test platform-dependent.
>
> IMHO a MicroBenchmark would be a better place (e.g. llvm-test-suite already has MicroBenchmarks, @naromero77 currently works on adding Fortran tests) than a regression test error. A too granular timer resolution could also mitigated by running the microbenchmark multiple times, as GoogleBenchmark does.  If precision is important, `std::chrono::high_resolution_clock` would be the API to use, which maps to QueryPerformanceCounter <https://docs.microsoft.com/en-us/windows/win32/api/profileapi/nf-profileapi-queryperformancecounter> on Windows.

Unfortunately we can't use std::chrono for the implementation, since that would pull in the C++ runtime libraries. As far as C++ usage goes, we're limited to  '#include <cstuff>'.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D104019/new/

https://reviews.llvm.org/D104019