[flang-commits] [PATCH] D88981: [flang] Rework host runtime folding and enable REAL(2) folding with it.
Jean Perier via Phabricator via flang-commits
flang-commits at lists.llvm.org
Wed Oct 14 06:15:19 PDT 2020
jeanPerier added a comment.
> The patch compiles successfully with msvc (with a patch to trunk that I still need to upload a patch for).
Thanks for testing this @Meinersbur !
================
Comment at: flang/include/flang/Evaluate/common.h:11
#define FORTRAN_EVALUATE_COMMON_H_
#include "flang/Common/Fortran.h"
----------------
klausler wrote:
> Does removing this member from the folding context make them cheap to construct again?
Yes, FoldingContext objects are 100 times cheaper to construct according to my measurements. This improves fcvs `f18 -fparse-only` time by 12% on average.
**FoldingContext ctor 100x speedup**
With f18 compiled with gcc 8.3 in release mode on an Intel Xeon Gold 6148, I measured 0.05ms per FoldingContext construction before vs 0.00005ms with this patch (average of 10000 ctor calls in one run; I repeated the runs 10 times and got stable results). Measurements were done by instrumenting the code (https://github.com/jeanPerier/llvm-project/commit/f511284b54805aa314c1316f9143d0d0cbaa522d).
Since a FoldingContext is constructed for every function call check when there is an explicit interface, this can translate into a 4x speed-up of `time f18 -fparse-only` on carefully designed tests like:
```
real, parameter :: x = 0.5
! Semantic analysis of each following line ends up making 3 FoldingContext ctor calls
real, parameter :: y1 = acos(x)
real, parameter :: y2 = acos(x)
! ... repeated 9997 times
real, parameter :: y10000 = acos(x)
end
```
I measured 2s before vs 0.5s with this patch (`time f18 -fparse-only` real time).
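For anyone wanting to reproduce this, the test file does not need to be written by hand. Below is a minimal sketch of a generator; the function name `make_acos_test` and the output file name mentioned afterwards are illustrative choices, not part of the patch.

```python
def make_acos_test(n: int = 10000) -> str:
    """Return Fortran source with n REAL parameters initialized by acos(x).

    Hypothetical helper: it only reproduces the shape of the test shown
    above (one constant x, then y1 .. yn, then a closing 'end').
    """
    lines = ["real, parameter :: x = 0.5"]
    # One declaration per line; each forces a constant fold of acos(x).
    lines += [f"real, parameter :: y{i} = acos(x)" for i in range(1, n + 1)]
    lines.append("end")
    return "\n".join(lines) + "\n"
```

Writing the returned string to a file (say `acos_fold.f90`, the name is arbitrary) and running `time f18 -fparse-only` on it should give a comparable workload, though absolute timings will depend on the machine and build.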
**Host folding 1.2x slowdown**
However, there is a 20% time penalty per fold with the host runtime with this patch (most likely due to the added encapsulation/decapsulation of Scalar to/from Expr<SomeExpr> in the folder). I measured the time spent in Evaluate/fold-real.cpp `FoldIntrinsicFunction` on the test file above: 1.3usec per fold before vs 1.6usec with this patch (average of the 10000 folds, repeated 10 times). Given that this is at the usec level, it is negligible for scalar folds since we create 3 FoldingContext per expression. For array expressions, it can lead to an overall slowdown in the compilation (that will never be bigger than 20%). For instance, I measured a 1% overall slowdown in a program folding `acos(a_10000_element_array)` (93ms before vs 94ms now).
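To make the scalar trade-off concrete, here is a back-of-the-envelope model using only the figures quoted in this comment, taken as stated (0.05ms and 0.00005ms per FoldingContext ctor, 1.3usec and 1.6usec per fold, 3 ctors per expression). The constant and function names are made up for illustration, and treating the costs as simply additive is an assumption.

```python
# Figures quoted in this comment, converted to microseconds.
CTOR_BEFORE_US = 50.0   # 0.05 ms per FoldingContext ctor before the patch
CTOR_AFTER_US = 0.05    # 0.00005 ms per ctor with the patch
FOLD_BEFORE_US = 1.3    # host-runtime fold before the patch
FOLD_AFTER_US = 1.6     # host-runtime fold with the patch
CTORS_PER_EXPR = 3      # FoldingContext ctors per scalar expression


def net_saving_us() -> float:
    """Net time saved per scalar expression (additive cost model, assumed)."""
    ctor_saving = CTORS_PER_EXPR * (CTOR_BEFORE_US - CTOR_AFTER_US)
    fold_penalty = FOLD_AFTER_US - FOLD_BEFORE_US
    return ctor_saving - fold_penalty


def predicted_total_saving_s(n_exprs: int) -> float:
    """Predicted wall-time saving in seconds for n_exprs scalar folds."""
    return n_exprs * net_saving_us() / 1e6
```

Under this model each scalar expression saves roughly 150usec of ctor time against a 0.3usec fold penalty, and `predicted_total_saving_s(10000)` comes out at about 1.5s, in line with the 2s vs 0.5s measured on the 10000-line test.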
**Conclusion: 12% overall parsing+semantics speed-up on real code**
On fcvs, `time f18 -fparse-only fm*.f` real time went from 4.3s to 3.8s (ten-run average), so this has a visible impact on real code.
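The 12% figure is just the relative reduction in wall time between these two averages. As a small sketch (the helper name `percent_faster` is illustrative):

```python
def percent_faster(before_s: float, after_s: float) -> float:
    """Relative reduction in run time, as a percentage of the 'before' time.

    A negative result means the 'after' run was slower.
    """
    return 100.0 * (before_s - after_s) / before_s
```

With the fcvs numbers, `percent_faster(4.3, 3.8)` is a little under 12; with the array example, `percent_faster(93, 94)` is about -1, i.e. the 1% slowdown mentioned earlier.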
Since scalar folding is much more widespread than huge array folding, the patch seems a win to me.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D88981/new/
https://reviews.llvm.org/D88981