[PATCH] D129464: [Clang][CodeGen] Set FP options of builder at entry to compound statement

Fri Jul 15 01:04:56 PDT 2022

sepavloff added inline comments.

================
Comment at: clang/test/CodeGen/pragma-fenv_access.cpp:35
+}
+
+
----------------
aaron.ballman wrote:
> There are some extra test cases I'd like to see coverage for because there are some interesting edge cases to consider.
> ```
> template <typename Ty>
> float func1(Ty) {
>   float f1 = 1.0f, f2 = 3.0f;
>   return f1 + f2 * 2.0f;
> }
> 
> #pragma float_control(precise, on, push)
> template float func1<int>(int); 
> #pragma float_control(pop)
> 
> #pragma float_control(precise, on, push)
> template <typename Ty>
> float func2(Ty) {
>   float f1 = 1.0f, f2 = 3.0f;
>   return f1 + f2 * 2.0f;
> }
> #pragma float_control(pop)
> 
> template float func2<int>(int);
> 
> void bar() {
>     func1(1.1);
>     func2(1.1);
> }
> ```
> This gets especially interesting when you think about delayed template instantiation as happens by default on Windows targets. Consider this code with the *driver level* `-ffast-math` flag enabled (not the cc1 option, which is different).
> 
> I think that `func1<int>` SHOULD be precise, because the explicit instantiation is, while `func1<double>` SHOULD NOT be precise, because the definition is not. `func2<int>` SHOULD NOT be precise, because the explicit instantiation is not, while `func2<double>` SHOULD be precise,  because the definition is.
> 
> Partial specializations are a similar situation where the primary template and its related code may have different options.
> 
> WDYT?
Standard FP pragmas are defined only in the C standard, so their interaction with C++-specific features is effectively implementation-defined. The behaviors proposed in your example are reasonable, with one exception: IMO `func2<int>` should be precise, because its template is precise. It is equivalent to:
```
template <typename Ty>
float func2(Ty) {
#pragma float_control(precise, on)
  float f1 = 1.0f, f2 = 3.0f;
  return f1 + f2 * 2.0f;
}
``` 
so instantiating it would produce a function with precise operations.

Implementing a correct mechanism for this interaction requires substantial effort and should be done in a separate patch, I think. In particular, we need to invent a way to associate a point of instantiation with the FPOptions in effect at that point, so that delayed instantiation can be performed with the correct set of options.
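
To make the idea concrete, here is a rough toy model of recording the options at each point of instantiation. It is only a sketch and does not use any Clang API: `ToyFPOptions`, `PendingInstantiation`, `ToyPragmaState` and `simulateTranslationUnit` are hypothetical names invented for illustration, and it deliberately does not decide how the recorded options should be merged with those of the template definition.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a set of FP options (NOT clang::FPOptions).
struct ToyFPOptions {
  bool Precise = false;
  bool FEnvAccess = false;
};

// A requested instantiation plus the options captured where it was requested.
struct PendingInstantiation {
  std::string Name;
  ToyFPOptions AtPointOfInstantiation;
};

// Toy model of the pragma state and a queue of delayed instantiations.
class ToyPragmaState {
public:
  // Models "#pragma float_control(precise, on|off, push)".
  void pushPrecise(bool On) {
    Saved.push_back(Current);
    Current.Precise = On;
  }

  // Models "#pragma float_control(pop)"; ignored if nothing was pushed.
  void pop() {
    if (Saved.empty())
      return;
    Current = Saved.back();
    Saved.pop_back();
  }

  // Models a file-scope "#pragma STDC FENV_ACCESS ON|OFF".
  void setFEnvAccess(bool On) { Current.FEnvAccess = On; }

  // Snapshot the current options instead of reading them later, when the
  // delayed instantiation is actually performed.
  void requestInstantiation(std::string Name) {
    Pending.push_back({std::move(Name), Current});
  }

  const ToyFPOptions &current() const { return Current; }
  const std::vector<PendingInstantiation> &pending() const { return Pending; }

private:
  ToyFPOptions Current;
  std::vector<ToyFPOptions> Saved;
  std::vector<PendingInstantiation> Pending;
};

// Replays a translation unit shaped like the func1 example, followed by a
// late FENV_ACCESS pragma as in pragma-fenv_access.cpp.
inline ToyPragmaState simulateTranslationUnit() {
  ToyPragmaState S;
  S.pushPrecise(true);
  S.requestInstantiation("func1<int>");    // explicit, inside push/pop
  S.pop();
  S.requestInstantiation("func1<double>"); // implicit, from bar()
  S.setFEnvAccess(true);                   // appears later in the file
  return S;
}
```

In this model the two instantiations keep the options seen at their own points of instantiation, so performing them at the end of the translation unit would not pick up the late FEnvAccess setting.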

In this patch, the change in SemaTemplateInstantiateDecl.cpp prevents a compiler crash. Without it, codegen tries to create a call to a constrained intrinsic in a function that does not have the StrictFP attribute, because the FEnvAccess flag is set at the end of the translation unit in `pragma-fenv_access.cpp`.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D129464/new/

https://reviews.llvm.org/D129464