[PATCH] D31167: Use FPContractModeKind universally

Tue Apr 11 09:24:46 PDT 2017

yaxunl added inline comments.

================
Comment at: cfe/trunk/include/clang/Basic/LangOptions.def:220
+/// \brief FP_CONTRACT mode (on/off/fast).
+ENUM_LANGOPT(DefaultFPContractMode, FPContractModeKind, 2, FPC_Off, "FP contraction type")
 LANGOPT(NoBitFieldTypeAlign , 1, 0, "bit-field type alignment")
----------------
anemet wrote:
> yaxunl wrote:
> > anemet wrote:
> > > yaxunl wrote:
> > > > This change seemed to cause some performance degradations in some OpenCL applications.
> > > > 
> > > > This option used to be on by default. Now it is off by default.
> > > > 
> > > > There are applications do separated compile/link/codegen stages. In the codegen stage, clang is invoked with .bc as input, therefore this option is off by default, whereas it was on by default before this change.
> > > > 
> > > > Is there any reason not to keep the original behavior?
> > > > 
> > > > Thanks.
> > > > This change seemed to cause some performance degradations in some OpenCL applications.
> > > > 
> > > > This option used to be on by default. Now it is off by default.
> > > 
> > > It's always been off.  It was set to fast for CUDA which should still be the case.  See the changes further down on the patch.
> > > 
> > > > 
> > > > There are applications do separated compile/link/codegen stages. In the codegen stage, clang is invoked with .bc as input, therefore this option is off by default, whereas it was on by default before this change.
> > > > 
> > > > Is there any reason not to keep the original behavior?
> > > 
> > > Sorry but I am not sure what changed, can you elaborate on what you're doing and how things used to work for you?
> > Before your change, -ffp-contract was a codegen option defined as
> > 
> > 
> > ```
> > ENUM_CODEGENOPT(FPContractMode, FPContractModeKind, 2, FPC_On)
> > ```
> > 
> > therefore the default value as on. After your change, it becomes off by default.
> No. -ffp-contract=on was a FE option and -ffp-contract=fast was a backend option.  I still don't understand your use-case.  Can you include a small testcase how this used to work before?
-ffp-contract=on used to be a codegen option before this change. In CodeGen/BackendUtil.cpp, before this change:

```
switch (CodeGenOpts.getFPContractMode()) {
  switch (LangOpts.getDefaultFPContractMode()) {
  case LangOptions::FPC_Off:
    Options.AllowFPOpFusion = llvm::FPOpFusion::Strict;
    break;
  case LangOptions::FPC_On:
    Options.AllowFPOpFusion = llvm::FPOpFusion::Standard;
    break;
  case LangOptions::FPC_Fast:
    Options.AllowFPOpFusion = llvm::FPOpFusion::Fast;
    break;
  }
```
By default, -fp-contract=on, which results in llvm::FPOpFusion::Standard in the backend. This options allows backend to translate llvm.fmuladd to FMA machine instructions.

After this change, it becomes:

```
switch (LangOpts.getDefaultFPContractMode()) {
  case LangOptions::FPC_Off:
    Options.AllowFPOpFusion = llvm::FPOpFusion::Strict;
    break;
  case LangOptions::FPC_On:
    Options.AllowFPOpFusion = llvm::FPOpFusion::Standard;
    break;
  case LangOptions::FPC_Fast:
    Options.AllowFPOpFusion = llvm::FPOpFusion::Fast;
    break;
  }
```
now by default -ffp-contract=off, which results in  llvm::FPOpFusion::Strict in the backend. This option disables backend translating llvm.fmuladd to FMA machine instructions in certain situations.

A simple lit test is as follows:

```
; RUN: %clang_cc1 -triple amdgcn -S -o - %s| FileCheck %s

define amdgpu_kernel void @f(double addrspace(1)* nocapture %out, double %a, double %b, double %c) local_unnamed_addr #0 {
entry:
  ; CHECK: v_fma_f64
  %0 = tail call double @llvm.fmuladd.f64(double %b, double %c, double %a)
  store double %0, double addrspace(1)* %out, align 8, !tbaa !6
  ret void
}

declare double @llvm.fmuladd.f64(double, double, double) #1

attributes #0 = { nounwind "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-features"="+fp64-fp16-denormals,-fp32-denormals" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { nounwind readnone }

```
The test passes before this change and fails after this change.

================
Comment at: cfe/trunk/lib/Frontend/CompilerInvocation.cpp:1638
     Opts.LaxVectorConversions = 0;
-    Opts.DefaultFPContract = 1;
+    Opts.setDefaultFPContractMode(LangOptions::FPC_On);
     Opts.NativeHalfType = 1;
----------------
yaxunl wrote:
> hfinkel wrote:
> > yaxunl wrote:
> > > hfinkel wrote:
> > > > Looks like the intent is certainly to have kept the OpenCL default the same here.
> > > Well we also use clang to compile LLVM IR to ISA. Before this change, fp-contract was on so that backend did fp contract. However, now clang does not turn on fp-contract when compiling LLVM IRs.
> > That's the equivalent of 'Fast'. Maybe this should also be set to Fast then.
> Agree. That should fix this issue.
Actually, the option `on` may be working too, which corresponds to llvm::FPOpFusion::Standard. At least it allows llvm.fmuladd to be translated to FMA machine instructions. However, `off` will disable such translation in certain cases.

Quoting TargetOptions.h:

```
    /// Fast mode - allows formation of fused FP ops whenever they're
    /// profitable.
    /// Standard mode - allow fusion only for 'blessed' FP ops. At present the
    /// only blessed op is the fmuladd intrinsic. In the future more blessed ops
    /// may be added.
    /// Strict mode - allow fusion only if/when it can be proven that the excess
    /// precision won't effect the result.

```

Repository:
  rL LLVM

https://reviews.llvm.org/D31167