[PATCH] D31167: Use FPContractModeKind universally

Wed Apr 12 11:06:11 PDT 2017

hfinkel added inline comments.

================
Comment at: cfe/trunk/include/clang/Basic/LangOptions.def:220
+/// \brief FP_CONTRACT mode (on/off/fast).
+ENUM_LANGOPT(DefaultFPContractMode, FPContractModeKind, 2, FPC_Off, "FP contraction type")
 LANGOPT(NoBitFieldTypeAlign , 1, 0, "bit-field type alignment")
----------------
anemet wrote:
> hfinkel wrote:
> > yaxunl wrote:
> > > hfinkel wrote:
> > > > yaxunl wrote:
> > > > > anemet wrote:
> > > > > > yaxunl wrote:
> > > > > > > anemet wrote:
> > > > > > > > yaxunl wrote:
> > > > > > > > > This change seemed to cause some performance degradations in some OpenCL applications.
> > > > > > > > > 
> > > > > > > > > This option used to be on by default. Now it is off by default.
> > > > > > > > > 
> > > > > > > > > There are applications do separated compile/link/codegen stages. In the codegen stage, clang is invoked with .bc as input, therefore this option is off by default, whereas it was on by default before this change.
> > > > > > > > > 
> > > > > > > > > Is there any reason not to keep the original behavior?
> > > > > > > > > 
> > > > > > > > > Thanks.
> > > > > > > > > This change seemed to cause some performance degradations in some OpenCL applications.
> > > > > > > > > 
> > > > > > > > > This option used to be on by default. Now it is off by default.
> > > > > > > > 
> > > > > > > > It's always been off.  It was set to fast for CUDA which should still be the case.  See the changes further down on the patch.
> > > > > > > > 
> > > > > > > > > 
> > > > > > > > > There are applications do separated compile/link/codegen stages. In the codegen stage, clang is invoked with .bc as input, therefore this option is off by default, whereas it was on by default before this change.
> > > > > > > > > 
> > > > > > > > > Is there any reason not to keep the original behavior?
> > > > > > > > 
> > > > > > > > Sorry but I am not sure what changed, can you elaborate on what you're doing and how things used to work for you?
> > > > > > > Before your change, -ffp-contract was a codegen option defined as
> > > > > > > 
> > > > > > > 
> > > > > > > ```
> > > > > > > ENUM_CODEGENOPT(FPContractMode, FPContractModeKind, 2, FPC_On)
> > > > > > > ```
> > > > > > > 
> > > > > > > therefore the default value as on. After your change, it becomes off by default.
> > > > > > No. -ffp-contract=on was a FE option and -ffp-contract=fast was a backend option.  I still don't understand your use-case.  Can you include a small testcase how this used to work before?
> > > > > -ffp-contract=on used to be a codegen option before this change. In CodeGen/BackendUtil.cpp, before this change:
> > > > > 
> > > > > ```
> > > > > switch (CodeGenOpts.getFPContractMode()) {
> > > > >   switch (LangOpts.getDefaultFPContractMode()) {
> > > > >   case LangOptions::FPC_Off:
> > > > >     Options.AllowFPOpFusion = llvm::FPOpFusion::Strict;
> > > > >     break;
> > > > >   case LangOptions::FPC_On:
> > > > >     Options.AllowFPOpFusion = llvm::FPOpFusion::Standard;
> > > > >     break;
> > > > >   case LangOptions::FPC_Fast:
> > > > >     Options.AllowFPOpFusion = llvm::FPOpFusion::Fast;
> > > > >     break;
> > > > >   }
> > > > > ```
> > > > > By default, -fp-contract=on, which results in llvm::FPOpFusion::Standard in the backend. This options allows backend to translate llvm.fmuladd to FMA machine instructions.
> > > > > 
> > > > > After this change, it becomes:
> > > > > 
> > > > > ```
> > > > > switch (LangOpts.getDefaultFPContractMode()) {
> > > > >   case LangOptions::FPC_Off:
> > > > >     Options.AllowFPOpFusion = llvm::FPOpFusion::Strict;
> > > > >     break;
> > > > >   case LangOptions::FPC_On:
> > > > >     Options.AllowFPOpFusion = llvm::FPOpFusion::Standard;
> > > > >     break;
> > > > >   case LangOptions::FPC_Fast:
> > > > >     Options.AllowFPOpFusion = llvm::FPOpFusion::Fast;
> > > > >     break;
> > > > >   }
> > > > > ```
> > > > > now by default -ffp-contract=off, which results in  llvm::FPOpFusion::Strict in the backend. This option disables backend translating llvm.fmuladd to FMA machine instructions in certain situations.
> > > > > 
> > > > > A simple lit test is as follows:
> > > > > 
> > > > > 
> > > > > ```
> > > > > ; RUN: %clang_cc1 -triple amdgcn -S -o - %s| FileCheck %s
> > > > > 
> > > > > define amdgpu_kernel void @f(double addrspace(1)* nocapture %out, double %a, double %b, double %c) local_unnamed_addr #0 {
> > > > > entry:
> > > > >   ; CHECK: v_fma_f64
> > > > >   %0 = tail call double @llvm.fmuladd.f64(double %b, double %c, double %a)
> > > > >   store double %0, double addrspace(1)* %out, align 8, !tbaa !6
> > > > >   ret void
> > > > > }
> > > > > 
> > > > > declare double @llvm.fmuladd.f64(double, double, double) #1
> > > > > 
> > > > > attributes #0 = { nounwind "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-features"="+fp64-fp16-denormals,-fp32-denormals" "unsafe-fp-math"="false" "use-soft-float"="false" }
> > > > > attributes #1 = { nounwind readnone }
> > > > > 
> > > > > ```
> > > > > The test passes before this change and fails after this change.
> > > > I'm missing something here. We're calling for OpenCL:
> > > > 
> > > >   Opts.setDefaultFPContractMode(LangOptions::FPC_On);
> > > > 
> > > > which seems like it should change the result returned by getDefaultFPContractMode(). Why does it not?
> > > > 
> > > The input to this test is LLVM IR, so `Opts.setDefaultFPContractMode(LangOptions::FPC_On)` is not executed.
> > Seems like this is another aspect of the LTO problem. All of these things need to be function attributes. This is also true to make LTO work.
> > 
> @hfinkel, what is the actual point of un-fusing the intrinsic in the BE with FPOpFusion::Strict?
> 
> I don't know if targets use the target-independent intrinsic to implement builtin support but if yes this would override the user choosing of FMA by directly specifying the builtin.
> @hfinkel, what is the actual point of un-fusing the intrinsic in the BE with FPOpFusion::Strict?

I think this was added based on a different conception of how frontends would use the intrinsic: that they'd always add it where the language-semantics allow, and then the backend would either never use FMAs (strict), use FMAs where profitable (standard), etc.

> I don't know if targets use the target-independent intrinsic to implement builtin support but if yes this would override the user choosing of FMA by directly specifying the builtin.

No target should use fmuladd to implement the builtin support for a target-specific intrinsic. It should use the fma intrinsic would always maps to an fma.

Repository:
  rL LLVM

https://reviews.llvm.org/D31167