[llvm-commits] [llvm] r148658 - in /llvm/trunk: lib/Target/ARM/ARM.td lib/Target/ARM/ARMAsmPrinter.cpp lib/Target/ARM/ARMInstrInfo.td lib/Target/ARM/ARMInstrNEON.td lib/Target/ARM/ARMInstrVFP.td lib/Target/ARM/ARMSchedule.td lib/Target/ARM/ARMSub

Hal Finkel hfinkel at anl.gov
Sun Jan 22 23:47:22 PST 2012


On Sun, 2012-01-22 at 16:46 -0800, Eli Friedman wrote:
> On Sun, Jan 22, 2012 at 4:07 AM, Anton Korobeynikov <asl at math.spbu.ru> wrote:
> > +//===----------------------------------------------------------------------===//
> > +// Fused FP Multiply-Accumulate Operations.
> > +//
> > +def VFMAD : ADbI<0b11101, 0b10, 0, 0,
> > +                 (outs DPR:$Dd), (ins DPR:$Ddin, DPR:$Dn, DPR:$Dm),
> > +                 IIC_fpFMAC64, "vfma", ".f64\t$Dd, $Dn, $Dm",
> > +                 [(set DPR:$Dd, (fadd_mlx (fmul_su DPR:$Dn, DPR:$Dm),
> > +                                          (f64 DPR:$Ddin)))]>,
> > +              RegConstraint<"$Ddin = $Dd">,
> > +              Requires<[HasVFP4]>;
> > +
> > +def VFMAS : ASbIn<0b11101, 0b10, 0, 0,
> > +                  (outs SPR:$Sd), (ins SPR:$Sdin, SPR:$Sn, SPR:$Sm),
> > +                  IIC_fpFMAC32, "vfma", ".f32\t$Sd, $Sn, $Sm",
> > +                  [(set SPR:$Sd, (fadd_mlx (fmul_su SPR:$Sn, SPR:$Sm),
> > +                                           SPR:$Sdin))]>,
> > +              RegConstraint<"$Sdin = $Sd">,
> > +              Requires<[HasVFP4,DontUseNEONForFP]> {
> > +  // Some single precision VFP instructions may be executed on both NEON and
> > +  // VFP pipelines.
> > +}
> > +
> > +def : Pat<(fadd_mlx DPR:$dstin, (fmul_su DPR:$a, (f64 DPR:$b))),
> > +          (VFMAD DPR:$dstin, DPR:$a, DPR:$b)>,
> > +          Requires<[HasVFP4]>;
> > +def : Pat<(fadd_mlx SPR:$dstin, (fmul_su SPR:$a, SPR:$b)),
> > +          (VFMAS SPR:$dstin, SPR:$a, SPR:$b)>,
> > +          Requires<[HasVFP4,DontUseNEONForFP]>;
> > +
> > +def VFMSD : ADbI<0b11101, 0b10, 1, 0,
> > +                 (outs DPR:$Dd), (ins DPR:$Ddin, DPR:$Dn, DPR:$Dm),
> > +                 IIC_fpFMAC64, "vfms", ".f64\t$Dd, $Dn, $Dm",
> > +                 [(set DPR:$Dd, (fadd_mlx (fneg (fmul_su DPR:$Dn,DPR:$Dm)),
> > +                                          (f64 DPR:$Ddin)))]>,
> > +              RegConstraint<"$Ddin = $Dd">,
> > +              Requires<[HasVFP4]>;
> > +
> > +def VFMSS : ASbIn<0b11101, 0b10, 1, 0,
> > +                  (outs SPR:$Sd), (ins SPR:$Sdin, SPR:$Sn, SPR:$Sm),
> > +                  IIC_fpFMAC32, "vfms", ".f32\t$Sd, $Sn, $Sm",
> > +                  [(set SPR:$Sd, (fadd_mlx (fneg (fmul_su SPR:$Sn, SPR:$Sm)),
> > +                                           SPR:$Sdin))]>,
> > +              RegConstraint<"$Sdin = $Sd">,
> > +              Requires<[HasVFP4,DontUseNEONForFP]> {
> > +  // Some single precision VFP instructions may be executed on both NEON and
> > +  // VFP pipelines.
> > +}
> > +
> > +def : Pat<(fsub_mlx DPR:$dstin, (fmul_su DPR:$a, (f64 DPR:$b))),
> > +          (VFMSD DPR:$dstin, DPR:$a, DPR:$b)>,
> > +          Requires<[HasVFP4]>;
> > +def : Pat<(fsub_mlx SPR:$dstin, (fmul_su SPR:$a, SPR:$b)),
> > +          (VFMSS SPR:$dstin, SPR:$a, SPR:$b)>,
> > +          Requires<[HasVFP4,DontUseNEONForFP]>;
> > +
> > +def VFNMAD : ADbI<0b11101, 0b01, 1, 0,
> > +                  (outs DPR:$Dd), (ins DPR:$Ddin, DPR:$Dn, DPR:$Dm),
> > +                  IIC_fpFMAC64, "vfnma", ".f64\t$Dd, $Dn, $Dm",
> > +                  [(set DPR:$Dd,(fsub_mlx (fneg (fmul_su DPR:$Dn,DPR:$Dm)),
> > +                                          (f64 DPR:$Ddin)))]>,
> > +                RegConstraint<"$Ddin = $Dd">,
> > +                Requires<[HasVFP4]>;
> > +
> > +def VFNMAS : ASbI<0b11101, 0b01, 1, 0,
> > +                  (outs SPR:$Sd), (ins SPR:$Sdin, SPR:$Sn, SPR:$Sm),
> > +                  IIC_fpFMAC32, "vfnma", ".f32\t$Sd, $Sn, $Sm",
> > +                  [(set SPR:$Sd, (fsub_mlx (fneg (fmul_su SPR:$Sn, SPR:$Sm)),
> > +                                           SPR:$Sdin))]>,
> > +                RegConstraint<"$Sdin = $Sd">,
> > +                Requires<[HasVFP4,DontUseNEONForFP]> {
> > +  // Some single precision VFP instructions may be executed on both NEON and
> > +  // VFP pipelines.
> > +}
> > +
> > +def : Pat<(fsub_mlx (fneg (fmul_su DPR:$a, (f64 DPR:$b))), DPR:$dstin),
> > +          (VFNMAD DPR:$dstin, DPR:$a, DPR:$b)>,
> > +          Requires<[HasVFP4]>;
> > +def : Pat<(fsub_mlx (fneg (fmul_su SPR:$a, SPR:$b)), SPR:$dstin),
> > +          (VFNMAS SPR:$dstin, SPR:$a, SPR:$b)>,
> > +          Requires<[HasVFP4,DontUseNEONForFP]>;
> > +
> > +def VFNMSD : ADbI<0b11101, 0b01, 0, 0,
> > +                  (outs DPR:$Dd), (ins DPR:$Ddin, DPR:$Dn, DPR:$Dm),
> > +                  IIC_fpFMAC64, "vfnms", ".f64\t$Dd, $Dn, $Dm",
> > +                  [(set DPR:$Dd, (fsub_mlx (fmul_su DPR:$Dn, DPR:$Dm),
> > +                                           (f64 DPR:$Ddin)))]>,
> > +               RegConstraint<"$Ddin = $Dd">,
> > +               Requires<[HasVFP4]>;
> > +
> > +def VFNMSS : ASbI<0b11101, 0b01, 0, 0,
> > +                  (outs SPR:$Sd), (ins SPR:$Sdin, SPR:$Sn, SPR:$Sm),
> > +                  IIC_fpFMAC32, "vfnms", ".f32\t$Sd, $Sn, $Sm",
> > +             [(set SPR:$Sd, (fsub_mlx (fmul_su SPR:$Sn, SPR:$Sm), SPR:$Sdin))]>,
> > +                         RegConstraint<"$Sdin = $Sd">,
> > +                  Requires<[HasVFP4,DontUseNEONForFP]> {
> > +  // Some single precision VFP instructions may be executed on both NEON and
> > +  // VFP pipelines.
> > +}
> > +
> > +def : Pat<(fsub_mlx (fmul_su DPR:$a, (f64 DPR:$b)), DPR:$dstin),
> > +          (VFNMSD DPR:$dstin, DPR:$a, DPR:$b)>,
> > +          Requires<[HasVFP4]>;
> > +def : Pat<(fsub_mlx (fmul_su SPR:$a, SPR:$b), SPR:$dstin),
> > +          (VFNMSS SPR:$dstin, SPR:$a, SPR:$b)>,
> > +          Requires<[HasVFP4,DontUseNEONForFP]>;
> 
> I'm a bit concerned about these patterns: a multiply followed by an
> add is not, strictly speaking, the same thing as a fused multiply-add.
>  We have an FMA intrinsic (http://llvm.org/docs/LangRef.html#int_fma);
> that should map onto this instruction, and we should only transform an
> unfused multiply+add in fast-math mode.

The PowerPC backend has patterns like this (for fmadd and friends), and
they are enabled whenever the TargetOptions flag NoExcessFPPrecision is
disabled (which is the default). I think that this behavior is
reasonable.

 -Hal

> 
> -Eli
> 
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits

-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory




More information about the llvm-commits mailing list