[1/5 PATCH/RFC PPC64] add power8 keyword target to llvm
Hal Finkel
hfinkel at anl.gov
Tue Jun 24 15:58:42 PDT 2014
Will,
First, please send patches to llvm-commits (or cfe-commits) as appropriate, not to the dev lists. Second, please don't commit a copy of the P7 itinerary for the P8, just reference the P7 itinerary for now in the P8 ProcessorModel definition. When you actually have a P8 model, then you can commit that along with the ProcessorModel change.
Please update PPCHazardRecognizers.cpp, PPCInstrInfo.cpp, PPCSubtarget.cpp to make handling of DIR_PWR8 the same as DIR_PWR7 for now. Otherwise, LGTM.
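To make that second request concrete, here is a minimal, self-contained sketch of the idea, not the actual LLVM code. The enum is a cut-down stand-in for the one in PPCSubtarget.h, and usesPwr7Handling is a hypothetical helper name invented for this illustration; the real checks in those three files may be shaped differently.

```cpp
#include <cassert>

// Hypothetical, cut-down stand-in for the directive enum in PPCSubtarget.h;
// only the entries needed for the illustration are listed.
namespace PPC {
enum {
  DIR_PWR6,
  DIR_PWR6X,
  DIR_PWR7,
  DIR_PWR8,
  DIR_64
};
}

// Hypothetical helper (not an existing LLVM function): returns true when a
// directive should take the code path currently used for the POWER7.
static bool usesPwr7Handling(unsigned Directive) {
  switch (Directive) {
  case PPC::DIR_PWR7:
  case PPC::DIR_PWR8: // Same as P7 until P8-specific handling exists.
    return true;
  default:
    return false;
  }
}
```

Wherever the existing code tests only for DIR_PWR7, adding DIR_PWR8 as a second case along these lines would keep the two processors in step for now.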
-Hal
----- Original Message -----
> From: "Will Schmidt" <will_schmidt at vnet.ibm.com>
> To: "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu>, "clang-dev Developers" <cfe-dev at cs.uiuc.edu>
> Cc: "Will Schmidt" <willschm at us.ibm.com>, "Ulrich Weigand" <ulrich.weigand at de.ibm.com>, "Hal Finkel"
> <hfinkel at anl.gov>, "William J. Schmidt" <wschmidt at linux.vnet.ibm.com>
> Sent: Tuesday, June 24, 2014 4:43:04 PM
> Subject: [1/5 PATCH/RFC PPC64] add power8 keyword target to llvm
>
> Add pwr8/keyword, and initial P8 tablegen descriptor table.
>
>
>
> diff --git a/lib/Target/PowerPC/PPC.td b/lib/Target/PowerPC/PPC.td
> index bd58539..6badc2f 100644
> --- a/lib/Target/PowerPC/PPC.td
> +++ b/lib/Target/PowerPC/PPC.td
> @@ -46,6 +46,7 @@ def DirectivePwr5x: SubtargetFeature<"",
> "DarwinDirective", "PPC::DIR_PWR5X", ""
> def DirectivePwr6: SubtargetFeature<"", "DarwinDirective",
> "PPC::DIR_PWR6", "">;
> def DirectivePwr6x: SubtargetFeature<"", "DarwinDirective",
> "PPC::DIR_PWR6X", "">;
> def DirectivePwr7: SubtargetFeature<"", "DarwinDirective",
> "PPC::DIR_PWR7", "">;
> +def DirectivePwr8: SubtargetFeature<"", "DarwinDirective",
> "PPC::DIR_PWR8", "">;
>
> def Feature64Bit : SubtargetFeature<"64bit","Has64BitSupport",
> "true",
> "Enable 64-bit
> instructions">;
> @@ -285,6 +286,15 @@ def : ProcessorModel<"pwr7", P7Model,
> FeaturePOPCNTD, FeatureLDBRX,
> Feature64Bit /*, Feature64BitRegs */,
> DeprecatedMFTB, DeprecatedDST]>;
> +def : ProcessorModel<"pwr8", P8Model,
> + [DirectivePwr8, FeatureAltivec,
> + FeatureMFOCRF, FeatureFCPSGN, FeatureFSqrt,
> FeatureFRE,
> + FeatureFRES, FeatureFRSQRTE, FeatureFRSQRTES,
> + FeatureRecipPrec, FeatureSTFIWX, FeatureLFIWAX,
> + FeatureFPRND, FeatureFPCVT, FeatureISEL,
> + FeaturePOPCNTD, FeatureLDBRX,
> + Feature64Bit /*, Feature64BitRegs */,
> + DeprecatedMFTB, DeprecatedDST]>;
> def : Processor<"ppc", G3Itineraries, [Directive32]>;
> def : ProcessorModel<"ppc64", G5Model,
> [Directive64, FeatureAltivec,
> diff --git a/lib/Target/PowerPC/PPCSchedule.td
> b/lib/Target/PowerPC/PPCSchedule.td
> index 1221d41..a5cc4e7 100644
> --- a/lib/Target/PowerPC/PPCSchedule.td
> +++ b/lib/Target/PowerPC/PPCSchedule.td
> @@ -118,6 +118,7 @@ include "PPCScheduleG4.td"
> include "PPCScheduleG4Plus.td"
> include "PPCScheduleG5.td"
> include "PPCScheduleP7.td"
> +include "PPCScheduleP8.td"
> include "PPCScheduleA2.td"
> include "PPCScheduleE500mc.td"
> include "PPCScheduleE5500.td"
> diff --git a/lib/Target/PowerPC/PPCScheduleP8.td
> b/lib/Target/PowerPC/PPCScheduleP8.td
> new file mode 100644
> index 0000000..c4b918f
> --- /dev/null
> +++ b/lib/Target/PowerPC/PPCScheduleP8.td
> @@ -0,0 +1,389 @@
> +//===-- PPCScheduleP8.td - PPC P8 Scheduling Definitions ---*-
> tablegen -*-===//
> +//
> +// The LLVM Compiler Infrastructure
> +//
> +// This file is distributed under the University of Illinois Open
> Source
> +// License. See LICENSE.TXT for details.
> +//
> +//===----------------------------------------------------------------------===//
> +//
> +// This file defines the itinerary class data for the POWER7
> processor.
> +//
> +//===----------------------------------------------------------------------===//
> +
> +// XXX FIXME.
> +// this is a blind copy of P7 Schedule and s/P7/P8/g . Details
> within will need to be updated with the P8 specifics.
> +
> +
> +// Primary reference:
> +// IBM POWER7 multicore server processor
> +// B. Sinharoy, et al.
> +// IBM J. Res. & Dev. (55) 3. May/June 2011.
> +
> +// Scheduling for the P8 involves tracking two types of resources:
> +// 1. The dispatch bundle slots
> +// 2. The functional unit resources
> +
> +// Dispatch units:
> +def P8_DU1 : FuncUnit;
> +def P8_DU2 : FuncUnit;
> +def P8_DU3 : FuncUnit;
> +def P8_DU4 : FuncUnit;
> +def P8_DU5 : FuncUnit;
> +def P8_DU6 : FuncUnit;
> +
> +def P8_LS1 : FuncUnit; // Load/Store pipeline 1
> +def P8_LS2 : FuncUnit; // Load/Store pipeline 2
> +
> +def P8_FX1 : FuncUnit; // FX pipeline 1
> +def P8_FX2 : FuncUnit; // FX pipeline 2
> +
> +// VS pipeline 1 (vector integer ops. always here)
> +def P8_VS1 : FuncUnit; // VS pipeline 1
> +// VS pipeline 2 (128-bit stores and perms. here)
> +def P8_VS2 : FuncUnit; // VS pipeline 2
> +
> +def P8_CRU : FuncUnit; // CR unit (CR logicals and
> move-from-SPRs)
> +def P8_BRU : FuncUnit; // BR unit
> +
> +// Notes:
> +// Each LSU pipeline can also execute FX add and logical
> instructions.
> +// Each LSU pipeline can complete a load or store in one cycle.
> +//
> +// Each store is broken into two parts, AGEN goes to the LSU while a
> +// "data steering" op. goes to the FXU or VSU.
> +//
> +// FX loads have a two cycle load-to-use latency (so one "bubble"
> cycle).
> +// VSU loads have a three cycle load-to-use latency (so two "bubble"
> cycles).
> +//
> +// Frequent FX ops. take only one cycle and results can be used
> again in the
> +// next cycle (there is a self-bypass). Getting results from the
> other FX
> +// pipeline takes an additional cycle.
> +//
> +// The VSU XS is similar to the POWER6, but with a pipeline length
> of 2 cycles
> +// (instead of 3 cycles on the POWER6). VSU XS handles vector
> FX-style ops.
> +// Dispatch of an instruction to VS1 that uses four single prec.
> inputs
> +// (either to a float or XC op). prevents dispatch in that cycle to
> VS2 of any
> +// floating point instruction.
> +//
> +// The VSU PM is similar to the POWER6, but with a pipeline length
> of 3 cycles
> +// (instead of 4 cycles on the POWER6). vsel is handled by the PM
> pipeline
> +// (unlike on the POWER6).
> +//
> +// FMA from the VSUs can forward results in 6 cycles. VS1 XS and
> vector FP
> +// share the same write-back, and have a 5-cycle latency difference,
> so the
> +// IFU/IDU will not dispatch an XS instruction 5 cycles after a
> vector FP
> +// op. has been dispatched to VS1.
> +//
> +// Three cycles after an L1 cache hit, a dependent VSU instruction
> can issue.
> +//
> +// Instruction dispatch groups have (at most) four non-branch
> instructions, and
> +// two branches. Unlike on the POWER4/5, a branch does not
> automatically
> +// end the dispatch group, but a second branch must be the last in
> the group.
> +
> +def P8Itineraries : ProcessorItineraries<
> + [P8_DU1, P8_DU2, P8_DU3, P8_DU4, P8_DU5, P8_DU6,
> + P8_LS1, P8_LS2, P8_FX1, P8_FX2, P8_VS1, P8_VS2, P8_CRU, P8_BRU],
> [], [
> + InstrItinData<IIC_IntSimple , [InstrStage<1, [P8_DU1, P8_DU2,
> + P8_DU3, P8_DU4],
> 0>,
> + InstrStage<1, [P8_FX1, P8_FX2,
> + P8_LS1, P8_LS2]>],
> + [1, 1, 1]>,
> + InstrItinData<IIC_IntGeneral , [InstrStage<1, [P8_DU1, P8_DU2,
> + P8_DU3, P8_DU4],
> 0>,
> + InstrStage<1, [P8_FX1, P8_FX2]>],
> + [1, 1, 1]>,
> + InstrItinData<IIC_IntCompare , [InstrStage<1, [P8_DU1, P8_DU2,
> + P8_DU3, P8_DU4],
> 0>,
> + InstrStage<1, [P8_FX1, P8_FX2]>],
> + [1, 1, 1]>,
> + // FIXME: Add record-form itinerary data.
> + InstrItinData<IIC_IntDivW , [InstrStage<1, [P8_DU1], 0>,
> + InstrStage<1, [P8_DU2], 0>,
> + InstrStage<36, [P8_FX1,
> P8_FX2]>],
> + [36, 1, 1]>,
> + InstrItinData<IIC_IntDivD , [InstrStage<1, [P8_DU1], 0>,
> + InstrStage<1, [P8_DU2], 0>,
> + InstrStage<68, [P8_FX1,
> P8_FX2]>],
> + [68, 1, 1]>,
> + InstrItinData<IIC_IntMulHW , [InstrStage<1, [P8_DU1, P8_DU2,
> + P8_DU3, P8_DU4],
> 0>,
> + InstrStage<1, [P8_FX1, P8_FX2]>],
> + [4, 1, 1]>,
> + InstrItinData<IIC_IntMulHWU , [InstrStage<1, [P8_DU1, P8_DU2,
> + P8_DU3, P8_DU4],
> 0>,
> + InstrStage<1, [P8_FX1, P8_FX2]>],
> + [4, 1, 1]>,
> + InstrItinData<IIC_IntMulLI , [InstrStage<1, [P8_DU1, P8_DU2,
> + P8_DU3, P8_DU4],
> 0>,
> + InstrStage<1, [P8_FX1, P8_FX2]>],
> + [4, 1, 1]>,
> + InstrItinData<IIC_IntRotate , [InstrStage<1, [P8_DU1, P8_DU2,
> + P8_DU3, P8_DU4],
> 0>,
> + InstrStage<1, [P8_FX1, P8_FX2]>],
> + [1, 1, 1]>,
> + InstrItinData<IIC_IntRotateD , [InstrStage<1, [P8_DU1, P8_DU2,
> + P8_DU3, P8_DU4],
> 0>,
> + InstrStage<1, [P8_FX1, P8_FX2]>],
> + [1, 1, 1]>,
> + InstrItinData<IIC_IntShift , [InstrStage<1, [P8_DU1, P8_DU2,
> + P8_DU3, P8_DU4],
> 0>,
> + InstrStage<1, [P8_FX1, P8_FX2]>],
> + [1, 1, 1]>,
> + InstrItinData<IIC_IntTrapW , [InstrStage<1, [P8_DU1, P8_DU2,
> + P8_DU3, P8_DU4],
> 0>,
> + InstrStage<1, [P8_FX1, P8_FX2]>],
> + [1, 1]>,
> + InstrItinData<IIC_IntTrapD , [InstrStage<1, [P8_DU1, P8_DU2,
> + P8_DU3, P8_DU4],
> 0>,
> + InstrStage<1, [P8_FX1, P8_FX2]>],
> + [1, 1]>,
> + InstrItinData<IIC_BrB , [InstrStage<1, [P8_DU5, P8_DU6],
> 0>,
> + InstrStage<1, [P8_BRU]>],
> + [3, 1, 1]>,
> + InstrItinData<IIC_BrCR , [InstrStage<1, [P8_DU1], 0>,
> + InstrStage<1, [P8_CRU]>],
> + [3, 1, 1]>,
> + InstrItinData<IIC_BrMCR , [InstrStage<1, [P8_DU5, P8_DU6],
> 0>,
> + InstrStage<1, [P8_BRU]>],
> + [3, 1, 1]>,
> + InstrItinData<IIC_BrMCRX , [InstrStage<1, [P8_DU5, P8_DU6],
> 0>,
> + InstrStage<1, [P8_BRU]>],
> + [3, 1, 1]>,
> + InstrItinData<IIC_LdStLoad , [InstrStage<1, [P8_DU1, P8_DU2,
> + P8_DU3, P8_DU4],
> 0>,
> + InstrStage<1, [P8_LS1, P8_LS2]>],
> + [2, 1, 1]>,
> + InstrItinData<IIC_LdStLoadUpd , [InstrStage<1, [P8_DU1], 0>,
> + InstrStage<1, [P8_DU2], 0>,
> + InstrStage<1, [P8_LS1, P8_LS2],
> 0>,
> + InstrStage<1, [P8_FX1, P8_FX2]>],
> + [2, 2, 1, 1]>,
> + InstrItinData<IIC_LdStLoadUpdX, [InstrStage<1, [P8_DU1], 0>,
> + InstrStage<1, [P8_DU2], 0>,
> + InstrStage<1, [P8_DU3], 0>,
> + InstrStage<1, [P8_DU4], 0>,
> + InstrStage<1, [P8_FX1, P8_FX2]>,
> + InstrStage<1, [P8_LS1, P8_LS2],
> 0>,
> + InstrStage<1, [P8_FX1, P8_FX2]>],
> + [3, 3, 1, 1]>,
> + InstrItinData<IIC_LdStLD , [InstrStage<1, [P8_DU1, P8_DU2,
> + P8_DU3, P8_DU4],
> 0>,
> + InstrStage<1, [P8_LS1, P8_LS2]>],
> + [2, 1, 1]>,
> + InstrItinData<IIC_LdStLDU , [InstrStage<1, [P8_DU1], 0>,
> + InstrStage<1, [P8_DU2], 0>,
> + InstrStage<1, [P8_LS1, P8_LS2],
> 0>,
> + InstrStage<1, [P8_FX1, P8_FX2]>],
> + [2, 2, 1, 1]>,
> + InstrItinData<IIC_LdStLDUX , [InstrStage<1, [P8_DU1], 0>,
> + InstrStage<1, [P8_DU2], 0>,
> + InstrStage<1, [P8_DU3], 0>,
> + InstrStage<1, [P8_DU4], 0>,
> + InstrStage<1, [P8_FX1, P8_FX2]>,
> + InstrStage<1, [P8_LS1, P8_LS2],
> 0>,
> + InstrStage<1, [P8_FX1, P8_FX2]>],
> + [3, 3, 1, 1]>,
> + InstrItinData<IIC_LdStLFD , [InstrStage<1, [P8_DU1, P8_DU2,
> + P8_DU3, P8_DU4],
> 0>,
> + InstrStage<1, [P8_LS1, P8_LS2]>],
> + [3, 1, 1]>,
> + InstrItinData<IIC_LdStLVecX , [InstrStage<1, [P8_DU1, P8_DU2,
> + P8_DU3, P8_DU4],
> 0>,
> + InstrStage<1, [P8_LS1, P8_LS2]>],
> + [3, 1, 1]>,
> + InstrItinData<IIC_LdStLFDU , [InstrStage<1, [P8_DU1], 0>,
> + InstrStage<1, [P8_DU2], 0>,
> + InstrStage<1, [P8_LS1, P8_LS2],
> 0>,
> + InstrStage<1, [P8_FX1, P8_FX2]>],
> + [3, 3, 1, 1]>,
> + InstrItinData<IIC_LdStLFDUX , [InstrStage<1, [P8_DU1], 0>,
> + InstrStage<1, [P8_DU2], 0>,
> + InstrStage<1, [P8_LS1, P8_LS2],
> 0>,
> + InstrStage<1, [P8_FX1, P8_FX2]>],
> + [3, 3, 1, 1]>,
> + InstrItinData<IIC_LdStLHA , [InstrStage<1, [P8_DU1], 0>,
> + InstrStage<1, [P8_DU2], 0>,
> + InstrStage<1, [P8_LS1, P8_LS2]>,
> + InstrStage<1, [P8_FX1, P8_FX2]>],
> + [3, 1, 1]>,
> + InstrItinData<IIC_LdStLHAU , [InstrStage<1, [P8_DU1], 0>,
> + InstrStage<1, [P8_DU2], 0>,
> + InstrStage<1, [P8_LS1, P8_LS2],
> 0>,
> + InstrStage<1, [P8_FX1, P8_FX2]>,
> + InstrStage<1, [P8_FX1, P8_FX2]>],
> + [4, 4, 1, 1]>,
> + InstrItinData<IIC_LdStLHAUX , [InstrStage<1, [P8_DU1], 0>,
> + InstrStage<1, [P8_DU2], 0>,
> + InstrStage<1, [P8_DU3], 0>,
> + InstrStage<1, [P8_DU4], 0>,
> + InstrStage<1, [P8_FX1, P8_FX2]>,
> + InstrStage<1, [P8_LS1, P8_LS2],
> 0>,
> + InstrStage<1, [P8_FX1, P8_FX2]>,
> + InstrStage<1, [P8_FX1, P8_FX2]>],
> + [4, 4, 1, 1]>,
> + InstrItinData<IIC_LdStLWA , [InstrStage<1, [P8_DU1], 0>,
> + InstrStage<1, [P8_DU2], 0>,
> + InstrStage<1, [P8_LS1, P8_LS2]>,
> + InstrStage<1, [P8_FX1, P8_FX2]>],
> + [3, 1, 1]>,
> + InstrItinData<IIC_LdStLWARX, [InstrStage<1, [P8_DU1], 0>,
> + InstrStage<1, [P8_DU2], 0>,
> + InstrStage<1, [P8_DU3], 0>,
> + InstrStage<1, [P8_DU4], 0>,
> + InstrStage<1, [P8_LS1, P8_LS2]>],
> + [3, 1, 1]>,
> + InstrItinData<IIC_LdStLDARX, [InstrStage<1, [P8_DU1], 0>,
> + InstrStage<1, [P8_DU2], 0>,
> + InstrStage<1, [P8_DU3], 0>,
> + InstrStage<1, [P8_DU4], 0>,
> + InstrStage<1, [P8_LS1, P8_LS2]>],
> + [3, 1, 1]>,
> + InstrItinData<IIC_LdStLMW , [InstrStage<1, [P8_DU1, P8_DU2,
> + P8_DU3, P8_DU4],
> 0>,
> + InstrStage<1, [P8_LS1, P8_LS2]>],
> + [2, 1, 1]>,
> + InstrItinData<IIC_LdStStore , [InstrStage<1, [P8_DU1, P8_DU2,
> + P8_DU3, P8_DU4],
> 0>,
> + InstrStage<1, [P8_LS1, P8_LS2],
> 0>,
> + InstrStage<1, [P8_FX1, P8_FX2]>],
> + [1, 1, 1]>,
> + InstrItinData<IIC_LdStSTD , [InstrStage<1, [P8_DU1, P8_DU2,
> + P8_DU3, P8_DU4],
> 0>,
> + InstrStage<1, [P8_LS1, P8_LS2],
> 0>,
> + InstrStage<1, [P8_FX1, P8_FX2]>],
> + [1, 1, 1]>,
> + InstrItinData<IIC_LdStSTDU , [InstrStage<1, [P8_DU1], 0>,
> + InstrStage<1, [P8_DU2], 0>,
> + InstrStage<1, [P8_LS1, P8_LS2],
> 0>,
> + InstrStage<1, [P8_FX1, P8_FX2]>,
> + InstrStage<1, [P8_FX1, P8_FX2]>],
> + [2, 1, 1, 1]>,
> + InstrItinData<IIC_LdStSTDUX , [InstrStage<1, [P8_DU1], 0>,
> + InstrStage<1, [P8_DU2], 0>,
> + InstrStage<1, [P8_DU3], 0>,
> + InstrStage<1, [P8_DU4], 0>,
> + InstrStage<1, [P8_LS1, P8_LS2],
> 0>,
> + InstrStage<1, [P8_FX1, P8_FX2]>,
> + InstrStage<1, [P8_FX1, P8_FX2]>],
> + [2, 1, 1, 1]>,
> + InstrItinData<IIC_LdStSTFD , [InstrStage<1, [P8_DU1, P8_DU2,
> + P8_DU3, P8_DU4],
> 0>,
> + InstrStage<1, [P8_LS1, P8_LS2],
> 0>,
> + InstrStage<1, [P8_VS1, P8_VS2]>],
> + [1, 1, 1]>,
> + InstrItinData<IIC_LdStSTFDU , [InstrStage<1, [P8_DU1], 0>,
> + InstrStage<1, [P8_DU2], 0>,
> + InstrStage<1, [P8_LS1, P8_LS2],
> 0>,
> + InstrStage<1, [P8_FX1, P8_FX2],
> 0>,
> + InstrStage<1, [P8_VS1, P8_VS2]>],
> + [2, 1, 1, 1]>,
> + InstrItinData<IIC_LdStSTVEBX , [InstrStage<1, [P8_DU1, P8_DU2,
> + P8_DU3, P8_DU4],
> 0>,
> + InstrStage<1, [P8_LS1, P8_LS2],
> 0>,
> + InstrStage<1, [P8_VS2]>],
> + [1, 1, 1]>,
> + InstrItinData<IIC_LdStSTDCX , [InstrStage<1, [P8_DU1], 0>,
> + InstrStage<1, [P8_DU2], 0>,
> + InstrStage<1, [P8_DU3], 0>,
> + InstrStage<1, [P8_DU4], 0>,
> + InstrStage<1, [P8_LS1, P8_LS2]>],
> + [1, 1, 1]>,
> + InstrItinData<IIC_LdStSTWCX , [InstrStage<1, [P8_DU1], 0>,
> + InstrStage<1, [P8_DU2], 0>,
> + InstrStage<1, [P8_DU3], 0>,
> + InstrStage<1, [P8_DU4], 0>,
> + InstrStage<1, [P8_LS1, P8_LS2]>],
> + [1, 1, 1]>,
> + InstrItinData<IIC_BrMCRX , [InstrStage<1, [P8_DU1], 0>,
> + InstrStage<1, [P8_DU2], 0>,
> + InstrStage<1, [P8_DU3], 0>,
> + InstrStage<1, [P8_DU4], 0>,
> + InstrStage<1, [P8_CRU]>,
> + InstrStage<1, [P8_FX1, P8_FX2]>],
> + [3, 1]>, // mtcr
> + InstrItinData<IIC_SprMFCR , [InstrStage<1, [P8_DU1], 0>,
> + InstrStage<1, [P8_CRU]>],
> + [6, 1]>,
> + InstrItinData<IIC_SprMFCRF , [InstrStage<1, [P8_DU1], 0>,
> + InstrStage<1, [P8_CRU]>],
> + [3, 1]>,
> + InstrItinData<IIC_SprMTSPR , [InstrStage<1, [P8_DU1], 0>,
> + InstrStage<1, [P8_FX1]>],
> + [4, 1]>, // mtctr
> + InstrItinData<IIC_FPGeneral , [InstrStage<1, [P8_DU1, P8_DU2,
> + P8_DU3, P8_DU4],
> 0>,
> + InstrStage<1, [P8_VS1, P8_VS2]>],
> + [5, 1, 1]>,
> + InstrItinData<IIC_FPCompare , [InstrStage<1, [P8_DU1, P8_DU2,
> + P8_DU3, P8_DU4],
> 0>,
> + InstrStage<1, [P8_VS1, P8_VS2]>],
> + [8, 1, 1]>,
> + InstrItinData<IIC_FPDivD , [InstrStage<1, [P8_DU1, P8_DU2,
> + P8_DU3, P8_DU4],
> 0>,
> + InstrStage<1, [P8_VS1, P8_VS2]>],
> + [33, 1, 1]>,
> + InstrItinData<IIC_FPDivS , [InstrStage<1, [P8_DU1, P8_DU2,
> + P8_DU3, P8_DU4],
> 0>,
> + InstrStage<1, [P8_VS1, P8_VS2]>],
> + [27, 1, 1]>,
> + InstrItinData<IIC_FPSqrtD , [InstrStage<1, [P8_DU1, P8_DU2,
> + P8_DU3, P8_DU4],
> 0>,
> + InstrStage<1, [P8_VS1, P8_VS2]>],
> + [44, 1, 1]>,
> + InstrItinData<IIC_FPSqrtS , [InstrStage<1, [P8_DU1, P8_DU2,
> + P8_DU3, P8_DU4],
> 0>,
> + InstrStage<1, [P8_VS1, P8_VS2]>],
> + [32, 1, 1]>,
> + InstrItinData<IIC_FPFused , [InstrStage<1, [P8_DU1, P8_DU2,
> + P8_DU3, P8_DU4],
> 0>,
> + InstrStage<1, [P8_VS1, P8_VS2]>],
> + [5, 1, 1, 1]>,
> + InstrItinData<IIC_FPRes , [InstrStage<1, [P8_DU1, P8_DU2,
> + P8_DU3, P8_DU4],
> 0>,
> + InstrStage<1, [P8_VS1, P8_VS2]>],
> + [5, 1, 1]>,
> + InstrItinData<IIC_VecGeneral , [InstrStage<1, [P8_DU1], 0>,
> + InstrStage<1, [P8_VS1]>],
> + [2, 1, 1]>,
> + InstrItinData<IIC_VecVSL , [InstrStage<1, [P8_DU1], 0>,
> + InstrStage<1, [P8_VS1]>],
> + [2, 1, 1]>,
> + InstrItinData<IIC_VecVSR , [InstrStage<1, [P8_DU1], 0>,
> + InstrStage<1, [P8_VS1]>],
> + [2, 1, 1]>,
> + InstrItinData<IIC_VecFP , [InstrStage<1, [P8_DU1], 0>,
> + InstrStage<1, [P8_VS1, P8_VS2]>],
> + [6, 1, 1]>,
> + InstrItinData<IIC_VecFPCompare, [InstrStage<1, [P8_DU1], 0>,
> + InstrStage<1, [P8_VS1, P8_VS2]>],
> + [6, 1, 1]>,
> + InstrItinData<IIC_VecFPRound , [InstrStage<1, [P8_DU1], 0>,
> + InstrStage<1, [P8_VS1, P8_VS2]>],
> + [6, 1, 1]>,
> + InstrItinData<IIC_VecComplex , [InstrStage<1, [P8_DU1], 0>,
> + InstrStage<1, [P8_VS1]>],
> + [7, 1, 1]>,
> + InstrItinData<IIC_VecPerm , [InstrStage<1, [P8_DU1, P8_DU2],
> 0>,
> + InstrStage<1, [P8_VS2]>],
> + [3, 1, 1]>
> +]>;
> +
> +//
> ===---------------------------------------------------------------------===//
> +// P8 machine model for scheduling and other instruction cost
> heuristics.
> +
> +def P8Model : SchedMachineModel {
> + let IssueWidth = 6; // 4 (non-branch) instructions are dispatched
> per cycle.
> + // Note that the dispatch bundle size is 6
> (including
> + // branches), but the total internal issue
> bandwidth per
> + // cycle (from all queues) is 8.
> +
> + let MinLatency = 0; // Out-of-order dispatch.
> + let LoadLatency = 3; // Optimistic load latency assuming bypass.
> + // This is overridden by OperandCycles if the
> + // Itineraries are queried instead.
> + let MispredictPenalty = 16;
> +
> + let Itineraries = P8Itineraries;
> +}
> +
> diff --git a/lib/Target/PowerPC/PPCSubtarget.h
> b/lib/Target/PowerPC/PPCSubtarget.h
> index 8aafa99..a3a7480 100644
> --- a/lib/Target/PowerPC/PPCSubtarget.h
> +++ b/lib/Target/PowerPC/PPCSubtarget.h
> @@ -56,6 +56,7 @@ namespace PPC {
> DIR_PWR6,
> DIR_PWR6X,
> DIR_PWR7,
> + DIR_PWR8,
> DIR_64
> };
> }
>
>
>
--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory