[llvm] r238842 - make reciprocal estimate code generation more flexible by adding command-line options (2nd try)
Akira Hatanaka
ahatanak at gmail.com
Tue Jun 2 14:40:18 PDT 2015
On Tue, Jun 2, 2015 at 1:52 PM, Sanjay Patel <spatel at rotateright.com> wrote:
> Hi Akira,
>
> To be honest, I wasn't sure how this all works. Any advice or help is much
> appreciated. :)
>
> I modeled the changes for this patch on the existing
> FPOpFusion::FPOpFusionMode code. Does that have a corresponding function
> attribute? If not, how does that work for an LTO build for example?
>
>
As far as I can tell, FPOpFusionMode doesn't have a corresponding function
attribute, so the setting isn't recorded in the IR and won't carry over to
LTO builds.
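
To illustrate the concern, here is a toy model (not LLVM code) of why a setting
attached to each function survives to link time while a TargetOptions-style
global does not. Every name in it (ToyFunction, ToyTargetOptions,
effectiveRecip, and the "recip" attribute key) is hypothetical and chosen only
for this sketch.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy stand-in for a function in the IR. Attrs models string attributes that
// are stored with the function and therefore travel with the bitcode.
struct ToyFunction {
  std::string Name;
  std::map<std::string, std::string> Attrs;
};

// Toy stand-in for a TargetOptions-style global. At LTO time there is a
// single instance, configured at link time rather than per source file.
struct ToyTargetOptions {
  std::string Reciprocals = "none";
};

// Prefer the per-function attribute; fall back to the global option.
// "recip" is a hypothetical attribute key, not an existing LLVM attribute.
std::string effectiveRecip(const ToyFunction &F, const ToyTargetOptions &Opts) {
  auto It = F.Attrs.find("recip");
  return It != F.Attrs.end() ? It->second : Opts.Reciprocals;
}

// Usage example: A carries its own setting, B only sees the link-time default.
void ltoExample() {
  ToyTargetOptions LinkTimeOpts; // per-file flags are not visible here
  ToyFunction A{"a", {{"recip", "divf:2"}}};
  ToyFunction B{"b", {}};
  assert(effectiveRecip(A, LinkTimeOpts) == "divf:2");
  assert(effectiveRecip(B, LinkTimeOpts) == "none");
}
```

In this model, whatever was requested when compiling B's source file is lost
unless it was written into the function itself.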
> Ideally, I think that both of these would become part of the
> fast-math-flags that I'm trying to extend to the DAG, but a function
> attribute should probably be an intermediate step?
>
>
>
Yes, ideally it should be a DAG node flag. Is the undefined behavior
discovered when r236546 was committed still blocking progress on that work?
> On Tue, Jun 2, 2015 at 2:11 PM, Akira Hatanaka <ahatanak at gmail.com> wrote:
>
>> Hi Sanjay,
>>
>> Is the plan to change option -mrecip to be a function attribute and model
>> it on a per-function basis or even at the selection-dag node level in the
>> future?
>>
>> On Tue, Jun 2, 2015 at 8:28 AM, Sanjay Patel <spatel at rotateright.com>
>> wrote:
>>
>>> Author: spatel
>>> Date: Tue Jun 2 10:28:15 2015
>>> New Revision: 238842
>>>
>>> URL: http://llvm.org/viewvc/llvm-project?rev=238842&view=rev
>>> Log:
>>> make reciprocal estimate code generation more flexible by adding
>>> command-line options (2nd try)
>>>
>>> The first try (r238051) to land this was reverted due to bot failures
>>> that were hopefully addressed by r238788.
>>>
>>> This patch adds a TargetRecip class for processing many recip codegen
>>> possibilities.
>>> The class is intended to handle both command-line options to llc as well
>>> as options passed in from a front-end such as clang with the -mrecip
>>> option.
>>>
>>> The x86 backend is updated to use the new functionality.
>>> Only -mcpu=btver2 with -ffast-math should see a functional change from
>>> this patch.
>>> All other x86 CPUs continue to *not* use reciprocal estimates by default
>>> with -ffast-math.
>>>
>>> Differential Revision: http://reviews.llvm.org/D8982
>>>
>>>
>>> Added:
>>> llvm/trunk/include/llvm/Target/TargetRecip.h
>>> llvm/trunk/lib/Target/TargetRecip.cpp
>>> Modified:
>>> llvm/trunk/include/llvm/CodeGen/CommandFlags.h
>>> llvm/trunk/include/llvm/Target/TargetOptions.h
>>> llvm/trunk/lib/Target/CMakeLists.txt
>>> llvm/trunk/lib/Target/X86/X86.td
>>> llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
>>> llvm/trunk/lib/Target/X86/X86Subtarget.cpp
>>> llvm/trunk/lib/Target/X86/X86Subtarget.h
>>> llvm/trunk/lib/Target/X86/X86TargetMachine.cpp
>>> llvm/trunk/test/CodeGen/X86/recip-fastmath.ll
>>> llvm/trunk/test/CodeGen/X86/sqrt-fastmath.ll
>>>
>>> Modified: llvm/trunk/include/llvm/CodeGen/CommandFlags.h
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/CodeGen/CommandFlags.h?rev=238842&r1=238841&r2=238842&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/include/llvm/CodeGen/CommandFlags.h (original)
>>> +++ llvm/trunk/include/llvm/CodeGen/CommandFlags.h Tue Jun 2 10:28:15
>>> 2015
>>> @@ -24,6 +24,7 @@
>>> #include "llvm/Support/Host.h"
>>> #include "llvm/Target/TargetMachine.h"
>>> #include "llvm/Target/TargetOptions.h"
>>> +#include "llvm/Target/TargetRecip.h"
>>> #include <string>
>>> using namespace llvm;
>>>
>>> @@ -152,6 +153,12 @@ FuseFPOps("fp-contract",
>>> "Only fuse FP ops when the result won't be
>>> effected."),
>>> clEnumValEnd));
>>>
>>> +cl::list<std::string>
>>> +ReciprocalOps("recip",
>>> + cl::CommaSeparated,
>>> + cl::desc("Choose reciprocal operation types and parameters."),
>>> +
>>> cl::value_desc("all,none,default,divf,!vec-sqrtd,vec-divd:0,sqrt:9..."));
>>> +
>>> cl::opt<bool>
>>> DontPlaceZerosInBSS("nozero-initialized-in-bss",
>>> cl::desc("Don't place zero-initialized symbols into bss
>>> section"),
>>> @@ -230,6 +237,7 @@ static inline TargetOptions InitTargetOp
>>> TargetOptions Options;
>>> Options.LessPreciseFPMADOption = EnableFPMAD;
>>> Options.AllowFPOpFusion = FuseFPOps;
>>> + Options.Reciprocals = TargetRecip(ReciprocalOps);
>>> Options.UnsafeFPMath = EnableUnsafeFPMath;
>>> Options.NoInfsFPMath = EnableNoInfsFPMath;
>>> Options.NoNaNsFPMath = EnableNoNaNsFPMath;
>>>
>>> Modified: llvm/trunk/include/llvm/Target/TargetOptions.h
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Target/TargetOptions.h?rev=238842&r1=238841&r2=238842&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/include/llvm/Target/TargetOptions.h (original)
>>> +++ llvm/trunk/include/llvm/Target/TargetOptions.h Tue Jun 2 10:28:15
>>> 2015
>>> @@ -15,6 +15,7 @@
>>> #ifndef LLVM_TARGET_TARGETOPTIONS_H
>>> #define LLVM_TARGET_TARGETOPTIONS_H
>>>
>>> +#include "llvm/Target/TargetRecip.h"
>>> #include "llvm/MC/MCTargetOptions.h"
>>> #include <string>
>>>
>>> @@ -72,7 +73,8 @@ namespace llvm {
>>> CompressDebugSections(false), FunctionSections(false),
>>> DataSections(false), UniqueSectionNames(true),
>>> TrapUnreachable(false),
>>> TrapFuncName(), FloatABIType(FloatABI::Default),
>>> - AllowFPOpFusion(FPOpFusion::Standard),
>>> JTType(JumpTable::Single),
>>> + AllowFPOpFusion(FPOpFusion::Standard),
>>> Reciprocals(TargetRecip()),
>>> + JTType(JumpTable::Single),
>>> ThreadModel(ThreadModel::POSIX) {}
>>>
>>> /// PrintMachineCode - This flag is enabled when the
>>> -print-machineinstrs
>>> @@ -206,6 +208,9 @@ namespace llvm {
>>> /// the value of this option.
>>> FPOpFusion::FPOpFusionMode AllowFPOpFusion;
>>>
>>> + /// This class encapsulates options for reciprocal-estimate code
>>> generation.
>>> + TargetRecip Reciprocals;
>>> +
>>> /// JTType - This flag specifies the type of jump-instruction table
>>> to
>>> /// create for functions that have the jumptable attribute.
>>> JumpTable::JumpTableType JTType;
>>> @@ -240,6 +245,7 @@ inline bool operator==(const TargetOptio
>>> ARE_EQUAL(TrapFuncName) &&
>>> ARE_EQUAL(FloatABIType) &&
>>> ARE_EQUAL(AllowFPOpFusion) &&
>>> + ARE_EQUAL(Reciprocals) &&
>>> ARE_EQUAL(JTType) &&
>>> ARE_EQUAL(ThreadModel) &&
>>> ARE_EQUAL(MCOptions);
>>>
>>> Added: llvm/trunk/include/llvm/Target/TargetRecip.h
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Target/TargetRecip.h?rev=238842&view=auto
>>>
>>> ==============================================================================
>>> --- llvm/trunk/include/llvm/Target/TargetRecip.h (added)
>>> +++ llvm/trunk/include/llvm/Target/TargetRecip.h Tue Jun 2 10:28:15 2015
>>> @@ -0,0 +1,73 @@
>>> +//===--------------------- llvm/Target/TargetRecip.h ------------*- C++
>>> -*-===//
>>> +//
>>> +// The LLVM Compiler Infrastructure
>>> +//
>>> +// This file is distributed under the University of Illinois Open Source
>>> +// License. See LICENSE.TXT for details.
>>> +//
>>>
>>> +//===----------------------------------------------------------------------===//
>>> +//
>>> +// This class is used to customize machine-specific reciprocal estimate
>>> code
>>> +// generation in a target-independent way.
>>> +// If a target does not support operations in this specification, then
>>> code
>>> +// generation will default to using supported operations.
>>> +//
>>>
>>> +//===----------------------------------------------------------------------===//
>>> +
>>> +#ifndef LLVM_TARGET_TARGETRECIP_H
>>> +#define LLVM_TARGET_TARGETRECIP_H
>>> +
>>> +#include "llvm/ADT/StringRef.h"
>>> +#include <vector>
>>> +#include <string>
>>> +#include <map>
>>> +
>>> +namespace llvm {
>>> +
>>> +struct TargetRecip {
>>> +public:
>>> + TargetRecip();
>>> +
>>> + /// Initialize all or part of the operations from command-line
>>> options or
>>> + /// a front end.
>>> + TargetRecip(const std::vector<std::string> &Args);
>>> +
>>> + /// Set whether a particular reciprocal operation is enabled and how
>>> many
>>> + /// refinement steps are needed when using it. Use "all" to set
>>> enablement
>>> + /// and refinement steps for all operations.
>>> + void setDefaults(const StringRef &Key, bool Enable, unsigned
>>> RefSteps);
>>> +
>>> + /// Return true if the reciprocal operation has been enabled by
>>> default or
>>> + /// from the command-line. Return false if the operation has been
>>> disabled
>>> + /// by default or from the command-line.
>>> + bool isEnabled(const StringRef &Key) const;
>>> +
>>> + /// Return the number of iterations necessary to refine the
>>> + /// the result of a machine instruction for the given reciprocal
>>> operation.
>>> + unsigned getRefinementSteps(const StringRef &Key) const;
>>> +
>>> + bool operator==(const TargetRecip &Other) const;
>>> +
>>> +private:
>>> + enum {
>>> + Uninitialized = -1
>>> + };
>>> +
>>> + struct RecipParams {
>>> + int8_t Enabled;
>>> + int8_t RefinementSteps;
>>> +
>>> + RecipParams() : Enabled(Uninitialized),
>>> RefinementSteps(Uninitialized) {}
>>> + };
>>> +
>>> + std::map<StringRef, RecipParams> RecipMap;
>>> + typedef std::map<StringRef, RecipParams>::iterator RecipIter;
>>> + typedef std::map<StringRef, RecipParams>::const_iterator
>>> ConstRecipIter;
>>> +
>>> + bool parseGlobalParams(const std::string &Arg);
>>> + void parseIndividualParams(const std::vector<std::string> &Args);
>>> +};
>>> +
>>> +} // End llvm namespace
>>> +
>>> +#endif
>>>
>>> Modified: llvm/trunk/lib/Target/CMakeLists.txt
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/CMakeLists.txt?rev=238842&r1=238841&r2=238842&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/lib/Target/CMakeLists.txt (original)
>>> +++ llvm/trunk/lib/Target/CMakeLists.txt Tue Jun 2 10:28:15 2015
>>> @@ -6,6 +6,7 @@ add_llvm_library(LLVMTarget
>>> TargetLoweringObjectFile.cpp
>>> TargetMachine.cpp
>>> TargetMachineC.cpp
>>> + TargetRecip.cpp
>>> TargetSubtargetInfo.cpp
>>>
>>> ADDITIONAL_HEADER_DIRS
>>>
>>> Added: llvm/trunk/lib/Target/TargetRecip.cpp
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/TargetRecip.cpp?rev=238842&view=auto
>>>
>>> ==============================================================================
>>> --- llvm/trunk/lib/Target/TargetRecip.cpp (added)
>>> +++ llvm/trunk/lib/Target/TargetRecip.cpp Tue Jun 2 10:28:15 2015
>>> @@ -0,0 +1,225 @@
>>> +//===-------------------------- TargetRecip.cpp
>>> ---------------------------===//
>>> +//
>>> +// The LLVM Compiler Infrastructure
>>> +//
>>> +// This file is distributed under the University of Illinois Open Source
>>> +// License. See LICENSE.TXT for details.
>>> +//
>>>
>>> +//===----------------------------------------------------------------------===//
>>> +//
>>> +// This class is used to customize machine-specific reciprocal estimate
>>> code
>>> +// generation in a target-independent way.
>>> +// If a target does not support operations in this specification, then
>>> code
>>> +// generation will default to using supported operations.
>>> +//
>>>
>>> +//===----------------------------------------------------------------------===//
>>> +
>>> +#include "llvm/ADT/StringRef.h"
>>> +#include "llvm/ADT/STLExtras.h"
>>> +#include "llvm/Support/ErrorHandling.h"
>>> +#include "llvm/Target/TargetRecip.h"
>>> +#include <map>
>>> +
>>> +using namespace llvm;
>>> +
>>> +// These are the names of the individual reciprocal operations. These
>>> are
>>> +// the key strings for queries and command-line inputs.
>>> +// In addition, the command-line interface recognizes the global
>>> parameters
>>> +// "all", "none", and "default".
>>> +static const char *RecipOps[] = {
>>> + "divd",
>>> + "divf",
>>> + "vec-divd",
>>> + "vec-divf",
>>> + "sqrtd",
>>> + "sqrtf",
>>> + "vec-sqrtd",
>>> + "vec-sqrtf",
>>> +};
>>> +
>>> +// The uninitialized state is needed for the enabled settings and
>>> refinement
>>> +// steps because custom settings may arrive via the command-line before
>>> target
>>> +// defaults are set.
>>> +TargetRecip::TargetRecip() {
>>> + unsigned NumStrings = llvm::array_lengthof(RecipOps);
>>> + for (unsigned i = 0; i < NumStrings; ++i)
>>> + RecipMap.insert(std::make_pair(RecipOps[i], RecipParams()));
>>> +}
>>> +
>>> +static bool parseRefinementStep(const StringRef &In, size_t &Position,
>>> + uint8_t &Value) {
>>> + const char RefStepToken = ':';
>>> + Position = In.find(RefStepToken);
>>> + if (Position == StringRef::npos)
>>> + return false;
>>> +
>>> + StringRef RefStepString = In.substr(Position + 1);
>>> + // Allow exactly one numeric character for the additional refinement
>>> + // step parameter.
>>> + if (RefStepString.size() == 1) {
>>> + char RefStepChar = RefStepString[0];
>>> + if (RefStepChar >= '0' && RefStepChar <= '9') {
>>> + Value = RefStepChar - '0';
>>> + return true;
>>> + }
>>> + }
>>> + report_fatal_error("Invalid refinement step for -recip.");
>>> +}
>>> +
>>> +bool TargetRecip::parseGlobalParams(const std::string &Arg) {
>>> + StringRef ArgSub = Arg;
>>> +
>>> + // Look for an optional setting of the number of refinement steps
>>> needed
>>> + // for this type of reciprocal operation.
>>> + size_t RefPos;
>>> + uint8_t RefSteps;
>>> + StringRef RefStepString;
>>> + if (parseRefinementStep(ArgSub, RefPos, RefSteps)) {
>>> + // Split the string for further processing.
>>> + RefStepString = ArgSub.substr(RefPos + 1);
>>> + ArgSub = ArgSub.substr(0, RefPos);
>>> + }
>>> + bool Enable;
>>> + bool UseDefaults;
>>> + if (ArgSub == "all") {
>>> + UseDefaults = false;
>>> + Enable = true;
>>> + } else if (ArgSub == "none") {
>>> + UseDefaults = false;
>>> + Enable = false;
>>> + } else if (ArgSub == "default") {
>>> + UseDefaults = true;
>>> + } else {
>>> + // Any other string is invalid or an individual setting.
>>> + return false;
>>> + }
>>> +
>>> + // All enable values will be initialized to target defaults if
>>> 'default' was
>>> + // specified.
>>> + if (!UseDefaults)
>>> + for (auto &KV : RecipMap)
>>> + KV.second.Enabled = Enable;
>>> +
>>> + // Custom refinement count was specified with all, none, or default.
>>> + if (!RefStepString.empty())
>>> + for (auto &KV : RecipMap)
>>> + KV.second.RefinementSteps = RefSteps;
>>> +
>>> + return true;
>>> +}
>>> +
>>> +void TargetRecip::parseIndividualParams(const std::vector<std::string>
>>> &Args) {
>>> + static const char DisabledPrefix = '!';
>>> + unsigned NumArgs = Args.size();
>>> +
>>> + for (unsigned i = 0; i != NumArgs; ++i) {
>>> + StringRef Val = Args[i];
>>> +
>>> + bool IsDisabled = Val[0] == DisabledPrefix;
>>> + // Ignore the disablement token for string matching.
>>> + if (IsDisabled)
>>> + Val = Val.substr(1);
>>> +
>>> + size_t RefPos;
>>> + uint8_t RefSteps;
>>> + StringRef RefStepString;
>>> + if (parseRefinementStep(Val, RefPos, RefSteps)) {
>>> + // Split the string for further processing.
>>> + RefStepString = Val.substr(RefPos + 1);
>>> + Val = Val.substr(0, RefPos);
>>> + }
>>> +
>>> + RecipIter Iter = RecipMap.find(Val);
>>> + if (Iter == RecipMap.end()) {
>>> + // Try again specifying float suffix.
>>> + Iter = RecipMap.find(Val.str() + 'f');
>>> + if (Iter == RecipMap.end()) {
>>> + Iter = RecipMap.find(Val.str() + 'd');
>>> + assert(Iter == RecipMap.end() && "Float entry missing from
>>> map");
>>> + report_fatal_error("Invalid option for -recip.");
>>> + }
>>> +
>>> + // The option was specified without a float or double suffix.
>>> + if (RecipMap[Val.str() + 'd'].Enabled != Uninitialized) {
>>> + // Make sure that the double entry was not already specified.
>>> + // The float entry will be checked below.
>>> + report_fatal_error("Duplicate option for -recip.");
>>> + }
>>> + }
>>> +
>>> + if (Iter->second.Enabled != Uninitialized)
>>> + report_fatal_error("Duplicate option for -recip.");
>>> +
>>> + // Mark the matched option as found. Do not allow duplicate
>>> specifiers.
>>> + Iter->second.Enabled = !IsDisabled;
>>> + if (!RefStepString.empty())
>>> + Iter->second.RefinementSteps = RefSteps;
>>> +
>>> + // If the precision was not specified, the double entry is also
>>> initialized.
>>> + if (Val.back() != 'f' && Val.back() != 'd') {
>>> + RecipMap[Val.str() + 'd'].Enabled = !IsDisabled;
>>> + if (!RefStepString.empty())
>>> + RecipMap[Val.str() + 'd'].RefinementSteps = RefSteps;
>>> + }
>>> + }
>>> +}
>>> +
>>> +TargetRecip::TargetRecip(const std::vector<std::string> &Args) :
>>> + TargetRecip() {
>>> + unsigned NumArgs = Args.size();
>>> +
>>> + // Check if "all", "default", or "none" was specified.
>>> + if (NumArgs == 1 && parseGlobalParams(Args[0]))
>>> + return;
>>> +
>>> + parseIndividualParams(Args);
>>> +}
>>> +
>>> +bool TargetRecip::isEnabled(const StringRef &Key) const {
>>> + ConstRecipIter Iter = RecipMap.find(Key);
>>> + assert(Iter != RecipMap.end() && "Unknown name for reciprocal map");
>>> + assert(Iter->second.Enabled != Uninitialized &&
>>> + "Enablement setting was not initialized");
>>> + return Iter->second.Enabled;
>>> +}
>>> +
>>> +unsigned TargetRecip::getRefinementSteps(const StringRef &Key) const {
>>> + ConstRecipIter Iter = RecipMap.find(Key);
>>> + assert(Iter != RecipMap.end() && "Unknown name for reciprocal map");
>>> + assert(Iter->second.RefinementSteps != Uninitialized &&
>>> + "Refinement step setting was not initialized");
>>> + return Iter->second.RefinementSteps;
>>> +}
>>> +
>>> +/// Custom settings (previously initialized values) override target
>>> defaults.
>>> +void TargetRecip::setDefaults(const StringRef &Key, bool Enable,
>>> + unsigned RefSteps) {
>>> + if (Key == "all") {
>>> + for (auto &KV : RecipMap) {
>>> + RecipParams &RP = KV.second;
>>> + if (RP.Enabled == Uninitialized)
>>> + RP.Enabled = Enable;
>>> + if (RP.RefinementSteps == Uninitialized)
>>> + RP.RefinementSteps = RefSteps;
>>> + }
>>> + } else {
>>> + RecipParams &RP = RecipMap[Key];
>>> + if (RP.Enabled == Uninitialized)
>>> + RP.Enabled = Enable;
>>> + if (RP.RefinementSteps == Uninitialized)
>>> + RP.RefinementSteps = RefSteps;
>>> + }
>>> +}
>>> +
>>> +bool TargetRecip::operator==(const TargetRecip &Other) const {
>>> + for (const auto &KV : RecipMap) {
>>> + const StringRef &Op = KV.first;
>>> + const RecipParams &RP = KV.second;
>>> + const RecipParams &OtherRP = Other.RecipMap.find(Op)->second;
>>> + if (RP.RefinementSteps != OtherRP.RefinementSteps)
>>> + return false;
>>> + if (RP.Enabled != OtherRP.Enabled)
>>> + return false;
>>> + }
>>> + return true;
>>> +}
>>>
>>> Modified: llvm/trunk/lib/Target/X86/X86.td
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86.td?rev=238842&r1=238841&r2=238842&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/lib/Target/X86/X86.td (original)
>>> +++ llvm/trunk/lib/Target/X86/X86.td Tue Jun 2 10:28:15 2015
>>> @@ -188,10 +188,6 @@ def FeatureSlowLEA : SubtargetFeature<"s
>>> "LEA instruction with certain
>>> arguments is slow">;
>>> def FeatureSlowIncDec : SubtargetFeature<"slow-incdec", "SlowIncDec",
>>> "true",
>>> "INC and DEC instructions are slower
>>> than ADD and SUB">;
>>> -def FeatureUseSqrtEst : SubtargetFeature<"use-sqrt-est", "UseSqrtEst",
>>> "true",
>>> - "Use RSQRT* to optimize square root
>>> calculations">;
>>> -def FeatureUseRecipEst : SubtargetFeature<"use-recip-est",
>>> "UseReciprocalEst",
>>> - "true", "Use RCP* to optimize division
>>> calculations">;
>>> def FeatureSoftFloat
>>> : SubtargetFeature<"soft-float", "UseSoftFloat", "true",
>>> "Use software floating point features.">;
>>> @@ -444,7 +440,7 @@ def : ProcessorModel<"btver2", BtVer2Mod
>>> FeaturePRFCHW, FeatureAES, FeaturePCLMUL,
>>> FeatureBMI, FeatureF16C, FeatureMOVBE,
>>> FeatureLZCNT, FeaturePOPCNT, FeatureFastUAMem,
>>> - FeatureSlowSHLD, FeatureUseSqrtEst,
>>> FeatureUseRecipEst]>;
>>> + FeatureSlowSHLD]>;
>>>
>>> // TODO: We should probably add 'FeatureFastUAMem' to all of the AMD
>>> chips.
>>>
>>>
>>> Modified: llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.cpp?rev=238842&r1=238841&r2=238842&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/lib/Target/X86/X86ISelLowering.cpp (original)
>>> +++ llvm/trunk/lib/Target/X86/X86ISelLowering.cpp Tue Jun 2 10:28:15
>>> 2015
>>> @@ -67,12 +67,6 @@ static cl::opt<bool> ExperimentalVectorW
>>> "rather than promotion."),
>>> cl::Hidden);
>>>
>>> -static cl::opt<int> ReciprocalEstimateRefinementSteps(
>>> - "x86-recip-refinement-steps", cl::init(1),
>>> - cl::desc("Specify the number of Newton-Raphson iterations applied
>>> to the "
>>> - "result of the hardware reciprocal estimate instruction."),
>>> - cl::NotHidden);
>>> -
>>> // Forward declarations.
>>> static SDValue getMOVL(SelectionDAG &DAG, SDLoc dl, EVT VT, SDValue V1,
>>> SDValue V2);
>>> @@ -13006,29 +13000,31 @@ SDValue X86TargetLowering::getRsqrtEstim
>>> DAGCombinerInfo &DCI,
>>> unsigned &RefinementSteps,
>>> bool &UseOneConstNR) const {
>>> - // FIXME: We should use instruction latency models to calculate the
>>> cost of
>>> - // each potential sequence, but this is very hard to do reliably
>>> because
>>> - // at least Intel's Core* chips have variable timing based on the
>>> number of
>>> - // significant digits in the divisor and/or sqrt operand.
>>> - if (!Subtarget->useSqrtEst())
>>> - return SDValue();
>>> -
>>> EVT VT = Op.getValueType();
>>> + const char *RecipOp;
>>>
>>> - // SSE1 has rsqrtss and rsqrtps.
>>> + // SSE1 has rsqrtss and rsqrtps. AVX adds a 256-bit variant for
>>> rsqrtps.
>>> // TODO: Add support for AVX512 (v16f32).
>>> // It is likely not profitable to do this for f64 because a
>>> double-precision
>>> // rsqrt estimate with refinement on x86 prior to FMA requires at
>>> least 16
>>> // instructions: convert to single, rsqrtss, convert back to double,
>>> refine
>>> // (3 steps = at least 13 insts). If an 'rsqrtsd' variant was added
>>> to the ISA
>>> // along with FMA, this could be a throughput win.
>>> - if ((Subtarget->hasSSE1() && (VT == MVT::f32 || VT == MVT::v4f32)) ||
>>> - (Subtarget->hasAVX() && VT == MVT::v8f32)) {
>>> - RefinementSteps = 1;
>>> - UseOneConstNR = false;
>>> - return DCI.DAG.getNode(X86ISD::FRSQRT, SDLoc(Op), VT, Op);
>>> - }
>>> - return SDValue();
>>> + if (VT == MVT::f32 && Subtarget->hasSSE1())
>>> + RecipOp = "sqrtf";
>>> + else if ((VT == MVT::v4f32 && Subtarget->hasSSE1()) ||
>>> + (VT == MVT::v8f32 && Subtarget->hasAVX()))
>>> + RecipOp = "vec-sqrtf";
>>> + else
>>> + return SDValue();
>>> +
>>> + TargetRecip Recips = DCI.DAG.getTarget().Options.Reciprocals;
>>> + if (!Recips.isEnabled(RecipOp))
>>> + return SDValue();
>>> +
>>> + RefinementSteps = Recips.getRefinementSteps(RecipOp);
>>> + UseOneConstNR = false;
>>> + return DCI.DAG.getNode(X86ISD::FRSQRT, SDLoc(Op), VT, Op);
>>> }
>>>
>>> /// The minimum architected relative accuracy is 2^-12. We need one
>>> @@ -13036,15 +13032,9 @@ SDValue X86TargetLowering::getRsqrtEstim
>>> SDValue X86TargetLowering::getRecipEstimate(SDValue Op,
>>> DAGCombinerInfo &DCI,
>>> unsigned &RefinementSteps)
>>> const {
>>> - // FIXME: We should use instruction latency models to calculate the
>>> cost of
>>> - // each potential sequence, but this is very hard to do reliably
>>> because
>>> - // at least Intel's Core* chips have variable timing based on the
>>> number of
>>> - // significant digits in the divisor.
>>> - if (!Subtarget->useReciprocalEst())
>>> - return SDValue();
>>> -
>>> EVT VT = Op.getValueType();
>>> -
>>> + const char *RecipOp;
>>> +
>>> // SSE1 has rcpss and rcpps. AVX adds a 256-bit variant for rcpps.
>>> // TODO: Add support for AVX512 (v16f32).
>>> // It is likely not profitable to do this for f64 because a
>>> double-precision
>>> @@ -13052,12 +13042,20 @@ SDValue X86TargetLowering::getRecipEstim
>>> // 15 instructions: convert to single, rcpss, convert back to double,
>>> refine
>>> // (3 steps = 12 insts). If an 'rcpsd' variant was added to the ISA
>>> // along with FMA, this could be a throughput win.
>>> - if ((Subtarget->hasSSE1() && (VT == MVT::f32 || VT == MVT::v4f32)) ||
>>> - (Subtarget->hasAVX() && VT == MVT::v8f32)) {
>>> - RefinementSteps = ReciprocalEstimateRefinementSteps;
>>> - return DCI.DAG.getNode(X86ISD::FRCP, SDLoc(Op), VT, Op);
>>> - }
>>> - return SDValue();
>>> + if (VT == MVT::f32 && Subtarget->hasSSE1())
>>> + RecipOp = "divf";
>>> + else if ((VT == MVT::v4f32 && Subtarget->hasSSE1()) ||
>>> + (VT == MVT::v8f32 && Subtarget->hasAVX()))
>>> + RecipOp = "vec-divf";
>>> + else
>>> + return SDValue();
>>> +
>>> + TargetRecip Recips = DCI.DAG.getTarget().Options.Reciprocals;
>>> + if (!Recips.isEnabled(RecipOp))
>>> + return SDValue();
>>> +
>>> + RefinementSteps = Recips.getRefinementSteps(RecipOp);
>>> + return DCI.DAG.getNode(X86ISD::FRCP, SDLoc(Op), VT, Op);
>>> }
>>>
>>> /// If we have at least two divisions that use the same divisor,
>>> convert to
>>>
>>> Modified: llvm/trunk/lib/Target/X86/X86Subtarget.cpp
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86Subtarget.cpp?rev=238842&r1=238841&r2=238842&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/lib/Target/X86/X86Subtarget.cpp (original)
>>> +++ llvm/trunk/lib/Target/X86/X86Subtarget.cpp Tue Jun 2 10:28:15 2015
>>> @@ -273,8 +273,6 @@ void X86Subtarget::initializeEnvironment
>>> LEAUsesAG = false;
>>> SlowLEA = false;
>>> SlowIncDec = false;
>>> - UseSqrtEst = false;
>>> - UseReciprocalEst = false;
>>> stackAlignment = 4;
>>> // FIXME: this is a known good value for Yonah. How about others?
>>> MaxInlineSizeThreshold = 128;
>>>
>>> Modified: llvm/trunk/lib/Target/X86/X86Subtarget.h
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86Subtarget.h?rev=238842&r1=238841&r2=238842&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/lib/Target/X86/X86Subtarget.h (original)
>>> +++ llvm/trunk/lib/Target/X86/X86Subtarget.h Tue Jun 2 10:28:15 2015
>>> @@ -190,16 +190,6 @@ protected:
>>> /// True if INC and DEC instructions are slow when writing to flags
>>> bool SlowIncDec;
>>>
>>> - /// Use the RSQRT* instructions to optimize square root calculations.
>>> - /// For this to be profitable, the cost of FSQRT and FDIV must be
>>> - /// substantially higher than normal FP ops like FADD and FMUL.
>>> - bool UseSqrtEst;
>>> -
>>> - /// Use the RCP* instructions to optimize FP division calculations.
>>> - /// For this to be profitable, the cost of FDIV must be
>>> - /// substantially higher than normal FP ops like FADD and FMUL.
>>> - bool UseReciprocalEst;
>>> -
>>> /// Processor has AVX-512 PreFetch Instructions
>>> bool HasPFI;
>>>
>>> @@ -377,8 +367,6 @@ public:
>>> bool LEAusesAG() const { return LEAUsesAG; }
>>> bool slowLEA() const { return SlowLEA; }
>>> bool slowIncDec() const { return SlowIncDec; }
>>> - bool useSqrtEst() const { return UseSqrtEst; }
>>> - bool useReciprocalEst() const { return UseReciprocalEst; }
>>> bool hasCDI() const { return HasCDI; }
>>> bool hasPFI() const { return HasPFI; }
>>> bool hasERI() const { return HasERI; }
>>>
>>> Modified: llvm/trunk/lib/Target/X86/X86TargetMachine.cpp
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86TargetMachine.cpp?rev=238842&r1=238841&r2=238842&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/lib/Target/X86/X86TargetMachine.cpp (original)
>>> +++ llvm/trunk/lib/Target/X86/X86TargetMachine.cpp Tue Jun 2 10:28:15
>>> 2015
>>> @@ -105,6 +105,13 @@ X86TargetMachine::X86TargetMachine(const
>>> if (Subtarget.isTargetWin64())
>>> this->Options.TrapUnreachable = true;
>>>
>>> + // TODO: By default, all reciprocal estimate operations are off
>>> because
>>> + // that matches the behavior before TargetRecip was added (except for
>>> btver2
>>> + // which used subtarget features to enable this type of codegen).
>>> + // We should change this to match GCC behavior where everything but
>>> + // scalar division estimates are turned on by default with
>>> -ffast-math.
>>> + this->Options.Reciprocals.setDefaults("all", false, 1);
>>> +
>>> initAsmInfo();
>>> }
>>>
>>>
>>> Modified: llvm/trunk/test/CodeGen/X86/recip-fastmath.ll
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/recip-fastmath.ll?rev=238842&r1=238841&r2=238842&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/test/CodeGen/X86/recip-fastmath.ll (original)
>>> +++ llvm/trunk/test/CodeGen/X86/recip-fastmath.ll Tue Jun 2 10:28:15
>>> 2015
>>> @@ -1,6 +1,6 @@
>>> ; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=sse2 | FileCheck
>>> %s
>>> -; RUN: llc < %s -mtriple=x86_64-unknown-unknown
>>> -mattr=avx,use-recip-est | FileCheck %s --check-prefix=RECIP
>>> -; RUN: llc < %s -mtriple=x86_64-unknown-unknown
>>> -mattr=avx,use-recip-est -x86-recip-refinement-steps=2 | FileCheck %s
>>> --check-prefix=REFINE
>>> +; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=avx
>>> -recip=divf,vec-divf | FileCheck %s --check-prefix=RECIP
>>> +; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=avx
>>> -recip=divf:2,vec-divf:2 | FileCheck %s --check-prefix=REFINE
>>>
>>> ; If the target's divss/divps instructions are substantially
>>> ; slower than rcpss/rcpps with a Newton-Raphson refinement,
>>>
>>> Modified: llvm/trunk/test/CodeGen/X86/sqrt-fastmath.ll
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/sqrt-fastmath.ll?rev=238842&r1=238841&r2=238842&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/test/CodeGen/X86/sqrt-fastmath.ll (original)
>>> +++ llvm/trunk/test/CodeGen/X86/sqrt-fastmath.ll Tue Jun 2 10:28:15 2015
>>> @@ -1,5 +1,5 @@
>>> ; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=sse2 | FileCheck
>>> %s
>>> -; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=avx,use-sqrt-est
>>> | FileCheck %s --check-prefix=ESTIMATE
>>> +; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=avx
>>> -recip=sqrtf,vec-sqrtf | FileCheck %s --check-prefix=ESTIMATE
>>>
>>> declare double @__sqrt_finite(double) #0
>>> declare float @__sqrtf_finite(float) #0
>>>
>>>
>>>
>>
>>
>
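
For readers skimming the quoted patch, the grammar of a single -recip element
("divf:2", "!vec-sqrtd", and so on) can be sketched outside of LLVM as follows.
This is a minimal standalone illustration, not the TargetRecip implementation:
RecipSetting and parseRecipToken are hypothetical names, and the sketch returns
false on malformed input where the patch calls report_fatal_error. It does not
model the "all", "none", and "default" global parameters or the expansion of
suffix-less names such as "sqrt".

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical result type for one parsed -recip element.
struct RecipSetting {
  std::string Op;           // operation name, e.g. "divf" or "vec-sqrtd"
  bool Enabled = true;      // false when the element starts with '!'
  int RefinementSteps = -1; // -1 means no ":N" suffix was given
};

// Parse one comma-separated element of -recip. Returns false and leaves Out
// untouched if the element is malformed.
bool parseRecipToken(std::string Tok, RecipSetting &Out) {
  RecipSetting S;

  // A leading '!' disables the operation.
  if (!Tok.empty() && Tok[0] == '!') {
    S.Enabled = false;
    Tok.erase(0, 1);
  }

  // An optional ":N" suffix sets the refinement step count. As in the patch,
  // exactly one decimal digit is accepted after the colon.
  const std::size_t Colon = Tok.find(':');
  if (Colon != std::string::npos) {
    if (Tok.size() != Colon + 2 || Tok[Colon + 1] < '0' || Tok[Colon + 1] > '9')
      return false;
    S.RefinementSteps = Tok[Colon + 1] - '0';
    Tok.erase(Colon);
  }

  // Whatever remains is the operation name and must not be empty.
  if (Tok.empty())
    return false;
  S.Op = Tok;
  assert(!S.Op.empty());
  Out = S;
  return true;
}
```

With this sketch, "divf:2" parses to an enabled "divf" with two refinement
steps, and "!vec-sqrtd" parses to a disabled "vec-sqrtd" with no step count
specified.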