<div dir="ltr">Hi Sanjay,<div><br></div><div>Is the plan to eventually turn the -mrecip option into a function attribute, so that it can be modeled on a per-function basis, or even at the SelectionDAG node level?</div><div><div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Jun 2, 2015 at 8:28 AM, Sanjay Patel <span dir="ltr"><<a href="mailto:spatel@rotateright.com" target="_blank">spatel@rotateright.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Author: spatel<br>
Date: Tue Jun  2 10:28:15 2015<br>
New Revision: 238842<br>
<br>
URL: <a href="http://llvm.org/viewvc/llvm-project?rev=238842&view=rev" target="_blank">http://llvm.org/viewvc/llvm-project?rev=238842&view=rev</a><br>
Log:<br>
make reciprocal estimate code generation more flexible by adding command-line options (2nd try)<br>
<br>
The first try (r238051) to land this was reverted due to bot failures<br>
that were hopefully addressed by r238788.<br>
<br>
This patch adds a TargetRecip class for processing many recip codegen possibilities.<br>
The class is intended to handle both command-line options to llc as well<br>
as options passed in from a front-end such as clang with the -mrecip option.<br>
<br>
The x86 backend is updated to use the new functionality.<br>
Only -mcpu=btver2 with -ffast-math should see a functional change from this patch.<br>
All other x86 CPUs continue to *not* use reciprocal estimates by default with -ffast-math.<br>
<br>
Differential Revision: <a href="http://reviews.llvm.org/D8982" target="_blank">http://reviews.llvm.org/D8982</a><br>
<br>
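Editorial sketch (not part of r238842): the log above describes -recip as a comma-separated list whose elements look like "divf", "!vec-sqrtd", or "sqrt:9". The standalone snippet below illustrates that element syntax under stated assumptions. RecipToken and parseRecipToken are hypothetical names invented for this example, and the sketch returns false on a malformed suffix where the patch itself calls report_fatal_error.<br>

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical result type for this sketch; it is not part of LLVM.
struct RecipToken {
  std::string Name; // e.g. "vec-sqrtd", "divf", "all"
  bool Enabled;     // false when the element starts with '!'
  int RefSteps;     // -1 when no ":N" suffix was given
};

// Parse one element of a comma-separated -recip list.
// Grammar assumed here: ['!'] name [':' single-digit].
// Returns false for a malformed element and leaves Out untouched.
inline bool parseRecipToken(std::string Tok, RecipToken &Out) {
  RecipToken Result;
  Result.Enabled = true;
  Result.RefSteps = -1;

  // A leading '!' disables the operation.
  if (!Tok.empty() && Tok[0] == '!') {
    Result.Enabled = false;
    Tok.erase(0, 1);
  }

  // An optional ":N" suffix carries exactly one decimal digit.
  const std::size_t Colon = Tok.find(':');
  if (Colon != std::string::npos) {
    const std::string Steps = Tok.substr(Colon + 1);
    if (Steps.size() != 1 || Steps[0] < '0' || Steps[0] > '9')
      return false;
    Result.RefSteps = Steps[0] - '0';
    Tok.erase(Colon);
  }

  if (Tok.empty())
    return false;

  Result.Name = Tok;
  assert(!Result.Name.empty());
  Out = Result;
  return true;
}
```

Under these assumptions, "!vec-sqrtd" yields a disabled entry with no refinement count, "sqrt:9" yields an enabled entry with nine refinement steps, and "divf:12" is rejected because only one digit is accepted.<br>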
<br>
Added:<br>
    llvm/trunk/include/llvm/Target/TargetRecip.h<br>
    llvm/trunk/lib/Target/TargetRecip.cpp<br>
Modified:<br>
    llvm/trunk/include/llvm/CodeGen/CommandFlags.h<br>
    llvm/trunk/include/llvm/Target/TargetOptions.h<br>
    llvm/trunk/lib/Target/CMakeLists.txt<br>
    llvm/trunk/lib/Target/X86/X86.td<br>
    llvm/trunk/lib/Target/X86/X86ISelLowering.cpp<br>
    llvm/trunk/lib/Target/X86/X86Subtarget.cpp<br>
    llvm/trunk/lib/Target/X86/X86Subtarget.h<br>
    llvm/trunk/lib/Target/X86/X86TargetMachine.cpp<br>
    llvm/trunk/test/CodeGen/X86/recip-fastmath.ll<br>
    llvm/trunk/test/CodeGen/X86/sqrt-fastmath.ll<br>
<br>
Modified: llvm/trunk/include/llvm/CodeGen/CommandFlags.h<br>
URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/CodeGen/CommandFlags.h?rev=238842&r1=238841&r2=238842&view=diff" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/CodeGen/CommandFlags.h?rev=238842&r1=238841&r2=238842&view=diff</a><br>
==============================================================================<br>
--- llvm/trunk/include/llvm/CodeGen/CommandFlags.h (original)<br>
+++ llvm/trunk/include/llvm/CodeGen/CommandFlags.h Tue Jun  2 10:28:15 2015<br>
@@ -24,6 +24,7 @@<br>
 #include "llvm/Support/Host.h"<br>
 #include "llvm/Target/TargetMachine.h"<br>
 #include "llvm/Target/TargetOptions.h"<br>
+#include "llvm/Target/TargetRecip.h"<br>
 #include <string><br>
 using namespace llvm;<br>
<br>
@@ -152,6 +153,12 @@ FuseFPOps("fp-contract",<br>
                          "Only fuse FP ops when the result won't be effected."),<br>
               clEnumValEnd));<br>
<br>
+cl::list<std::string><br>
+ReciprocalOps("recip",<br>
+  cl::CommaSeparated,<br>
+  cl::desc("Choose reciprocal operation types and parameters."),<br>
+  cl::value_desc("all,none,default,divf,!vec-sqrtd,vec-divd:0,sqrt:9..."));<br>
+<br>
 cl::opt<bool><br>
 DontPlaceZerosInBSS("nozero-initialized-in-bss",<br>
               cl::desc("Don't place zero-initialized symbols into bss section"),<br>
@@ -230,6 +237,7 @@ static inline TargetOptions InitTargetOp<br>
   TargetOptions Options;<br>
   Options.LessPreciseFPMADOption = EnableFPMAD;<br>
   Options.AllowFPOpFusion = FuseFPOps;<br>
+  Options.Reciprocals = TargetRecip(ReciprocalOps);<br>
   Options.UnsafeFPMath = EnableUnsafeFPMath;<br>
   Options.NoInfsFPMath = EnableNoInfsFPMath;<br>
   Options.NoNaNsFPMath = EnableNoNaNsFPMath;<br>
<br>
Modified: llvm/trunk/include/llvm/Target/TargetOptions.h<br>
URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Target/TargetOptions.h?rev=238842&r1=238841&r2=238842&view=diff" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Target/TargetOptions.h?rev=238842&r1=238841&r2=238842&view=diff</a><br>
==============================================================================<br>
--- llvm/trunk/include/llvm/Target/TargetOptions.h (original)<br>
+++ llvm/trunk/include/llvm/Target/TargetOptions.h Tue Jun  2 10:28:15 2015<br>
@@ -15,6 +15,7 @@<br>
 #ifndef LLVM_TARGET_TARGETOPTIONS_H<br>
 #define LLVM_TARGET_TARGETOPTIONS_H<br>
<br>
+#include "llvm/Target/TargetRecip.h"<br>
 #include "llvm/MC/MCTargetOptions.h"<br>
 #include <string><br>
<br>
@@ -72,7 +73,8 @@ namespace llvm {<br>
           CompressDebugSections(false), FunctionSections(false),<br>
           DataSections(false), UniqueSectionNames(true), TrapUnreachable(false),<br>
           TrapFuncName(), FloatABIType(FloatABI::Default),<br>
-          AllowFPOpFusion(FPOpFusion::Standard), JTType(JumpTable::Single),<br>
+          AllowFPOpFusion(FPOpFusion::Standard), Reciprocals(TargetRecip()),<br>
+          JTType(JumpTable::Single),<br>
           ThreadModel(ThreadModel::POSIX) {}<br>
<br>
     /// PrintMachineCode - This flag is enabled when the -print-machineinstrs<br>
@@ -206,6 +208,9 @@ namespace llvm {<br>
     /// the value of this option.<br>
     FPOpFusion::FPOpFusionMode AllowFPOpFusion;<br>
<br>
+    /// This class encapsulates options for reciprocal-estimate code generation.<br>
+    TargetRecip Reciprocals;<br>
+<br>
     /// JTType - This flag specifies the type of jump-instruction table to<br>
     /// create for functions that have the jumptable attribute.<br>
     JumpTable::JumpTableType JTType;<br>
@@ -240,6 +245,7 @@ inline bool operator==(const TargetOptio<br>
     ARE_EQUAL(TrapFuncName) &&<br>
     ARE_EQUAL(FloatABIType) &&<br>
     ARE_EQUAL(AllowFPOpFusion) &&<br>
+    ARE_EQUAL(Reciprocals) &&<br>
     ARE_EQUAL(JTType) &&<br>
     ARE_EQUAL(ThreadModel) &&<br>
     ARE_EQUAL(MCOptions);<br>
<br>
Added: llvm/trunk/include/llvm/Target/TargetRecip.h<br>
URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Target/TargetRecip.h?rev=238842&view=auto" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Target/TargetRecip.h?rev=238842&view=auto</a><br>
==============================================================================<br>
--- llvm/trunk/include/llvm/Target/TargetRecip.h (added)<br>
+++ llvm/trunk/include/llvm/Target/TargetRecip.h Tue Jun  2 10:28:15 2015<br>
@@ -0,0 +1,73 @@<br>
+//===--------------------- llvm/Target/TargetRecip.h ------------*- C++ -*-===//<br>
+//<br>
+//                     The LLVM Compiler Infrastructure<br>
+//<br>
+// This file is distributed under the University of Illinois Open Source<br>
+// License. See LICENSE.TXT for details.<br>
+//<br>
+//===----------------------------------------------------------------------===//<br>
+//<br>
+// This class is used to customize machine-specific reciprocal estimate code<br>
+// generation in a target-independent way.<br>
+// If a target does not support operations in this specification, then code<br>
+// generation will default to using supported operations.<br>
+//<br>
+//===----------------------------------------------------------------------===//<br>
+<br>
+#ifndef LLVM_TARGET_TARGETRECIP_H<br>
+#define LLVM_TARGET_TARGETRECIP_H<br>
+<br>
+#include "llvm/ADT/StringRef.h"<br>
+#include <vector><br>
+#include <string><br>
+#include <map><br>
+<br>
+namespace llvm {<br>
+<br>
+struct TargetRecip {<br>
+public:<br>
+  TargetRecip();<br>
+<br>
+  /// Initialize all or part of the operations from command-line options or<br>
+  /// a front end.<br>
+  TargetRecip(const std::vector<std::string> &Args);<br>
+<br>
+  /// Set whether a particular reciprocal operation is enabled and how many<br>
+  /// refinement steps are needed when using it. Use "all" to set enablement<br>
+  /// and refinement steps for all operations.<br>
+  void setDefaults(const StringRef &Key, bool Enable, unsigned RefSteps);<br>
+<br>
+  /// Return true if the reciprocal operation has been enabled by default or<br>
+  /// from the command-line. Return false if the operation has been disabled<br>
+  /// by default or from the command-line.<br>
+  bool isEnabled(const StringRef &Key) const;<br>
+<br>
+  /// Return the number of iterations necessary to refine the<br>
+  /// the result of a machine instruction for the given reciprocal operation.<br>
+  unsigned getRefinementSteps(const StringRef &Key) const;<br>
+<br>
+  bool operator==(const TargetRecip &Other) const;<br>
+<br>
+private:<br>
+  enum {<br>
+    Uninitialized = -1<br>
+  };<br>
+<br>
+  struct RecipParams {<br>
+    int8_t Enabled;<br>
+    int8_t RefinementSteps;<br>
+<br>
+    RecipParams() : Enabled(Uninitialized), RefinementSteps(Uninitialized) {}<br>
+  };<br>
+<br>
+  std::map<StringRef, RecipParams> RecipMap;<br>
+  typedef std::map<StringRef, RecipParams>::iterator RecipIter;<br>
+  typedef std::map<StringRef, RecipParams>::const_iterator ConstRecipIter;<br>
+<br>
+  bool parseGlobalParams(const std::string &Arg);<br>
+  void parseIndividualParams(const std::vector<std::string> &Args);<br>
+};<br>
+<br>
+} // End llvm namespace<br>
+<br>
+#endif<br>
<br>
Modified: llvm/trunk/lib/Target/CMakeLists.txt<br>
URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/CMakeLists.txt?rev=238842&r1=238841&r2=238842&view=diff" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/CMakeLists.txt?rev=238842&r1=238841&r2=238842&view=diff</a><br>
==============================================================================<br>
--- llvm/trunk/lib/Target/CMakeLists.txt (original)<br>
+++ llvm/trunk/lib/Target/CMakeLists.txt Tue Jun  2 10:28:15 2015<br>
@@ -6,6 +6,7 @@ add_llvm_library(LLVMTarget<br>
   TargetLoweringObjectFile.cpp<br>
   TargetMachine.cpp<br>
   TargetMachineC.cpp<br>
+  TargetRecip.cpp<br>
   TargetSubtargetInfo.cpp<br>
<br>
   ADDITIONAL_HEADER_DIRS<br>
<br>
Added: llvm/trunk/lib/Target/TargetRecip.cpp<br>
URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/TargetRecip.cpp?rev=238842&view=auto" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/TargetRecip.cpp?rev=238842&view=auto</a><br>
==============================================================================<br>
--- llvm/trunk/lib/Target/TargetRecip.cpp (added)<br>
+++ llvm/trunk/lib/Target/TargetRecip.cpp Tue Jun  2 10:28:15 2015<br>
@@ -0,0 +1,225 @@<br>
+//===-------------------------- TargetRecip.cpp ---------------------------===//<br>
+//<br>
+//                     The LLVM Compiler Infrastructure<br>
+//<br>
+// This file is distributed under the University of Illinois Open Source<br>
+// License. See LICENSE.TXT for details.<br>
+//<br>
+//===----------------------------------------------------------------------===//<br>
+//<br>
+// This class is used to customize machine-specific reciprocal estimate code<br>
+// generation in a target-independent way.<br>
+// If a target does not support operations in this specification, then code<br>
+// generation will default to using supported operations.<br>
+//<br>
+//===----------------------------------------------------------------------===//<br>
+<br>
+#include "llvm/ADT/StringRef.h"<br>
+#include "llvm/ADT/STLExtras.h"<br>
+#include "llvm/Support/ErrorHandling.h"<br>
+#include "llvm/Target/TargetRecip.h"<br>
+#include <map><br>
+<br>
+using namespace llvm;<br>
+<br>
+// These are the names of the individual reciprocal operations. These are<br>
+// the key strings for queries and command-line inputs.<br>
+// In addition, the command-line interface recognizes the global parameters<br>
+// "all", "none", and "default".<br>
+static const char *RecipOps[] = {<br>
+  "divd",<br>
+  "divf",<br>
+  "vec-divd",<br>
+  "vec-divf",<br>
+  "sqrtd",<br>
+  "sqrtf",<br>
+  "vec-sqrtd",<br>
+  "vec-sqrtf",<br>
+};<br>
+<br>
+// The uninitialized state is needed for the enabled settings and refinement<br>
+// steps because custom settings may arrive via the command-line before target<br>
+// defaults are set.<br>
+TargetRecip::TargetRecip() {<br>
+  unsigned NumStrings = llvm::array_lengthof(RecipOps);<br>
+  for (unsigned i = 0; i < NumStrings; ++i)<br>
+    RecipMap.insert(std::make_pair(RecipOps[i], RecipParams()));<br>
+}<br>
+<br>
+static bool parseRefinementStep(const StringRef &In, size_t &Position,<br>
+                                uint8_t &Value) {<br>
+  const char RefStepToken = ':';<br>
+  Position = In.find(RefStepToken);<br>
+  if (Position == StringRef::npos)<br>
+    return false;<br>
+<br>
+  StringRef RefStepString = In.substr(Position + 1);<br>
+  // Allow exactly one numeric character for the additional refinement<br>
+  // step parameter.<br>
+  if (RefStepString.size() == 1) {<br>
+    char RefStepChar = RefStepString[0];<br>
+    if (RefStepChar >= '0' && RefStepChar <= '9') {<br>
+      Value = RefStepChar - '0';<br>
+      return true;<br>
+    }<br>
+  }<br>
+  report_fatal_error("Invalid refinement step for -recip.");<br>
+}<br>
+<br>
+bool TargetRecip::parseGlobalParams(const std::string &Arg) {<br>
+  StringRef ArgSub = Arg;<br>
+<br>
+  // Look for an optional setting of the number of refinement steps needed<br>
+  // for this type of reciprocal operation.<br>
+  size_t RefPos;<br>
+  uint8_t RefSteps;<br>
+  StringRef RefStepString;<br>
+  if (parseRefinementStep(ArgSub, RefPos, RefSteps)) {<br>
+    // Split the string for further processing.<br>
+    RefStepString = ArgSub.substr(RefPos + 1);<br>
+    ArgSub = ArgSub.substr(0, RefPos);<br>
+  }<br>
+  bool Enable;<br>
+  bool UseDefaults;<br>
+  if (ArgSub == "all") {<br>
+    UseDefaults = false;<br>
+    Enable = true;<br>
+  } else if (ArgSub == "none") {<br>
+    UseDefaults = false;<br>
+    Enable = false;<br>
+  } else if (ArgSub == "default") {<br>
+    UseDefaults = true;<br>
+  } else {<br>
+    // Any other string is invalid or an individual setting.<br>
+    return false;<br>
+  }<br>
+<br>
+  // All enable values will be initialized to target defaults if 'default' was<br>
+  // specified.<br>
+  if (!UseDefaults)<br>
+    for (auto &KV : RecipMap)<br>
+      KV.second.Enabled = Enable;<br>
+<br>
+  // Custom refinement count was specified with all, none, or default.<br>
+  if (!RefStepString.empty())<br>
+    for (auto &KV : RecipMap)<br>
+      KV.second.RefinementSteps = RefSteps;<br>
+<br>
+  return true;<br>
+}<br>
+<br>
+void TargetRecip::parseIndividualParams(const std::vector<std::string> &Args) {<br>
+  static const char DisabledPrefix = '!';<br>
+  unsigned NumArgs = Args.size();<br>
+<br>
+  for (unsigned i = 0; i != NumArgs; ++i) {<br>
+    StringRef Val = Args[i];<br>
+<br>
+    bool IsDisabled = Val[0] == DisabledPrefix;<br>
+    // Ignore the disablement token for string matching.<br>
+    if (IsDisabled)<br>
+      Val = Val.substr(1);<br>
+<br>
+    size_t RefPos;<br>
+    uint8_t RefSteps;<br>
+    StringRef RefStepString;<br>
+    if (parseRefinementStep(Val, RefPos, RefSteps)) {<br>
+      // Split the string for further processing.<br>
+      RefStepString = Val.substr(RefPos + 1);<br>
+      Val = Val.substr(0, RefPos);<br>
+    }<br>
+<br>
+    RecipIter Iter = RecipMap.find(Val);<br>
+    if (Iter == RecipMap.end()) {<br>
+      // Try again specifying float suffix.<br>
+      Iter = RecipMap.find(Val.str() + 'f');<br>
+      if (Iter == RecipMap.end()) {<br>
+        Iter = RecipMap.find(Val.str() + 'd');<br>
+        assert(Iter == RecipMap.end() && "Float entry missing from map");<br>
+        report_fatal_error("Invalid option for -recip.");<br>
+      }<br>
+<br>
+      // The option was specified without a float or double suffix.<br>
+      if (RecipMap[Val.str() + 'd'].Enabled != Uninitialized) {<br>
+        // Make sure that the double entry was not already specified.<br>
+        // The float entry will be checked below.<br>
+        report_fatal_error("Duplicate option for -recip.");<br>
+      }<br>
+    }<br>
+<br>
+    if (Iter->second.Enabled != Uninitialized)<br>
+      report_fatal_error("Duplicate option for -recip.");<br>
+<br>
+    // Mark the matched option as found. Do not allow duplicate specifiers.<br>
+    Iter->second.Enabled = !IsDisabled;<br>
+    if (!RefStepString.empty())<br>
+      Iter->second.RefinementSteps = RefSteps;<br>
+<br>
+    // If the precision was not specified, the double entry is also initialized.<br>
+    if (Val.back() != 'f' && Val.back() != 'd') {<br>
+      RecipMap[Val.str() + 'd'].Enabled = !IsDisabled;<br>
+      if (!RefStepString.empty())<br>
+        RecipMap[Val.str() + 'd'].RefinementSteps = RefSteps;<br>
+    }<br>
+  }<br>
+}<br>
+<br>
+TargetRecip::TargetRecip(const std::vector<std::string> &Args) :<br>
+  TargetRecip() {<br>
+  unsigned NumArgs = Args.size();<br>
+<br>
+  // Check if "all", "default", or "none" was specified.<br>
+  if (NumArgs == 1 && parseGlobalParams(Args[0]))<br>
+    return;<br>
+<br>
+  parseIndividualParams(Args);<br>
+}<br>
+<br>
+bool TargetRecip::isEnabled(const StringRef &Key) const {<br>
+  ConstRecipIter Iter = RecipMap.find(Key);<br>
+  assert(Iter != RecipMap.end() && "Unknown name for reciprocal map");<br>
+  assert(Iter->second.Enabled != Uninitialized &&<br>
+         "Enablement setting was not initialized");<br>
+  return Iter->second.Enabled;<br>
+}<br>
+<br>
+unsigned TargetRecip::getRefinementSteps(const StringRef &Key) const {<br>
+  ConstRecipIter Iter = RecipMap.find(Key);<br>
+  assert(Iter != RecipMap.end() && "Unknown name for reciprocal map");<br>
+  assert(Iter->second.RefinementSteps != Uninitialized &&<br>
+         "Refinement step setting was not initialized");<br>
+  return Iter->second.RefinementSteps;<br>
+}<br>
+<br>
+/// Custom settings (previously initialized values) override target defaults.<br>
+void TargetRecip::setDefaults(const StringRef &Key, bool Enable,<br>
+                              unsigned RefSteps) {<br>
+  if (Key == "all") {<br>
+    for (auto &KV : RecipMap) {<br>
+      RecipParams &RP = KV.second;<br>
+      if (RP.Enabled == Uninitialized)<br>
+        RP.Enabled = Enable;<br>
+      if (RP.RefinementSteps == Uninitialized)<br>
+        RP.RefinementSteps = RefSteps;<br>
+    }<br>
+  } else {<br>
+    RecipParams &RP = RecipMap[Key];<br>
+    if (RP.Enabled == Uninitialized)<br>
+      RP.Enabled = Enable;<br>
+    if (RP.RefinementSteps == Uninitialized)<br>
+      RP.RefinementSteps = RefSteps;<br>
+  }<br>
+}<br>
+<br>
+bool TargetRecip::operator==(const TargetRecip &Other) const {<br>
+  for (const auto &KV : RecipMap) {<br>
+    const StringRef &Op = KV.first;<br>
+    const RecipParams &RP = KV.second;<br>
+    const RecipParams &OtherRP = Other.RecipMap.find(Op)->second;<br>
+    if (RP.RefinementSteps != OtherRP.RefinementSteps)<br>
+      return false;<br>
+    if (RP.Enabled != OtherRP.Enabled)<br>
+      return false;<br>
+  }<br>
+  return true;<br>
+}<br>
<br>
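Editorial sketch (not part of r238842): TargetRecip.cpp above keeps every setting in an "uninitialized" state until either a custom option or a target default fills it in, so that custom settings win even when they arrive first. The standalone snippet below restates that precedence rule with the standard library only. MiniRecip, setCustom, and refinementSteps are hypothetical names for this example, and only two operation keys are modeled.<br>

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical miniature of the precedence rule; this is not LLVM code.
class MiniRecip {
  enum { Uninitialized = -1 };

  struct Params {
    int Enabled;
    int RefSteps;
    Params() : Enabled(Uninitialized), RefSteps(Uninitialized) {}
  };

  std::map<std::string, Params> Map;

  // Fill only the fields that no earlier caller has set.
  static void fill(Params &P, bool Enable, int RefSteps) {
    if (P.Enabled == Uninitialized)
      P.Enabled = Enable;
    if (P.RefSteps == Uninitialized)
      P.RefSteps = RefSteps;
  }

public:
  // Every known operation starts out uninitialized.
  MiniRecip() {
    Map["divf"];
    Map["sqrtf"];
  }

  // Stand-in for a value parsed from the command line: always overwrites.
  void setCustom(const std::string &Key, bool Enable, int RefSteps) {
    Params &P = Map.at(Key);
    P.Enabled = Enable;
    P.RefSteps = RefSteps;
  }

  // Target defaults never override a value that is already set.
  void setDefaults(const std::string &Key, bool Enable, int RefSteps) {
    if (Key == "all") {
      for (auto &KV : Map)
        fill(KV.second, Enable, RefSteps);
    } else {
      fill(Map.at(Key), Enable, RefSteps);
    }
  }

  bool isEnabled(const std::string &Key) const {
    const Params &P = Map.at(Key);
    assert(P.Enabled != Uninitialized && "defaults were never applied");
    return P.Enabled != 0;
  }

  int refinementSteps(const std::string &Key) const {
    const Params &P = Map.at(Key);
    assert(P.RefSteps != Uninitialized && "defaults were never applied");
    return P.RefSteps;
  }
};
```

In this sketch, calling setCustom("sqrtf", true, 2) and then setDefaults("all", false, 1) leaves "sqrtf" enabled with two steps, while "divf" picks up the default of disabled with one step.<br>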
Modified: llvm/trunk/lib/Target/X86/X86.td<br>
URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86.td?rev=238842&r1=238841&r2=238842&view=diff" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86.td?rev=238842&r1=238841&r2=238842&view=diff</a><br>
==============================================================================<br>
--- llvm/trunk/lib/Target/X86/X86.td (original)<br>
+++ llvm/trunk/lib/Target/X86/X86.td Tue Jun  2 10:28:15 2015<br>
@@ -188,10 +188,6 @@ def FeatureSlowLEA : SubtargetFeature<"s<br>
                                    "LEA instruction with certain arguments is slow">;<br>
 def FeatureSlowIncDec : SubtargetFeature<"slow-incdec", "SlowIncDec", "true",<br>
                                    "INC and DEC instructions are slower than ADD and SUB">;<br>
-def FeatureUseSqrtEst : SubtargetFeature<"use-sqrt-est", "UseSqrtEst", "true",<br>
-                            "Use RSQRT* to optimize square root calculations">;<br>
-def FeatureUseRecipEst : SubtargetFeature<"use-recip-est", "UseReciprocalEst",<br>
-                          "true", "Use RCP* to optimize division calculations">;<br>
 def FeatureSoftFloat<br>
     : SubtargetFeature<"soft-float", "UseSoftFloat", "true",<br>
                        "Use software floating point features.">;<br>
@@ -444,7 +440,7 @@ def : ProcessorModel<"btver2", BtVer2Mod<br>
                       FeaturePRFCHW, FeatureAES, FeaturePCLMUL,<br>
                       FeatureBMI, FeatureF16C, FeatureMOVBE,<br>
                       FeatureLZCNT, FeaturePOPCNT, FeatureFastUAMem,<br>
-                      FeatureSlowSHLD, FeatureUseSqrtEst, FeatureUseRecipEst]>;<br>
+                      FeatureSlowSHLD]>;<br>
<br>
 // TODO: We should probably add 'FeatureFastUAMem' to all of the AMD chips.<br>
<br>
<br>
Modified: llvm/trunk/lib/Target/X86/X86ISelLowering.cpp<br>
URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.cpp?rev=238842&r1=238841&r2=238842&view=diff" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.cpp?rev=238842&r1=238841&r2=238842&view=diff</a><br>
==============================================================================<br>
--- llvm/trunk/lib/Target/X86/X86ISelLowering.cpp (original)<br>
+++ llvm/trunk/lib/Target/X86/X86ISelLowering.cpp Tue Jun  2 10:28:15 2015<br>
@@ -67,12 +67,6 @@ static cl::opt<bool> ExperimentalVectorW<br>
              "rather than promotion."),<br>
     cl::Hidden);<br>
<br>
-static cl::opt<int> ReciprocalEstimateRefinementSteps(<br>
-    "x86-recip-refinement-steps", cl::init(1),<br>
-    cl::desc("Specify the number of Newton-Raphson iterations applied to the "<br>
-             "result of the hardware reciprocal estimate instruction."),<br>
-    cl::NotHidden);<br>
-<br>
 // Forward declarations.<br>
 static SDValue getMOVL(SelectionDAG &DAG, SDLoc dl, EVT VT, SDValue V1,<br>
                        SDValue V2);<br>
@@ -13006,29 +13000,31 @@ SDValue X86TargetLowering::getRsqrtEstim<br>
                                             DAGCombinerInfo &DCI,<br>
                                             unsigned &RefinementSteps,<br>
                                             bool &UseOneConstNR) const {<br>
-  // FIXME: We should use instruction latency models to calculate the cost of<br>
-  // each potential sequence, but this is very hard to do reliably because<br>
-  // at least Intel's Core* chips have variable timing based on the number of<br>
-  // significant digits in the divisor and/or sqrt operand.<br>
-  if (!Subtarget->useSqrtEst())<br>
-    return SDValue();<br>
-<br>
   EVT VT = Op.getValueType();<br>
+  const char *RecipOp;<br>
<br>
-  // SSE1 has rsqrtss and rsqrtps.<br>
+  // SSE1 has rsqrtss and rsqrtps. AVX adds a 256-bit variant for rsqrtps.<br>
   // TODO: Add support for AVX512 (v16f32).<br>
   // It is likely not profitable to do this for f64 because a double-precision<br>
   // rsqrt estimate with refinement on x86 prior to FMA requires at least 16<br>
   // instructions: convert to single, rsqrtss, convert back to double, refine<br>
   // (3 steps = at least 13 insts). If an 'rsqrtsd' variant was added to the ISA<br>
   // along with FMA, this could be a throughput win.<br>
-  if ((Subtarget->hasSSE1() && (VT == MVT::f32 || VT == MVT::v4f32)) ||<br>
-      (Subtarget->hasAVX() && VT == MVT::v8f32)) {<br>
-    RefinementSteps = 1;<br>
-    UseOneConstNR = false;<br>
-    return DCI.DAG.getNode(X86ISD::FRSQRT, SDLoc(Op), VT, Op);<br>
-  }<br>
-  return SDValue();<br>
+  if (VT == MVT::f32 && Subtarget->hasSSE1())<br>
+    RecipOp = "sqrtf";<br>
+  else if ((VT == MVT::v4f32 && Subtarget->hasSSE1()) ||<br>
+           (VT == MVT::v8f32 && Subtarget->hasAVX()))<br>
+    RecipOp = "vec-sqrtf";<br>
+  else<br>
+    return SDValue();<br>
+<br>
+  TargetRecip Recips = DCI.DAG.getTarget().Options.Reciprocals;<br>
+  if (!Recips.isEnabled(RecipOp))<br>
+    return SDValue();<br>
+<br>
+  RefinementSteps = Recips.getRefinementSteps(RecipOp);<br>
+  UseOneConstNR = false;<br>
+  return DCI.DAG.getNode(X86ISD::FRSQRT, SDLoc(Op), VT, Op);<br>
 }<br>
<br>
 /// The minimum architected relative accuracy is 2^-12. We need one<br>
@@ -13036,15 +13032,9 @@ SDValue X86TargetLowering::getRsqrtEstim<br>
 SDValue X86TargetLowering::getRecipEstimate(SDValue Op,<br>
                                             DAGCombinerInfo &DCI,<br>
                                             unsigned &RefinementSteps) const {<br>
-  // FIXME: We should use instruction latency models to calculate the cost of<br>
-  // each potential sequence, but this is very hard to do reliably because<br>
-  // at least Intel's Core* chips have variable timing based on the number of<br>
-  // significant digits in the divisor.<br>
-  if (!Subtarget->useReciprocalEst())<br>
-    return SDValue();<br>
-<br>
   EVT VT = Op.getValueType();<br>
-<br>
+  const char *RecipOp;<br>
+<br>
   // SSE1 has rcpss and rcpps. AVX adds a 256-bit variant for rcpps.<br>
   // TODO: Add support for AVX512 (v16f32).<br>
   // It is likely not profitable to do this for f64 because a double-precision<br>
@@ -13052,12 +13042,20 @@ SDValue X86TargetLowering::getRecipEstim<br>
   // 15 instructions: convert to single, rcpss, convert back to double, refine<br>
   // (3 steps = 12 insts). If an 'rcpsd' variant was added to the ISA<br>
   // along with FMA, this could be a throughput win.<br>
-  if ((Subtarget->hasSSE1() && (VT == MVT::f32 || VT == MVT::v4f32)) ||<br>
-      (Subtarget->hasAVX() && VT == MVT::v8f32)) {<br>
-    RefinementSteps = ReciprocalEstimateRefinementSteps;<br>
-    return DCI.DAG.getNode(X86ISD::FRCP, SDLoc(Op), VT, Op);<br>
-  }<br>
-  return SDValue();<br>
+  if (VT == MVT::f32 && Subtarget->hasSSE1())<br>
+    RecipOp = "divf";<br>
+  else if ((VT == MVT::v4f32 && Subtarget->hasSSE1()) ||<br>
+           (VT == MVT::v8f32 && Subtarget->hasAVX()))<br>
+    RecipOp = "vec-divf";<br>
+  else<br>
+    return SDValue();<br>
+<br>
+  TargetRecip Recips = DCI.DAG.getTarget().Options.Reciprocals;<br>
+  if (!Recips.isEnabled(RecipOp))<br>
+    return SDValue();<br>
+<br>
+  RefinementSteps = Recips.getRefinementSteps(RecipOp);<br>
+  return DCI.DAG.getNode(X86ISD::FRCP, SDLoc(Op), VT, Op);<br>
 }<br>
<br>
 /// If we have at least two divisions that use the same divisor, convert to<br>
<br>
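Editorial sketch (not part of r238842): the RefinementSteps value used in X86ISelLowering.cpp above counts Newton-Raphson iterations applied to the hardware estimate. For a reciprocal, one iteration is X' = X * (2 - A * X), which squares the relative error, so an estimate accurate to about 2^-12 reaches roughly single precision after one step. The snippet below is plain scalar C++ for illustration, not the DAG nodes the backend actually emits. refineRecip and recipRelError are hypothetical names.<br>

```cpp
#include <cassert>
#include <cmath>

// Apply Steps Newton-Raphson iterations to an estimate of 1/A.
// Each iteration computes X' = X * (2 - A * X).
inline float refineRecip(float A, float Estimate, unsigned Steps) {
  assert(A != 0.0f && "reciprocal of zero is not refined here");
  for (unsigned I = 0; I < Steps; ++I)
    Estimate = Estimate * (2.0f - A * Estimate);
  return Estimate;
}

// Relative error of Estimate as an approximation of 1/A.
inline float recipRelError(float A, float Estimate) {
  return std::fabs(A * Estimate - 1.0f);
}
```

With zero steps the estimate is returned unchanged, which corresponds to an option such as "divf:0" in the syntax from the commit log.<br>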
Modified: llvm/trunk/lib/Target/X86/X86Subtarget.cpp<br>
URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86Subtarget.cpp?rev=238842&r1=238841&r2=238842&view=diff" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86Subtarget.cpp?rev=238842&r1=238841&r2=238842&view=diff</a><br>
==============================================================================<br>
--- llvm/trunk/lib/Target/X86/X86Subtarget.cpp (original)<br>
+++ llvm/trunk/lib/Target/X86/X86Subtarget.cpp Tue Jun  2 10:28:15 2015<br>
@@ -273,8 +273,6 @@ void X86Subtarget::initializeEnvironment<br>
   LEAUsesAG = false;<br>
   SlowLEA = false;<br>
   SlowIncDec = false;<br>
-  UseSqrtEst = false;<br>
-  UseReciprocalEst = false;<br>
   stackAlignment = 4;<br>
   // FIXME: this is a known good value for Yonah. How about others?<br>
   MaxInlineSizeThreshold = 128;<br>
<br>
Modified: llvm/trunk/lib/Target/X86/X86Subtarget.h<br>
URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86Subtarget.h?rev=238842&r1=238841&r2=238842&view=diff" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86Subtarget.h?rev=238842&r1=238841&r2=238842&view=diff</a><br>
==============================================================================<br>
--- llvm/trunk/lib/Target/X86/X86Subtarget.h (original)<br>
+++ llvm/trunk/lib/Target/X86/X86Subtarget.h Tue Jun  2 10:28:15 2015<br>
@@ -190,16 +190,6 @@ protected:<br>
   /// True if INC and DEC instructions are slow when writing to flags<br>
   bool SlowIncDec;<br>
<br>
-  /// Use the RSQRT* instructions to optimize square root calculations.<br>
-  /// For this to be profitable, the cost of FSQRT and FDIV must be<br>
-  /// substantially higher than normal FP ops like FADD and FMUL.<br>
-  bool UseSqrtEst;<br>
-<br>
-  /// Use the RCP* instructions to optimize FP division calculations.<br>
-  /// For this to be profitable, the cost of FDIV must be<br>
-  /// substantially higher than normal FP ops like FADD and FMUL.<br>
-  bool UseReciprocalEst;<br>
-<br>
   /// Processor has AVX-512 PreFetch Instructions<br>
   bool HasPFI;<br>
<br>
@@ -377,8 +367,6 @@ public:<br>
   bool LEAusesAG() const { return LEAUsesAG; }<br>
   bool slowLEA() const { return SlowLEA; }<br>
   bool slowIncDec() const { return SlowIncDec; }<br>
-  bool useSqrtEst() const { return UseSqrtEst; }<br>
-  bool useReciprocalEst() const { return UseReciprocalEst; }<br>
   bool hasCDI() const { return HasCDI; }<br>
   bool hasPFI() const { return HasPFI; }<br>
   bool hasERI() const { return HasERI; }<br>
<br>
Modified: llvm/trunk/lib/Target/X86/X86TargetMachine.cpp<br>
URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86TargetMachine.cpp?rev=238842&r1=238841&r2=238842&view=diff" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86TargetMachine.cpp?rev=238842&r1=238841&r2=238842&view=diff</a><br>
==============================================================================<br>
--- llvm/trunk/lib/Target/X86/X86TargetMachine.cpp (original)<br>
+++ llvm/trunk/lib/Target/X86/X86TargetMachine.cpp Tue Jun  2 10:28:15 2015<br>
@@ -105,6 +105,13 @@ X86TargetMachine::X86TargetMachine(const<br>
   if (Subtarget.isTargetWin64())<br>
     this->Options.TrapUnreachable = true;<br>
<br>
+  // TODO: By default, all reciprocal estimate operations are off because<br>
+  // that matches the behavior before TargetRecip was added (except for btver2<br>
+  // which used subtarget features to enable this type of codegen).<br>
+  // We should change this to match GCC behavior where everything but<br>
+  // scalar division estimates are turned on by default with -ffast-math.<br>
+  this->Options.Reciprocals.setDefaults("all", false, 1);<br>
+<br>
   initAsmInfo();<br>
 }<br>
<br>
<br>
Modified: llvm/trunk/test/CodeGen/X86/recip-fastmath.ll<br>
URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/recip-fastmath.ll?rev=238842&r1=238841&r2=238842&view=diff" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/recip-fastmath.ll?rev=238842&r1=238841&r2=238842&view=diff</a><br>
==============================================================================<br>
--- llvm/trunk/test/CodeGen/X86/recip-fastmath.ll (original)<br>
+++ llvm/trunk/test/CodeGen/X86/recip-fastmath.ll Tue Jun  2 10:28:15 2015<br>
@@ -1,6 +1,6 @@<br>
 ; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=sse2 | FileCheck %s<br>
-; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=avx,use-recip-est | FileCheck %s --check-prefix=RECIP<br>
-; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=avx,use-recip-est -x86-recip-refinement-steps=2 | FileCheck %s --check-prefix=REFINE<br>
+; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=avx -recip=divf,vec-divf | FileCheck %s --check-prefix=RECIP<br>
+; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=avx -recip=divf:2,vec-divf:2 | FileCheck %s --check-prefix=REFINE<br>
<br>
 ; If the target's divss/divps instructions are substantially<br>
 ; slower than rcpss/rcpps with a Newton-Raphson refinement,<br>
<br>
Modified: llvm/trunk/test/CodeGen/X86/sqrt-fastmath.ll<br>
URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/sqrt-fastmath.ll?rev=238842&r1=238841&r2=238842&view=diff" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/sqrt-fastmath.ll?rev=238842&r1=238841&r2=238842&view=diff</a><br>
==============================================================================<br>
--- llvm/trunk/test/CodeGen/X86/sqrt-fastmath.ll (original)<br>
+++ llvm/trunk/test/CodeGen/X86/sqrt-fastmath.ll Tue Jun  2 10:28:15 2015<br>
@@ -1,5 +1,5 @@<br>
 ; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=sse2 | FileCheck %s<br>
-; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=avx,use-sqrt-est | FileCheck %s --check-prefix=ESTIMATE<br>
+; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=avx -recip=sqrtf,vec-sqrtf | FileCheck %s --check-prefix=ESTIMATE<br>
<br>
 declare double @__sqrt_finite(double) #0<br>
 declare float @__sqrtf_finite(float) #0<br>
<br>
<br>
</blockquote></div><br></div></div></div></div>