[polly] r299359 - [CodeGen] Add Performance Monitor

Tobias Grosser via llvm-commits llvm-commits at lists.llvm.org
Mon Apr 3 07:55:37 PDT 2017


Author: grosser
Date: Mon Apr  3 09:55:37 2017
New Revision: 299359

URL: http://llvm.org/viewvc/llvm-project?rev=299359&view=rev
Log:
[CodeGen] Add Performance Monitor

Add support for -polly-codegen-perf-monitoring. When performance monitoring
is enabled, code generation emits instrumentation that, at program exit, prints
the total number of cycles executed as well as the number of cycles spent in
scops. This gives an estimate of how useful polyhedral optimizations might be
for a given program.

Example output:

  Polly runtime information
  -------------------------
  Total: 783110081637
  Scops: 663718949365
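
One way to read the two numbers above is as a ratio: the share of all executed cycles that fell inside scops. The helper below is a minimal sketch of that arithmetic; `scopCycleShare` is a hypothetical name and is not part of Polly.

```cpp
#include <cstdint>

// Hypothetical helper (not part of Polly): fraction of all executed cycles
// that were spent inside scops, given the two numbers from the report.
inline double scopCycleShare(std::uint64_t totalCycles,
                             std::uint64_t scopCycles) {
  if (totalCycles == 0)
    return 0.0; // Avoid dividing by zero for an empty measurement.
  return static_cast<double>(scopCycles) / static_cast<double>(totalCycles);
}
```

For the example output, `scopCycleShare(783110081637ULL, 663718949365ULL)` is roughly 0.85, i.e. about 85% of the measured cycles were spent in scops.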

In the future, we might also add functionality to measure how much time is spent
in optimized scops and how many cycles are spent in the fallback code.

Reviewers: bollu,sebpop

Tags: #polly

Differential Revision: https://reviews.llvm.org/D31599

Added:
    polly/trunk/include/polly/CodeGen/PerfMonitor.h
    polly/trunk/lib/CodeGen/PerfMonitor.cpp
    polly/trunk/test/Isl/CodeGen/perf_monitoring.ll
Modified:
    polly/trunk/lib/CMakeLists.txt
    polly/trunk/lib/CodeGen/CodeGeneration.cpp

Added: polly/trunk/include/polly/CodeGen/PerfMonitor.h
URL: http://llvm.org/viewvc/llvm-project/polly/trunk/include/polly/CodeGen/PerfMonitor.h?rev=299359&view=auto
==============================================================================
--- polly/trunk/include/polly/CodeGen/PerfMonitor.h (added)
+++ polly/trunk/include/polly/CodeGen/PerfMonitor.h Mon Apr  3 09:55:37 2017
@@ -0,0 +1,132 @@
+//===--- PerfMonitor.h --- Monitor time spent in scops --------------------===//
+//
+//                     The LLVM Compiler Infrastructure
+//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef PERF_MONITOR_H
+#define PERF_MONITOR_H
+
+#include "polly/CodeGen/IRBuilder.h"
+
+namespace llvm {
+class Function;
+class Module;
+class Value;
+class Instruction;
+} // namespace llvm
+
+namespace polly {
+
+class PerfMonitor {
+public:
+  /// Create a new performance monitor.
+  ///
+  /// @param M The module for which to generate the performance monitor.
+  PerfMonitor(llvm::Module *M);
+
+  /// Initialize the performance monitor.
+  ///
+  /// Ensure that all global variables, functions, and callbacks needed to
+  /// manage the performance monitor are initialized and registered.
+  void initialize();
+
+  /// Mark the beginning of a timing region.
+  ///
+  /// @param InsertBefore The instruction before which the timing region starts.
+  void insertRegionStart(llvm::Instruction *InsertBefore);
+
+  /// Mark the end of a timing region.
+  ///
+  /// @param InsertBefore The instruction before which the timing region ends.
+  void insertRegionEnd(llvm::Instruction *InsertBefore);
+
+private:
+  llvm::Module *M;
+  PollyIRBuilder Builder;
+
+  /// Indicates if performance profiling is supported on this architecture.
+  bool Supported;
+
+  /// The cycle counter at the beginning of the program execution.
+  llvm::Value *CyclesTotalStartPtr;
+
+  /// The total number of cycles spent within scops.
+  llvm::Value *CyclesInScopsPtr;
+
+  /// The value of the cycle counter at the beginning of the last scop.
+  llvm::Value *CyclesInScopStartPtr;
+
+  /// A memory location which serves as argument of the RDTSCP function.
+  ///
+  /// The value written to this location is currently not used.
+  llvm::Value *RDTSCPWriteLocation;
+
+  /// A global variable that tracks whether the performance monitor
+  /// initialization has already been run.
+  llvm::Value *AlreadyInitializedPtr;
+
+  llvm::Function *insertInitFunction(llvm::Function *FinalReporting);
+
+  /// Add Function @p Fn to the list of global constructors.
+  ///
+  /// If no global constructors are available in this current module, insert
+  /// a new list of global constructors containing @p Fn as only global
+  /// constructor. Otherwise, append @p Fn to the list of global constructors.
+  ///
+  /// All functions listed as global constructors are executed before the
+  /// main() function is called.
+  ///
+  /// @param Fn Function to add to global constructors
+  void addToGlobalConstructors(llvm::Function *Fn);
+
+  /// Add global variables to module.
+  ///
+  /// Insert a set of global variables that are used to track performance,
+  /// into the module (or obtain references to them if they already exist).
+  void addGlobalVariables();
+
+  /// Get a reference to the intrinsic "i64 @llvm.x86.rdtscp(i8*)".
+  ///
+  /// The rdtscp function returns the current value of the processor's
+  /// time-stamp counter as well as the current CPU identifier. On modern x86
+  /// systems, the returned value is independent of the dynamic clock frequency
+  /// and consistent across multiple cores. It can consequently be used to get
+  /// accurate and low-overhead timing information. Even though the counter is
+  /// wrapping, it can be reliably used even for measuring longer time
+  /// intervals, as on a 1 GHz processor the counter only wraps every 584 years.
+  ///
+  /// The RDTSCP instruction is "pseudo" serializing:
+  ///
+  /// “The RDTSCP instruction waits until all previous instructions have been
+  /// executed before reading the counter. However, subsequent instructions may
+  /// begin execution before the read operation is performed.”
+  ///
+  /// To ensure that no later instructions are scheduled before the RDTSCP
+  /// instruction it is often recommended to schedule a cpuid call after the
+  /// RDTSCP instruction. We do not do this yet, trading some imprecision in
+  /// our timing for a reduced overhead in our timing.
+  ///
+  /// @returns A reference to the declaration of @llvm.x86.rdtscp.
+  llvm::Function *getRDTSCP();
+
+  /// Get a reference to "int atexit(void (*function)(void))" function.
+  ///
+  /// This function allows registering function pointers that must be executed
+  /// when the program is terminated.
+  ///
+  /// @returns A reference to @atexit().
+  llvm::Function *getAtExit();
+
+  /// Create function "__polly_perf_final".
+  ///
+  /// This function finalizes the performance measurements and prints the
+  /// results to stdout. It is expected to be registered with 'atexit()'.
+  llvm::Function *insertFinalReporting();
+};
+} // namespace polly
+
+#endif
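
The getRDTSCP() comment in the header above argues that counter wrap-around is not a practical concern. A rough check of that claim, assuming the counter is 64 bits wide and ticks once per cycle at a fixed frequency; `yearsUntilWrap` is a hypothetical helper and is not part of Polly:

```cpp
#include <cmath>

// Hypothetical helper (not part of Polly): years until a 64-bit counter that
// increments once per cycle wraps around, at a fixed frequency in Hz.
inline double yearsUntilWrap(double frequencyHz) {
  const double counterRange = std::ldexp(1.0, 64);      // 2^64 ticks
  const double secondsPerYear = 365.25 * 24.0 * 3600.0; // Julian year
  return counterRange / frequencyHz / secondsPerYear;
}
```

At 1 GHz (`yearsUntilWrap(1e9)`) this comes out at roughly 584 years, and at 3 GHz at about 195 years, so wrap-around during a single program run can be ignored either way.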

Modified: polly/trunk/lib/CMakeLists.txt
URL: http://llvm.org/viewvc/llvm-project/polly/trunk/lib/CMakeLists.txt?rev=299359&r1=299358&r2=299359&view=diff
==============================================================================
--- polly/trunk/lib/CMakeLists.txt (original)
+++ polly/trunk/lib/CMakeLists.txt Mon Apr  3 09:55:37 2017
@@ -43,6 +43,7 @@ add_polly_library(Polly
   CodeGen/Utils.cpp
   CodeGen/RuntimeDebugBuilder.cpp
   CodeGen/CodegenCleanup.cpp
+  CodeGen/PerfMonitor.cpp
   ${GPGPU_CODEGEN_FILES}
   Exchange/JSONExporter.cpp
   Support/GICHelper.cpp

Modified: polly/trunk/lib/CodeGen/CodeGeneration.cpp
URL: http://llvm.org/viewvc/llvm-project/polly/trunk/lib/CodeGen/CodeGeneration.cpp?rev=299359&r1=299358&r2=299359&view=diff
==============================================================================
--- polly/trunk/lib/CodeGen/CodeGeneration.cpp (original)
+++ polly/trunk/lib/CodeGen/CodeGeneration.cpp Mon Apr  3 09:55:37 2017
@@ -21,6 +21,7 @@
 
 #include "polly/CodeGen/IslAst.h"
 #include "polly/CodeGen/IslNodeBuilder.h"
+#include "polly/CodeGen/PerfMonitor.h"
 #include "polly/CodeGen/Utils.h"
 #include "polly/DependenceInfo.h"
 #include "polly/LinkAllPasses.h"
@@ -45,6 +46,11 @@ static cl::opt<bool> Verify("polly-codeg
                             cl::Hidden, cl::init(true), cl::ZeroOrMore,
                             cl::cat(PollyCategory));
 
+static cl::opt<bool>
+    PerfMonitoring("polly-codegen-perf-monitoring",
+                   cl::desc("Add run-time performance monitoring"), cl::Hidden,
+                   cl::init(false), cl::ZeroOrMore, cl::cat(PollyCategory));
+
 namespace {
 class CodeGeneration : public ScopPass {
 public:
@@ -145,6 +151,18 @@ public:
     IslNodeBuilder NodeBuilder(Builder, Annotator, this, *DL, *LI, *SE, *DT, S,
                                StartBlock);
 
+    if (PerfMonitoring) {
+      PerfMonitor P(EnteringBB->getParent()->getParent());
+      P.initialize();
+      P.insertRegionStart(SplitBlock->getTerminator());
+
+      BasicBlock *MergeBlock = SplitBlock->getTerminator()
+                                   ->getSuccessor(0)
+                                   ->getUniqueSuccessor()
+                                   ->getUniqueSuccessor();
+      P.insertRegionEnd(MergeBlock->getTerminator());
+    }
+
     // First generate code for the hoisted invariant loads and transitively the
     // parameters they reference. Afterwards, for the remaining parameters that
     // might reference the hoisted loads. Finally, build the runtime check

Added: polly/trunk/lib/CodeGen/PerfMonitor.cpp
URL: http://llvm.org/viewvc/llvm-project/polly/trunk/lib/CodeGen/PerfMonitor.cpp?rev=299359&view=auto
==============================================================================
--- polly/trunk/lib/CodeGen/PerfMonitor.cpp (added)
+++ polly/trunk/lib/CodeGen/PerfMonitor.cpp Mon Apr  3 09:55:37 2017
@@ -0,0 +1,235 @@
+//===------ PerfMonitor.cpp - Generate a run-time performance monitor. -======//
+//
+//                     The LLVM Compiler Infrastructure
+//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+//
+//===----------------------------------------------------------------------===//
+//
+//===----------------------------------------------------------------------===//
+
+#include "polly/CodeGen/PerfMonitor.h"
+#include "polly/CodeGen/RuntimeDebugBuilder.h"
+#include "llvm/ADT/Triple.h"
+
+using namespace llvm;
+using namespace polly;
+
+Function *PerfMonitor::getAtExit() {
+  const char *Name = "atexit";
+  Function *F = M->getFunction(Name);
+
+  if (!F) {
+    GlobalValue::LinkageTypes Linkage = Function::ExternalLinkage;
+    FunctionType *Ty = FunctionType::get(Builder.getInt32Ty(),
+                                         {Builder.getInt8PtrTy()}, false);
+    F = Function::Create(Ty, Linkage, Name, M);
+  }
+
+  return F;
+}
+
+void PerfMonitor::addToGlobalConstructors(Function *Fn) {
+  const char *Name = "llvm.global_ctors";
+  GlobalVariable *GV = M->getGlobalVariable(Name);
+  std::vector<Constant *> V;
+
+  if (GV) {
+    Constant *Array = GV->getInitializer();
+    for (Value *X : Array->operand_values())
+      V.push_back(cast<Constant>(X));
+    GV->eraseFromParent();
+  }
+
+  StructType *ST = StructType::get(Builder.getInt32Ty(), Fn->getType(),
+                                   Builder.getInt8PtrTy(), nullptr);
+
+  V.push_back(ConstantStruct::get(
+      ST, Builder.getInt32(10), Fn,
+      ConstantPointerNull::get(Builder.getInt8PtrTy()), nullptr));
+  ArrayType *Ty = ArrayType::get(ST, V.size());
+
+  GV = new GlobalVariable(*M, Ty, true, GlobalValue::AppendingLinkage,
+                          ConstantArray::get(Ty, V), Name, nullptr,
+                          GlobalVariable::NotThreadLocal);
+}
+
+Function *PerfMonitor::getRDTSCP() {
+  const char *Name = "llvm.x86.rdtscp";
+  Function *F = M->getFunction(Name);
+
+  if (!F) {
+    GlobalValue::LinkageTypes Linkage = Function::ExternalLinkage;
+    FunctionType *Ty = FunctionType::get(Builder.getInt64Ty(),
+                                         {Builder.getInt8PtrTy()}, false);
+    F = Function::Create(Ty, Linkage, Name, M);
+  }
+
+  return F;
+}
+
+PerfMonitor::PerfMonitor(Module *M) : M(M), Builder(M->getContext()) {
+  if (Triple(M->getTargetTriple()).getArch() == llvm::Triple::x86_64)
+    Supported = true;
+  else
+    Supported = false;
+}
+
+void PerfMonitor::addGlobalVariables() {
+  auto TryRegisterGlobal = [=](const char *Name, Constant *InitialValue,
+                               Value **Location) {
+    *Location = M->getGlobalVariable(Name);
+
+    if (!*Location)
+      *Location = new GlobalVariable(
+          *M, InitialValue->getType(), true, GlobalValue::WeakAnyLinkage,
+          InitialValue, Name, nullptr, GlobalVariable::InitialExecTLSModel);
+  };
+
+  TryRegisterGlobal("__polly_perf_cycles_total_start", Builder.getInt64(0),
+                    &CyclesTotalStartPtr);
+
+  TryRegisterGlobal("__polly_perf_initialized", Builder.getInt1(0),
+                    &AlreadyInitializedPtr);
+
+  TryRegisterGlobal("__polly_perf_cycles_in_scops", Builder.getInt64(0),
+                    &CyclesInScopsPtr);
+
+  TryRegisterGlobal("__polly_perf_cycles_in_scop_start", Builder.getInt64(0),
+                    &CyclesInScopStartPtr);
+
+  TryRegisterGlobal("__polly_perf_write_loation", Builder.getInt32(0),
+                    &RDTSCPWriteLocation);
+}
+
+static const char *InitFunctionName = "__polly_perf_init";
+static const char *FinalReportingFunctionName = "__polly_perf_final";
+
+Function *PerfMonitor::insertFinalReporting() {
+  // Create new function.
+  GlobalValue::LinkageTypes Linkage = Function::WeakODRLinkage;
+  FunctionType *Ty = FunctionType::get(Builder.getVoidTy(), {}, false);
+  Function *ExitFn =
+      Function::Create(Ty, Linkage, FinalReportingFunctionName, M);
+  BasicBlock *Start = BasicBlock::Create(M->getContext(), "start", ExitFn);
+  Builder.SetInsertPoint(Start);
+
+  if (!Supported) {
+    RuntimeDebugBuilder::createCPUPrinter(
+        Builder, "Polly runtime information generation not supported\n");
+    Builder.CreateRetVoid();
+    return ExitFn;
+  }
+
+  // Measure current cycles and compute final timings.
+  Function *RDTSCPFn = getRDTSCP();
+  Value *CurrentCycles = Builder.CreateCall(
+      RDTSCPFn,
+      Builder.CreatePointerCast(RDTSCPWriteLocation, Builder.getInt8PtrTy()));
+  Value *CyclesStart = Builder.CreateLoad(CyclesTotalStartPtr, true);
+  Value *CyclesTotal = Builder.CreateSub(CurrentCycles, CyclesStart);
+  Value *CyclesInScops = Builder.CreateLoad(CyclesInScopsPtr, true);
+
+  // Print the runtime information.
+  RuntimeDebugBuilder::createCPUPrinter(Builder, "Polly runtime information\n");
+  RuntimeDebugBuilder::createCPUPrinter(Builder, "-------------------------\n");
+  RuntimeDebugBuilder::createCPUPrinter(Builder, "Total: ", CyclesTotal, "\n");
+  RuntimeDebugBuilder::createCPUPrinter(Builder, "Scops: ", CyclesInScops,
+                                        "\n");
+
+  // Finalize function.
+  Builder.CreateRetVoid();
+  return ExitFn;
+}
+
+void PerfMonitor::initialize() {
+  addGlobalVariables();
+
+  Function *F = M->getFunction(InitFunctionName);
+  if (F)
+    return;
+
+  // initialize
+  Function *FinalReporting = insertFinalReporting();
+  Function *InitFn = insertInitFunction(FinalReporting);
+  addToGlobalConstructors(InitFn);
+}
+
+Function *PerfMonitor::insertInitFunction(Function *FinalReporting) {
+  // Insert function definition and BBs.
+  GlobalValue::LinkageTypes Linkage = Function::WeakODRLinkage;
+  FunctionType *Ty = FunctionType::get(Builder.getVoidTy(), {}, false);
+  Function *InitFn = Function::Create(Ty, Linkage, InitFunctionName, M);
+  BasicBlock *Start = BasicBlock::Create(M->getContext(), "start", InitFn);
+  BasicBlock *EarlyReturn =
+      BasicBlock::Create(M->getContext(), "earlyreturn", InitFn);
+  BasicBlock *InitBB = BasicBlock::Create(M->getContext(), "initbb", InitFn);
+
+  Builder.SetInsertPoint(Start);
+
+  // Check if this function was already run. If yes, return.
+  //
+  // In case profiling has been enabled in multiple translation units, the
+  // initializer function will be added to the global constructors list of
+  // each translation unit. When merging translation units, the global
+  // constructor lists are just appended, such that the initializer will appear
+  // multiple times. To avoid initializations being run multiple times (and
+  // especially to avoid that atExitFn is called more than once), we bail
+  // out if the initializer is run more than once.
+  Value *HasRunBefore = Builder.CreateLoad(AlreadyInitializedPtr);
+  Builder.CreateCondBr(HasRunBefore, EarlyReturn, InitBB);
+  Builder.SetInsertPoint(EarlyReturn);
+  Builder.CreateRetVoid();
+
+  // Keep track that this function has been run once.
+  Builder.SetInsertPoint(InitBB);
+  Value *True = Builder.getInt1(true);
+  Builder.CreateStore(True, AlreadyInitializedPtr);
+
+  // Register the final reporting function with atexit().
+  Value *FinalReportingPtr =
+      Builder.CreatePointerCast(FinalReporting, Builder.getInt8PtrTy());
+  Function *AtExitFn = getAtExit();
+  Builder.CreateCall(AtExitFn, {FinalReportingPtr});
+
+  if (Supported) {
+    // Read the current cycle counter and store the result for later.
+    Function *RDTSCPFn = getRDTSCP();
+    Value *CurrentCycles = Builder.CreateCall(
+        RDTSCPFn,
+        Builder.CreatePointerCast(RDTSCPWriteLocation, Builder.getInt8PtrTy()));
+    Builder.CreateStore(CurrentCycles, CyclesTotalStartPtr, true);
+  }
+  Builder.CreateRetVoid();
+
+  return InitFn;
+}
+
+void PerfMonitor::insertRegionStart(Instruction *InsertBefore) {
+  if (!Supported)
+    return;
+
+  Builder.SetInsertPoint(InsertBefore);
+  Function *RDTSCPFn = getRDTSCP();
+  Value *CurrentCycles = Builder.CreateCall(
+      RDTSCPFn,
+      Builder.CreatePointerCast(RDTSCPWriteLocation, Builder.getInt8PtrTy()));
+  Builder.CreateStore(CurrentCycles, CyclesInScopStartPtr, true);
+}
+
+void PerfMonitor::insertRegionEnd(Instruction *InsertBefore) {
+  if (!Supported)
+    return;
+
+  Builder.SetInsertPoint(InsertBefore);
+  Function *RDTSCPFn = getRDTSCP();
+  LoadInst *CyclesStart = Builder.CreateLoad(CyclesInScopStartPtr, true);
+  Value *CurrentCycles = Builder.CreateCall(
+      RDTSCPFn,
+      Builder.CreatePointerCast(RDTSCPWriteLocation, Builder.getInt8PtrTy()));
+  Value *CyclesInScop = Builder.CreateSub(CurrentCycles, CyclesStart);
+  Value *CyclesInScops = Builder.CreateLoad(CyclesInScopsPtr, true);
+  CyclesInScops = Builder.CreateAdd(CyclesInScops, CyclesInScop);
+  Builder.CreateStore(CyclesInScops, CyclesInScopsPtr, true);
+}
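
For readers less familiar with IRBuilder code, the run-time behavior that the emitted IR appears to implement can be modeled in plain C++. This is only a sketch under stated assumptions: `PerfState` and the free functions are hypothetical names, the counter value is passed in as `now` instead of being read with rdtscp, and the atexit() registration and printing are left out.

```cpp
#include <cstdint>

// Hypothetical model of the values held by the emitted global variables.
struct PerfState {
  bool initialized = false;            // __polly_perf_initialized
  std::uint64_t cyclesTotalStart = 0;  // __polly_perf_cycles_total_start
  std::uint64_t cyclesInScops = 0;     // __polly_perf_cycles_in_scops
  std::uint64_t cyclesInScopStart = 0; // __polly_perf_cycles_in_scop_start
};

// Models __polly_perf_init: only the first call has an effect, so running
// the initializer once per translation unit is harmless.
inline bool perfInit(PerfState &S, std::uint64_t now) {
  if (S.initialized)
    return false; // Corresponds to the "earlyreturn" block.
  S.initialized = true;
  S.cyclesTotalStart = now;
  return true;
}

// Models the code placed by insertRegionStart().
inline void regionStart(PerfState &S, std::uint64_t now) {
  S.cyclesInScopStart = now;
}

// Models the code placed by insertRegionEnd(): accumulate the elapsed cycles.
inline void regionEnd(PerfState &S, std::uint64_t now) {
  S.cyclesInScops += now - S.cyclesInScopStart;
}

// Models the "Total" value computed in __polly_perf_final.
inline std::uint64_t totalCycles(const PerfState &S, std::uint64_t now) {
  return now - S.cyclesTotalStart;
}
```

Passing the counter value as a parameter keeps the model deterministic and portable; the real instrumentation reads the time-stamp counter at each of these points.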

Added: polly/trunk/test/Isl/CodeGen/perf_monitoring.ll
URL: http://llvm.org/viewvc/llvm-project/polly/trunk/test/Isl/CodeGen/perf_monitoring.ll?rev=299359&view=auto
==============================================================================
--- polly/trunk/test/Isl/CodeGen/perf_monitoring.ll (added)
+++ polly/trunk/test/Isl/CodeGen/perf_monitoring.ll Mon Apr  3 09:55:37 2017
@@ -0,0 +1,87 @@
+; RUN: opt %loadPolly -polly-codegen -polly-codegen-perf-monitoring \
+; RUN:   -S < %s | FileCheck %s
+
+; void f(long A[], long N) {
+;   long i;
+;   if (true)
+;     for (i = 0; i < N; ++i)
+;       A[i] = i;
+; }
+
+target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128"
+target triple = "x86_64-unknown-linux-gnu"
+
+define void @f(i64* %A, i64 %N) nounwind {
+entry:
+  fence seq_cst
+  br label %next
+
+next:
+  br i1 true, label %for.i, label %return
+
+for.i:
+  %indvar = phi i64 [ 0, %next], [ %indvar.next, %for.i ]
+  %scevgep = getelementptr i64, i64* %A, i64 %indvar
+  store i64 %indvar, i64* %scevgep
+  %indvar.next = add nsw i64 %indvar, 1
+  %exitcond = icmp eq i64 %indvar.next, %N
+  br i1 %exitcond, label %return, label %for.i
+
+return:
+  fence seq_cst
+  ret void
+}
+
+; CHECK:      @__polly_perf_cycles_total_start = weak thread_local(initialexec) constant i64 0
+; CHECK-NEXT: @__polly_perf_initialized = weak thread_local(initialexec) constant i1 false
+; CHECK-NEXT: @__polly_perf_cycles_in_scops = weak thread_local(initialexec) constant i64 0
+; CHECK-NEXT: @__polly_perf_cycles_in_scop_start = weak thread_local(initialexec) constant i64 0
+; CHECK-NEXT: @__polly_perf_write_loation = weak thread_local(initialexec) constant i32 0
+
+; CHECK:      polly.split_new_and_old:                          ; preds = %entry
+; CHECK-NEXT:   %0 = call i64 @llvm.x86.rdtscp(i8* bitcast (i32* @__polly_perf_write_loation to i8*))
+; CHECK-NEXT:   store volatile i64 %0, i64* @__polly_perf_cycles_in_scop_start
+
+; CHECK:      polly.merge_new_and_old:                          ; preds = %polly.exiting, %return.region_exiting
+; CHECK-NEXT:   %5 = load volatile i64, i64* @__polly_perf_cycles_in_scop_start
+; CHECK-NEXT:   %6 = call i64 @llvm.x86.rdtscp(i8* bitcast (i32* @__polly_perf_write_loation to i8*))
+; CHECK-NEXT:   %7 = sub i64 %6, %5
+; CHECK-NEXT:   %8 = load volatile i64, i64* @__polly_perf_cycles_in_scops
+; CHECK-NEXT:   %9 = add i64 %8, %7
+; CHECK-NEXT:   store volatile i64 %9, i64* @__polly_perf_cycles_in_scops
+; CHECK-NEXT:   br label %return
+
+
+; CHECK:      define weak_odr void @__polly_perf_final() {
+; CHECK-NEXT: start:
+; CHECK-NEXT:   %0 = call i64 @llvm.x86.rdtscp(i8* bitcast (i32* @__polly_perf_write_loation to i8*))
+; CHECK-NEXT:   %1 = load volatile i64, i64* @__polly_perf_cycles_total_start
+; CHECK-NEXT:   %2 = sub i64 %0, %1
+; CHECK-NEXT:   %3 = load volatile i64, i64* @__polly_perf_cycles_in_scops
+; CHECK-NEXT:   %4 = call i32 (...) @printf(i8* getelementptr inbounds ([3 x i8], [3 x i8]* @1, i32 0, i32 0), i8 addrspace(4)* getelementptr inbounds ([27 x i8], [27 x i8] addrspace(4)* @0, i32 0, i32 0))
+; CHECK-NEXT:   %5 = call i32 @fflush(i8* null)
+; CHECK-NEXT:   %6 = call i32 (...) @printf(i8* getelementptr inbounds ([3 x i8], [3 x i8]* @3, i32 0, i32 0), i8 addrspace(4)* getelementptr inbounds ([27 x i8], [27 x i8] addrspace(4)* @2, i32 0, i32 0))
+; CHECK-NEXT:   %7 = call i32 @fflush(i8* null)
+; CHECK-NEXT:   %8 = call i32 (...) @printf(i8* getelementptr inbounds ([8 x i8], [8 x i8]* @6, i32 0, i32 0), i8 addrspace(4)* getelementptr inbounds ([8 x i8], [8 x i8] addrspace(4)* @4, i32 0, i32 0), i64 %2, i8 addrspace(4)* getelementptr inbounds ([2 x i8], [2 x i8] addrspace(4)* @5, i32 0, i32 0))
+; CHECK-NEXT:   %9 = call i32 @fflush(i8* null)
+; CHECK-NEXT:   %10 = call i32 (...) @printf(i8* getelementptr inbounds ([8 x i8], [8 x i8]* @9, i32 0, i32 0), i8 addrspace(4)* getelementptr inbounds ([8 x i8], [8 x i8] addrspace(4)* @7, i32 0, i32 0), i64 %3, i8 addrspace(4)* getelementptr inbounds ([2 x i8], [2 x i8] addrspace(4)* @8, i32 0, i32 0))
+; CHECK-NEXT:   %11 = call i32 @fflush(i8* null)
+; CHECK-NEXT:   ret void
+; CHECK-NEXT: }
+
+
+; CHECK:      define weak_odr void @__polly_perf_init() {
+; CHECK-NEXT: start:
+; CHECK-NEXT:   %0 = load i1, i1* @__polly_perf_initialized
+; CHECK-NEXT:   br i1 %0, label %earlyreturn, label %initbb
+
+; CHECK:      earlyreturn:                                      ; preds = %start
+; CHECK-NEXT:   ret void
+
+; CHECK:      initbb:                                           ; preds = %start
+; CHECK-NEXT:   store i1 true, i1* @__polly_perf_initialized
+; CHECK-NEXT:   %1 = call i32 @atexit(i8* bitcast (void ()* @__polly_perf_final to i8*))
+; CHECK-NEXT:   %2 = call i64 @llvm.x86.rdtscp(i8* bitcast (i32* @__polly_perf_write_loation to i8*))
+; CHECK-NEXT:   store volatile i64 %2, i64* @__polly_perf_cycles_total_start
+; CHECK-NEXT:   ret void
+; CHECK-NEXT: }



