[polly] r299359 - [CodeGen] Add Performance Monitor

Hongbin Zheng via llvm-commits llvm-commits at lists.llvm.org
Mon Apr 3 08:27:00 PDT 2017


On Mon, Apr 3, 2017 at 7:55 AM, Tobias Grosser via llvm-commits <
llvm-commits at lists.llvm.org> wrote:

> Author: grosser
> Date: Mon Apr  3 09:55:37 2017
> New Revision: 299359
>
> URL: http://llvm.org/viewvc/llvm-project?rev=299359&view=rev
> Log:
> [CodeGen] Add Performance Monitor
>
> Add support for -polly-codegen-perf-monitoring. When performance monitoring
> is enabled, we emit performance monitoring code during code generation that
> prints after program exit statistics about the total number of cycles executed
> as well as the number of cycles spent in scops. This gives an estimate on how
> useful polyhedral optimizations might be for a given program.
>
> Example output:
>
>   Polly runtime information
>   -------------------------
>   Total: 783110081637
>   Scops: 663718949365
>
> In the future, we might also add functionality to measure how much time is
> spent in optimized scops and how many cycles are spent in the fallback code.
>
> Reviewers: bollu,sebpop
>
> Tags: #polly
>
> Differential Revision: https://reviews.llvm.org/D31599
>
> Added:
>     polly/trunk/include/polly/CodeGen/PerfMonitor.h
>     polly/trunk/lib/CodeGen/PerfMonitor.cpp
>     polly/trunk/test/Isl/CodeGen/perf_monitoring.ll
> Modified:
>     polly/trunk/lib/CMakeLists.txt
>     polly/trunk/lib/CodeGen/CodeGeneration.cpp
>
> Added: polly/trunk/include/polly/CodeGen/PerfMonitor.h
> URL: http://llvm.org/viewvc/llvm-project/polly/trunk/include/
> polly/CodeGen/PerfMonitor.h?rev=299359&view=auto
> ============================================================
> ==================
> --- polly/trunk/include/polly/CodeGen/PerfMonitor.h (added)
> +++ polly/trunk/include/polly/CodeGen/PerfMonitor.h Mon Apr  3 09:55:37
> 2017
> @@ -0,0 +1,132 @@
> +//===--- PerfMonitor.h --- Monitor time spent in scops
> --------------------===//
> +//
> +//                     The LLVM Compiler Infrastructure
> +//
> +// This file is distributed under the University of Illinois Open Source
> +// License. See LICENSE.TXT for details.
> +//
> +//===------------------------------------------------------
> ----------------===//
> +
> +#ifndef PERF_MONITOR_H
> +#define PERF_MONITOR_H
> +
> +#include "polly/CodeGen/IRBuilder.h"
> +
> +namespace llvm {
> +class Function;
> +class Module;
> +class Value;
> +class Instruction;
> +} // namespace llvm
> +
> +namespace polly {
> +
> +class PerfMonitor {
> +public:
> +  /// Create a new performance monitor.
> +  ///
> +  /// @param M The module for which to generate the performance monitor.
> +  PerfMonitor(llvm::Module *M);
> +
> +  /// Initialize the performance monitor.
> +  ///
> +  /// Ensure that all global variables, functions, and callbacks needed to
> +  /// manage the performance monitor are initialized and registered.
> +  void initialize();
> +
> +  /// Mark the beginning of a timing region.
> +  ///
> +  /// @param InsertBefore The instruction before which the timing region
> starts.
> +  void insertRegionStart(llvm::Instruction *InserBefore);
> +
> +  /// Mark the end of a timing region.
> +  ///
> +  /// @param InsertBefore The instruction before which the timing region
> starts.
> +  void insertRegionEnd(llvm::Instruction *InsertBefore);
> +
> +private:
> +  llvm::Module *M;
> +  PollyIRBuilder Builder;
> +
> +  /// Indicates if performance profiling is supported on this
> architecture.
> +  bool Supported;
> +
> +  /// The cycle counter at the beginning of the program execution.
> +  llvm::Value *CyclesTotalStartPtr;
> +
> +  /// The total number of cycles spent within scops.
> +  llvm::Value *CyclesInScopsPtr;
> +
> +  /// The value of the cycle counter at the beginning of the last scop.
> +  llvm::Value *CyclesInScopStartPtr;
> +
> +  /// A memory location which serves as argument of the RDTSCP function.
> +  ///
> +  /// The value written to this location is currently not used.
> +  llvm::Value *RDTSCPWriteLocation;
> +
> +  /// A global variable, that keeps track if the performance monitor
> +  /// initialization has already been run.
> +  llvm::Value *AlreadyInitializedPtr;
> +
> +  llvm::Function *insertInitFunction(llvm::Function *FinalReporting);
> +
> +  /// Add Function @p to list of global constructors
> +  ///
> +  /// If no global constructors are available in this current module,
> insert
> +  /// a new list of global constructors containing @p Fn as only global
> +  /// constructor. Otherwise, append @p Fn to the list of global
> constructors.
> +  ///
> +  /// All functions listed as global constructors are executed before the
> +  /// main() function is called.
> +  ///
> +  /// @param Fn Function to add to global constructors
> +  void addToGlobalConstructors(llvm::Function *Fn);
> +
> +  /// Add global variables to module.
> +  ///
> +  /// Insert a set of global variables that are used to track performance,
> +  /// into the module (or obtain references to them if they already
> exist).
> +  void addGlobalVariables();
> +
> +  /// Get a reference to the intrinsic "i64 @llvm.x86.rdtscp(i8*)".
> +  ///
> +  /// The rdtscp function returns the current value of the processor's
> +  /// time-stamp counter as well as the current CPU identifier. On modern
> x86
> +  /// systems, the returned value is independent of the dynamic clock
> frequency
> +  /// and consistent across multiple cores. It can consequently be used
> to get
> +  /// accurate and low-overhead timing information. Even though the
> counter is
> +  /// wrapping, it can be reliably used even for measuring longer time
> +  /// intervals, as on a 1 GHz processor the counter only wraps every 545
> years.
> +  ///
> +  /// The RDTSCP instruction is "pseudo" serializing:
> +  ///
> +  /// "The RDTSCP instruction waits until all previous instructions have been
> +  /// executed before reading the counter. However, subsequent instructions may
> +  /// begin execution before the read operation is performed."
> +  ///
> +  /// To ensure that no later instructions are scheduled before the RDTSCP
> +  /// instruction it is often recommended to schedule a cpuid call after
> the
> +  /// RDTSCP instruction. We do not do this yet, trading some imprecision
> in
> +  /// our timing for a reduced overhead in our timing.
> +  ///
> +  /// @returns A reference to the declaration of @llvm.x86.rdtscp.
> +  llvm::Function *getRDTSCP();
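
A side note on the wrap-around estimate in the comment above: a 64-bit counter
ticking at 1 GHz takes about 2^64 / 10^9 seconds to wrap, which works out to
roughly 585 years rather than 545. A minimal sketch of the arithmetic, assuming
a constant tick rate and 365-day years (the helper name yearsUntilWrap is made
up for illustration and is not part of the patch):

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: years until a 64-bit cycle counter wraps when it
// advances at a constant rate of TicksPerSecond.
double yearsUntilWrap(double TicksPerSecond) {
  assert(TicksPerSecond > 0.0 && "tick rate must be positive");
  // Largest 64-bit value, i.e. about 2^64 ticks before the counter wraps.
  const double Ticks =
      static_cast<double>(std::numeric_limits<std::uint64_t>::max());
  // Assumption: 365-day years, no leap days.
  const double SecondsPerYear = 365.0 * 24.0 * 3600.0;
  return Ticks / TicksPerSecond / SecondsPerYear;
}
```

Either way the conclusion of the comment holds: wrap-around is not a practical
concern for the intervals measured here.
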
> +
> +  /// Get a reference to "int atexit(void (*function)(void))" function.
> +  ///
> +  /// This function allows to register function pointers that must be
> executed
> +  /// when the program is terminated.
> +  ///
> +  /// @returns A reference to @atexit().
> +  llvm::Function *getAtExit();
> +
> +  /// Create function "__polly_perf_final_reporting".
> +  ///
> +  /// This function finalizes the performance measurements and prints the
> +  /// results to stdout. It is expected to be registered with 'atexit()'.
> +  llvm::Function *insertFinalReporting();
> +};
> +} // namespace polly
> +
> +#endif
>
> Modified: polly/trunk/lib/CMakeLists.txt
> URL: http://llvm.org/viewvc/llvm-project/polly/trunk/lib/
> CMakeLists.txt?rev=299359&r1=299358&r2=299359&view=diff
> ============================================================
> ==================
> --- polly/trunk/lib/CMakeLists.txt (original)
> +++ polly/trunk/lib/CMakeLists.txt Mon Apr  3 09:55:37 2017
> @@ -43,6 +43,7 @@ add_polly_library(Polly
>    CodeGen/Utils.cpp
>    CodeGen/RuntimeDebugBuilder.cpp
>    CodeGen/CodegenCleanup.cpp
> +  CodeGen/PerfMonitor.cpp
>    ${GPGPU_CODEGEN_FILES}
>    Exchange/JSONExporter.cpp
>    Support/GICHelper.cpp
>
> Modified: polly/trunk/lib/CodeGen/CodeGeneration.cpp
> URL: http://llvm.org/viewvc/llvm-project/polly/trunk/lib/
> CodeGen/CodeGeneration.cpp?rev=299359&r1=299358&r2=299359&view=diff
> ============================================================
> ==================
> --- polly/trunk/lib/CodeGen/CodeGeneration.cpp (original)
> +++ polly/trunk/lib/CodeGen/CodeGeneration.cpp Mon Apr  3 09:55:37 2017
> @@ -21,6 +21,7 @@
>
>  #include "polly/CodeGen/IslAst.h"
>  #include "polly/CodeGen/IslNodeBuilder.h"
> +#include "polly/CodeGen/PerfMonitor.h"
>  #include "polly/CodeGen/Utils.h"
>  #include "polly/DependenceInfo.h"
>  #include "polly/LinkAllPasses.h"
> @@ -45,6 +46,11 @@ static cl::opt<bool> Verify("polly-codeg
>                              cl::Hidden, cl::init(true), cl::ZeroOrMore,
>                              cl::cat(PollyCategory));
>
> +static cl::opt<bool>
> +    PerfMonitoring("polly-codegen-perf-monitoring",
> +                   cl::desc("Add run-time performance monitoring"),
> cl::Hidden,
> +                   cl::init(false), cl::ZeroOrMore,
> cl::cat(PollyCategory));
> +
>  namespace {
>  class CodeGeneration : public ScopPass {
>  public:
> @@ -145,6 +151,18 @@ public:
>      IslNodeBuilder NodeBuilder(Builder, Annotator, this, *DL, *LI, *SE,
> *DT, S,
>                                 StartBlock);
>
> +    if (PerfMonitoring) {
> +      PerfMonitor P(EnteringBB->getParent()->getParent());
> +      P.initialize();
> +      P.insertRegionStart(SplitBlock->getTerminator());
> +
> +      BasicBlock *MergeBlock = SplitBlock->getTerminator()
> +                                   ->getSuccessor(0)
> +                                   ->getUniqueSuccessor()
> +                                   ->getUniqueSuccessor();
> +      P.insertRegionEnd(MergeBlock->getTerminator());
> +    }
> +
>      // First generate code for the hoisted invariant loads and
> transitively the
>      // parameters they reference. Afterwards, for the remaining
> parameters that
>      // might reference the hoisted loads. Finally, build the runtime check
>
> Added: polly/trunk/lib/CodeGen/PerfMonitor.cpp
> URL: http://llvm.org/viewvc/llvm-project/polly/trunk/lib/
> CodeGen/PerfMonitor.cpp?rev=299359&view=auto
> ============================================================
> ==================
> --- polly/trunk/lib/CodeGen/PerfMonitor.cpp (added)
> +++ polly/trunk/lib/CodeGen/PerfMonitor.cpp Mon Apr  3 09:55:37 2017
> @@ -0,0 +1,235 @@
> +//===------ PerfMonitor.cpp - Generate a run-time performance monitor.
> -======//
> +//
> +//                     The LLVM Compiler Infrastructure
> +//
> +// This file is distributed under the University of Illinois Open Source
> +// License. See LICENSE.TXT for details.
> +//
> +//===------------------------------------------------------
> ----------------===//
> +//
> +//===------------------------------------------------------
> ----------------===//
> +
> +#include "polly/CodeGen/PerfMonitor.h"
> +#include "polly/CodeGen/RuntimeDebugBuilder.h"
> +#include "llvm/ADT/Triple.h"
> +
> +using namespace llvm;
> +using namespace polly;
> +
> +Function *PerfMonitor::getAtExit() {
> +  const char *Name = "atexit";
> +  Function *F = M->getFunction(Name);
> +
> +  if (!F) {
> +    GlobalValue::LinkageTypes Linkage = Function::ExternalLinkage;
> +    FunctionType *Ty = FunctionType::get(Builder.getInt32Ty(),
> +                                         {Builder.getInt8PtrTy()}, false);
> +    F = Function::Create(Ty, Linkage, Name, M);
> +  }
> +
> +  return F;
> +}
> +
> +void PerfMonitor::addToGlobalConstructors(Function *Fn) {
> +  const char *Name = "llvm.global_ctors";
> +  GlobalVariable *GV = M->getGlobalVariable(Name);
> +  std::vector<Constant *> V;
> +
> +  if (GV) {
> +    Constant *Array = GV->getInitializer();
> +    for (Value *X : Array->operand_values())
> +      V.push_back(cast<Constant>(X));
> +    GV->eraseFromParent();
> +  }
> +
> +  StructType *ST = StructType::get(Builder.getInt32Ty(), Fn->getType(),
> +                                   Builder.getInt8PtrTy(), nullptr);
> +
> +  V.push_back(ConstantStruct::get(
> +      ST, Builder.getInt32(10), Fn,
> +      ConstantPointerNull::get(Builder.getInt8PtrTy()), nullptr));
> +  ArrayType *Ty = ArrayType::get(ST, V.size());
> +
> +  GV = new GlobalVariable(*M, Ty, true, GlobalValue::AppendingLinkage,
> +                          ConstantArray::get(Ty, V), Name, nullptr,
> +                          GlobalVariable::NotThreadLocal);
> +}
> +
> +Function *PerfMonitor::getRDTSCP() {
> +  const char *Name = "llvm.x86.rdtscp";
> +  Function *F = M->getFunction(Name);
>
For intrinsics, we had better use "auto *F = Intrinsic::getDeclaration(M,
Intrinsic::x86_rdtscp);", which returns the existing declaration or creates
one with the correct type.


> +
> +  if (!F) {
> +    GlobalValue::LinkageTypes Linkage = Function::ExternalLinkage;
> +    FunctionType *Ty = FunctionType::get(Builder.getInt64Ty(),
> +                                         {Builder.getInt8PtrTy()}, false);
> +    F = Function::Create(Ty, Linkage, Name, M);
> +  }
> +
> +  return F;
> +}
> +
> +PerfMonitor::PerfMonitor(Module *M) : M(M), Builder(M->getContext()) {
> +  if (Triple(M->getTargetTriple()).getArch() == llvm::Triple::x86_64)
> +    Supported = true;
> +  else
> +    Supported = false;
> +}
> +
> +void PerfMonitor::addGlobalVariables() {
> +  auto TryRegisterGlobal = [=](const char *Name, Constant *InitialValue,
> +                               Value **Location) {
> +    *Location = M->getGlobalVariable(Name);
> +
> +    if (!*Location)
> +      *Location = new GlobalVariable(
> +          *M, InitialValue->getType(), true, GlobalValue::WeakAnyLinkage,
> +          InitialValue, Name, nullptr, GlobalVariable::
> InitialExecTLSModel);
> +  };
> +
> +  TryRegisterGlobal("__polly_perf_cycles_total_start",
> Builder.getInt64(0),
> +                    &CyclesTotalStartPtr);
> +
> +  TryRegisterGlobal("__polly_perf_initialized", Builder.getInt1(0),
> +                    &AlreadyInitializedPtr);
> +
> +  TryRegisterGlobal("__polly_perf_cycles_in_scops", Builder.getInt64(0),
> +                    &CyclesInScopsPtr);
> +
> +  TryRegisterGlobal("__polly_perf_cycles_in_scop_start",
> Builder.getInt64(0),
> +                    &CyclesInScopStartPtr);
> +
> +  TryRegisterGlobal("__polly_perf_write_loation", Builder.getInt32(0),
> +                    &RDTSCPWriteLocation);
> +}
> +
> +static const char *InitFunctionName = "__polly_perf_init";
> +static const char *FinalReportingFunctionName = "__polly_perf_final";
> +
> +Function *PerfMonitor::insertFinalReporting() {
> +  // Create new function.
> +  GlobalValue::LinkageTypes Linkage = Function::WeakODRLinkage;
> +  FunctionType *Ty = FunctionType::get(Builder.getVoidTy(), {}, false);
> +  Function *ExitFn =
> +      Function::Create(Ty, Linkage, FinalReportingFunctionName, M);
> +  BasicBlock *Start = BasicBlock::Create(M->getContext(), "start",
> ExitFn);
> +  Builder.SetInsertPoint(Start);
> +
> +  if (!Supported) {
> +    RuntimeDebugBuilder::createCPUPrinter(
> +        Builder, "Polly runtime information generation not supported\n");
> +    Builder.CreateRetVoid();
> +    return ExitFn;
> +  }
> +
> +  // Measure current cycles and compute final timings.
> +  Function *RDTSCPFn = getRDTSCP();
> +  Value *CurrentCycles = Builder.CreateCall(
> +      RDTSCPFn,
> +      Builder.CreatePointerCast(RDTSCPWriteLocation,
> Builder.getInt8PtrTy()));
> +  Value *CyclesStart = Builder.CreateLoad(CyclesTotalStartPtr, true);
> +  Value *CyclesTotal = Builder.CreateSub(CurrentCycles, CyclesStart);
> +  Value *CyclesInScops = Builder.CreateLoad(CyclesInScopsPtr, true);
> +
> +  // Print the runtime information.
> +  RuntimeDebugBuilder::createCPUPrinter(Builder, "Polly runtime
> information\n");
> +  RuntimeDebugBuilder::createCPUPrinter(Builder,
> "-------------------------\n");
> +  RuntimeDebugBuilder::createCPUPrinter(Builder, "Total: ", CyclesTotal,
> "\n");
> +  RuntimeDebugBuilder::createCPUPrinter(Builder, "Scops: ",
> CyclesInScops,
> +                                        "\n");
> +
> +  // Finalize function.
> +  Builder.CreateRetVoid();
> +  return ExitFn;
> +}
> +
> +void PerfMonitor::initialize() {
> +  addGlobalVariables();
> +
> +  Function *F = M->getFunction(InitFunctionName);
> +  if (F)
> +    return;
> +
> +  // initialize
> +  Function *FinalReporting = insertFinalReporting();
> +  Function *InitFn = insertInitFunction(FinalReporting);
> +  addToGlobalConstructors(InitFn);
> +}
> +
> +Function *PerfMonitor::insertInitFunction(Function *FinalReporting) {
> +  // Insert function definition and BBs.
> +  GlobalValue::LinkageTypes Linkage = Function::WeakODRLinkage;
> +  FunctionType *Ty = FunctionType::get(Builder.getVoidTy(), {}, false);
> +  Function *InitFn = Function::Create(Ty, Linkage, InitFunctionName, M);
> +  BasicBlock *Start = BasicBlock::Create(M->getContext(), "start",
> InitFn);
> +  BasicBlock *EarlyReturn =
> +      BasicBlock::Create(M->getContext(), "earlyreturn", InitFn);
> +  BasicBlock *InitBB = BasicBlock::Create(M->getContext(), "initbb",
> InitFn);
> +
> +  Builder.SetInsertPoint(Start);
> +
> +  // Check if this function was already run. If yes, return.
> +  //
> +  // In case profiling has been enabled in multiple translation units, the
> +  // initializer function will be added to the global constructors list of
> +  // each translation unit. When merging translation units, the global
> +  // constructor lists are just appended, such that the initializer will
> appear
> +  // multiple times. To avoid initializations being run multiple times
> (and
> +  // especially to avoid that atExitFn is called more than once), we bail
> +  // out if the intializer is run more than once.
> +  Value *HasRunBefore = Builder.CreateLoad(AlreadyInitializedPtr);
> +  Builder.CreateCondBr(HasRunBefore, EarlyReturn, InitBB);
> +  Builder.SetInsertPoint(EarlyReturn);
> +  Builder.CreateRetVoid();
> +
> +  // Keep track that this function has been run once.
> +  Builder.SetInsertPoint(InitBB);
> +  Value *True = Builder.getInt1(true);
> +  Builder.CreateStore(True, AlreadyInitializedPtr);
> +
> +  // Register the final reporting function with atexit().
> +  Value *FinalReportingPtr =
> +      Builder.CreatePointerCast(FinalReporting, Builder.getInt8PtrTy());
> +  Function *AtExitFn = getAtExit();
> +  Builder.CreateCall(AtExitFn, {FinalReportingPtr});
> +
> +  if (Supported) {
> +    // Read the currently cycle counter and store the result for later.
> +    Function *RDTSCPFn = getRDTSCP();
> +    Value *CurrentCycles = Builder.CreateCall(
> +        RDTSCPFn,
> +        Builder.CreatePointerCast(RDTSCPWriteLocation,
> Builder.getInt8PtrTy()));
> +    Builder.CreateStore(CurrentCycles, CyclesTotalStartPtr, true);
> +  }
> +  Builder.CreateRetVoid();
> +
> +  return InitFn;
> +}
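
For readers following along, the guard that insertInitFunction emits can be
modelled in plain C++ roughly as below. This is only a sketch of the control
flow described in the comment above, not the generated IR; RegistrationCount
and the function names are hypothetical and exist only for illustration.

```cpp
#include <cstdlib>

// Models @__polly_perf_initialized.
bool AlreadyInitialized = false;
// Hypothetical counter, only here to make the effect observable.
int RegistrationCount = 0;

// Stand-in for __polly_perf_final; the real one prints the statistics.
void finalReporting() {}

// Models __polly_perf_init: safe to call once per translation unit, since
// only the first call registers the reporting function.
void perfInit() {
  if (AlreadyInitialized)
    return;                    // corresponds to the "earlyreturn" block
  AlreadyInitialized = true;   // corresponds to the "initbb" block
  std::atexit(finalReporting); // registered exactly once
  ++RegistrationCount;
}
```

Calling perfInit() several times, as happens when several translation units
each contribute a constructor entry, leaves RegistrationCount at 1.
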
> +
> +void PerfMonitor::insertRegionStart(Instruction *InsertBefore) {
> +  if (!Supported)
> +    return;
> +
> +  Builder.SetInsertPoint(InsertBefore);
> +  Function *RDTSCPFn = getRDTSCP();
> +  Value *CurrentCycles = Builder.CreateCall(
> +      RDTSCPFn,
> +      Builder.CreatePointerCast(RDTSCPWriteLocation,
> Builder.getInt8PtrTy()));
> +  Builder.CreateStore(CurrentCycles, CyclesInScopStartPtr, true);
> +}
> +
> +void PerfMonitor::insertRegionEnd(Instruction *InsertBefore) {
> +  if (!Supported)
> +    return;
> +
> +  Builder.SetInsertPoint(InsertBefore);
> +  Function *RDTSCPFn = getRDTSCP();
> +  LoadInst *CyclesStart = Builder.CreateLoad(CyclesInScopStartPtr, true);
> +  Value *CurrentCycles = Builder.CreateCall(
> +      RDTSCPFn,
> +      Builder.CreatePointerCast(RDTSCPWriteLocation,
> Builder.getInt8PtrTy()));
> +  Value *CyclesInScop = Builder.CreateSub(CurrentCycles, CyclesStart);
> +  Value *CyclesInScops = Builder.CreateLoad(CyclesInScopsPtr, true);
> +  CyclesInScops = Builder.CreateAdd(CyclesInScops, CyclesInScop);
> +  Builder.CreateStore(CyclesInScops, CyclesInScopsPtr, true);
> +}
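
The bookkeeping done by insertRegionStart and insertRegionEnd can be sketched
in plain C++ as follows. This is a simplified model under the assumption that
the counter value is passed in explicitly instead of being read with RDTSCP;
the struct and method names are invented for illustration.

```cpp
#include <cstdint>

// Simplified model of the two globals touched around each scop.
struct ScopCycleAccounting {
  std::uint64_t CyclesInScopStart = 0; // __polly_perf_cycles_in_scop_start
  std::uint64_t CyclesInScops = 0;     // __polly_perf_cycles_in_scops

  // Mirrors insertRegionStart: remember the counter value at scop entry.
  void regionStart(std::uint64_t Now) { CyclesInScopStart = Now; }

  // Mirrors insertRegionEnd: add the elapsed cycles to the running total.
  // Unsigned subtraction stays correct even if the counter wrapped once.
  void regionEnd(std::uint64_t Now) {
    CyclesInScops += Now - CyclesInScopStart;
  }
};
```

Because there is a single start slot, a scop entered while another one is
still being timed overwrites the earlier start value in this model.
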
>
> Added: polly/trunk/test/Isl/CodeGen/perf_monitoring.ll
> URL: http://llvm.org/viewvc/llvm-project/polly/trunk/test/Isl/
> CodeGen/perf_monitoring.ll?rev=299359&view=auto
> ============================================================
> ==================
> --- polly/trunk/test/Isl/CodeGen/perf_monitoring.ll (added)
> +++ polly/trunk/test/Isl/CodeGen/perf_monitoring.ll Mon Apr  3 09:55:37
> 2017
> @@ -0,0 +1,87 @@
> +; RUN: opt %loadPolly -polly-codegen -polly-codegen-perf-monitoring \
> +; RUN:   -S < %s | FileCheck %s
> +
> +; void f(long A[], long N) {
> +;   long i;
> +;   if (true)
> +;     for (i = 0; i < N; ++i)
> +;       A[i] = i;
> +; }
> +
> +target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-
> i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-
> v128:128:128-a0:0:64-s0:64:64-f80:128:128"
> +target triple = "x86_64-unknown-linux-gnu"
> +
> +define void @f(i64* %A, i64 %N) nounwind {
> +entry:
> +  fence seq_cst
> +  br label %next
> +
> +next:
> +  br i1 true, label %for.i, label %return
> +
> +for.i:
> +  %indvar = phi i64 [ 0, %next], [ %indvar.next, %for.i ]
> +  %scevgep = getelementptr i64, i64* %A, i64 %indvar
> +  store i64 %indvar, i64* %scevgep
> +  %indvar.next = add nsw i64 %indvar, 1
> +  %exitcond = icmp eq i64 %indvar.next, %N
> +  br i1 %exitcond, label %return, label %for.i
> +
> +return:
> +  fence seq_cst
> +  ret void
> +}
> +
> +; CHECK:      @__polly_perf_cycles_total_start = weak
> thread_local(initialexec) constant i64 0
> +; CHECK-NEXT: @__polly_perf_initialized = weak thread_local(initialexec)
> constant i1 false
> +; CHECK-NEXT: @__polly_perf_cycles_in_scops = weak
> thread_local(initialexec) constant i64 0
> +; CHECK-NEXT: @__polly_perf_cycles_in_scop_start = weak
> thread_local(initialexec) constant i64 0
> +; CHECK-NEXT: @__polly_perf_write_loation = weak
> thread_local(initialexec) constant i32 0
> +
> +; CHECK:      polly.split_new_and_old:                          ; preds =
> %entry
> +; CHECK-NEXT:   %0 = call i64 @llvm.x86.rdtscp(i8* bitcast (i32*
> @__polly_perf_write_loation to i8*))
> +; CHECK-NEXT:   store volatile i64 %0, i64* @__polly_perf_cycles_in_scop_
> start
> +
> +; CHECK:      polly.merge_new_and_old:                          ; preds =
> %polly.exiting, %return.region_exiting
> +; CHECK-NEXT:   %5 = load volatile i64, i64* @__polly_perf_cycles_in_scop_
> start
> +; CHECK-NEXT:   %6 = call i64 @llvm.x86.rdtscp(i8* bitcast (i32*
> @__polly_perf_write_loation to i8*))
> +; CHECK-NEXT:   %7 = sub i64 %6, %5
> +; CHECK-NEXT:   %8 = load volatile i64, i64* @__polly_perf_cycles_in_scops
> +; CHECK-NEXT:   %9 = add i64 %8, %7
> +; CHECK-NEXT:   store volatile i64 %9, i64* @__polly_perf_cycles_in_scops
> +; CHECK-NEXT:   br label %return
> +
> +
> +; CHECK:      define weak_odr void @__polly_perf_final() {
> +; CHECK-NEXT: start:
> +; CHECK-NEXT:   %0 = call i64 @llvm.x86.rdtscp(i8* bitcast (i32*
> @__polly_perf_write_loation to i8*))
> +; CHECK-NEXT:   %1 = load volatile i64, i64* @__polly_perf_cycles_total_
> start
> +; CHECK-NEXT:   %2 = sub i64 %0, %1
> +; CHECK-NEXT:   %3 = load volatile i64, i64* @__polly_perf_cycles_in_scops
> +; CHECK-NEXT:   %4 = call i32 (...) @printf(i8* getelementptr inbounds
> ([3 x i8], [3 x i8]* @1, i32 0, i32 0), i8 addrspace(4)* getelementptr
> inbounds ([27 x i8], [27 x i8] addrspace(4)* @0, i32 0, i32 0))
> +; CHECK-NEXT:   %5 = call i32 @fflush(i8* null)
> +; CHECK-NEXT:   %6 = call i32 (...) @printf(i8* getelementptr inbounds
> ([3 x i8], [3 x i8]* @3, i32 0, i32 0), i8 addrspace(4)* getelementptr
> inbounds ([27 x i8], [27 x i8] addrspace(4)* @2, i32 0, i32 0))
> +; CHECK-NEXT:   %7 = call i32 @fflush(i8* null)
> +; CHECK-NEXT:   %8 = call i32 (...) @printf(i8* getelementptr inbounds
> ([8 x i8], [8 x i8]* @6, i32 0, i32 0), i8 addrspace(4)* getelementptr
> inbounds ([8 x i8], [8 x i8] addrspace(4)* @4, i32 0, i32 0), i64 %2, i8
> addrspace(4)* getelementptr inbounds ([2 x i8], [2 x i8] addrspace(4)* @5,
> i32 0, i32 0))
> +; CHECK-NEXT:   %9 = call i32 @fflush(i8* null)
> +; CHECK-NEXT:   %10 = call i32 (...) @printf(i8* getelementptr inbounds
> ([8 x i8], [8 x i8]* @9, i32 0, i32 0), i8 addrspace(4)* getelementptr
> inbounds ([8 x i8], [8 x i8] addrspace(4)* @7, i32 0, i32 0), i64 %3, i8
> addrspace(4)* getelementptr inbounds ([2 x i8], [2 x i8] addrspace(4)* @8,
> i32 0, i32 0))
> +; CHECK-NEXT:   %11 = call i32 @fflush(i8* null)
> +; CHECK-NEXT:   ret void
> +; CHECK-NEXT: }
> +
> +
> +; CHECK:      define weak_odr void @__polly_perf_init() {
> +; CHECK-NEXT: start:
> +; CHECK-NEXT:   %0 = load i1, i1* @__polly_perf_initialized
> +; CHECK-NEXT:   br i1 %0, label %earlyreturn, label %initbb
> +
> +; CHECK:      earlyreturn:                                      ; preds =
> %start
> +; CHECK-NEXT:   ret void
> +
> +; CHECK:      initbb:                                           ; preds =
> %start
> +; CHECK-NEXT:   store i1 true, i1* @__polly_perf_initialized
> +; CHECK-NEXT:   %1 = call i32 @atexit(i8* bitcast (void ()*
> @__polly_perf_final to i8*))
> +; CHECK-NEXT:   %2 = call i64 @llvm.x86.rdtscp(i8* bitcast (i32*
> @__polly_perf_write_loation to i8*))
> +; CHECK-NEXT:   store volatile i64 %2, i64* @__polly_perf_cycles_total_
> start
> +; CHECK-NEXT:   ret void
> +; CHECK-NEXT: }
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>