[polly] r299359 - [CodeGen] Add Performance Monitor
Tobias Grosser via llvm-commits
llvm-commits at lists.llvm.org
Mon Apr 3 08:35:50 PDT 2017
Good point: r299360
Best,
Tobias
On Mon, Apr 3, 2017, at 05:27 PM, Hongbin Zheng wrote:
> On Mon, Apr 3, 2017 at 7:55 AM, Tobias Grosser via llvm-commits <
> llvm-commits at lists.llvm.org> wrote:
>
> > Author: grosser
> > Date: Mon Apr 3 09:55:37 2017
> > New Revision: 299359
> >
> > URL: http://llvm.org/viewvc/llvm-project?rev=299359&view=rev
> > Log:
> > [CodeGen] Add Performance Monitor
> >
> > Add support for -polly-codegen-perf-monitoring. When performance monitoring
> > is enabled, we emit performance monitoring code during code generation that,
> > at program exit, prints statistics about the total number of cycles executed
> > as well as the number of cycles spent in scops. This gives an estimate of
> > how useful polyhedral optimizations might be for a given program.
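A rough source-level picture of what the emitted instrumentation does at run time may help here. The sketch below is hypothetical C++: the names `perfInit`, `perfFinal`, `regionStart`, `regionEnd` and `readCounter` are illustrative and not part of Polly. It also uses `std::chrono::steady_clock` in place of the RDTSCP cycle counter, so it counts nanoseconds rather than cycles.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <cstdlib>

// Stand-in for RDTSCP (assumption): a monotonic tick count in nanoseconds.
static uint64_t readCounter() {
  auto Now = std::chrono::steady_clock::now().time_since_epoch();
  return static_cast<uint64_t>(
      std::chrono::duration_cast<std::chrono::nanoseconds>(Now).count());
}

// Analogues of the globals the pass emits (thread-local in the real IR).
static uint64_t CyclesTotalStart = 0;
static uint64_t CyclesInScops = 0;
static uint64_t CyclesInScopStart = 0;
static bool AlreadyInitialized = false;

// Runs at program exit: print the total and the in-scop counts.
static void perfFinal() {
  uint64_t Total = readCounter() - CyclesTotalStart;
  std::printf("Polly runtime information\n");
  std::printf("-------------------------\n");
  std::printf("Total: %llu\n", static_cast<unsigned long long>(Total));
  std::printf("Scops: %llu\n", static_cast<unsigned long long>(CyclesInScops));
}

// In the generated code this runs as a global constructor before main().
static void perfInit() {
  if (AlreadyInitialized) // Guard against duplicate constructor entries.
    return;
  AlreadyInitialized = true;
  std::atexit(perfFinal);
  CyclesTotalStart = readCounter();
}

// Emitted before a scop: remember when it started.
static void regionStart() { CyclesInScopStart = readCounter(); }

// Emitted after a scop: add the elapsed ticks to the running total.
static void regionEnd() { CyclesInScops += readCounter() - CyclesInScopStart; }
```

In this sketch, `perfInit()` is called once at startup and each scop is bracketed by `regionStart()` and `regionEnd()`; calling `perfInit()` a second time does not register `perfFinal` again.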
> >
> > Example output:
> >
> > Polly runtime information
> > -------------------------
> > Total: 783110081637
> > Scops: 663718949365
> >
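One way to read such numbers uses Amdahl's law, which is not something the patch itself computes: the share of cycles spent in scops bounds the whole-program speedup that optimizing scops alone can give. A minimal sketch with hypothetical helper names:

```cpp
#include <cassert>
#include <cstdint>

// Fraction of the measured cycles that was spent inside scops.
double scopFraction(uint64_t Total, uint64_t Scops) {
  assert(Total > 0 && "no cycles measured");
  return static_cast<double>(Scops) / static_cast<double>(Total);
}

// Amdahl's law: best possible whole-program speedup if the time spent in
// scops dropped to zero while everything else stayed the same.
double maxSpeedup(uint64_t Total, uint64_t Scops) {
  assert(Scops < Total && "bound is infinite if all time is in scops");
  return static_cast<double>(Total) / static_cast<double>(Total - Scops);
}
```

For the example output above, `scopFraction(783110081637, 663718949365)` is about 0.848 and `maxSpeedup` is about 6.56, so roughly 85% of the cycles were spent in scops.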
> > In the future, we might also add functionality to measure how much time is
> > spent in optimized scops and how many cycles are spent in the fallback code.
> >
> > Reviewers: bollu,sebpop
> >
> > Tags: #polly
> >
> > Differential Revision: https://reviews.llvm.org/D31599
> >
> > Added:
> > polly/trunk/include/polly/CodeGen/PerfMonitor.h
> > polly/trunk/lib/CodeGen/PerfMonitor.cpp
> > polly/trunk/test/Isl/CodeGen/perf_monitoring.ll
> > Modified:
> > polly/trunk/lib/CMakeLists.txt
> > polly/trunk/lib/CodeGen/CodeGeneration.cpp
> >
> > Added: polly/trunk/include/polly/CodeGen/PerfMonitor.h
> > URL: http://llvm.org/viewvc/llvm-project/polly/trunk/include/polly/CodeGen/PerfMonitor.h?rev=299359&view=auto
> > ==============================================================================
> > --- polly/trunk/include/polly/CodeGen/PerfMonitor.h (added)
> > +++ polly/trunk/include/polly/CodeGen/PerfMonitor.h Mon Apr 3 09:55:37 2017
> > @@ -0,0 +1,132 @@
> > +//===--- PerfMonitor.h --- Monitor time spent in scops --------------------===//
> > +//
> > +// The LLVM Compiler Infrastructure
> > +//
> > +// This file is distributed under the University of Illinois Open Source
> > +// License. See LICENSE.TXT for details.
> > +//
> > +//===----------------------------------------------------------------------===//
> > +
> > +#ifndef PERF_MONITOR_H
> > +#define PERF_MONITOR_H
> > +
> > +#include "polly/CodeGen/IRBuilder.h"
> > +
> > +namespace llvm {
> > +class Function;
> > +class Module;
> > +class Value;
> > +class Instruction;
> > +} // namespace llvm
> > +
> > +namespace polly {
> > +
> > +class PerfMonitor {
> > +public:
> > + /// Create a new performance monitor.
> > + ///
> > + /// @param M The module for which to generate the performance monitor.
> > + PerfMonitor(llvm::Module *M);
> > +
> > + /// Initialize the performance monitor.
> > + ///
> > + /// Ensure that all global variables, functions, and callbacks needed to
> > + /// manage the performance monitor are initialized and registered.
> > + void initialize();
> > +
> > + /// Mark the beginning of a timing region.
> > + ///
> > + /// @param InsertBefore The instruction before which the timing region starts.
> > + void insertRegionStart(llvm::Instruction *InsertBefore);
> > +
> > + /// Mark the end of a timing region.
> > + ///
> > + /// @param InsertBefore The instruction before which the timing region ends.
> > + void insertRegionEnd(llvm::Instruction *InsertBefore);
> > +
> > +private:
> > + llvm::Module *M;
> > + PollyIRBuilder Builder;
> > +
> > + /// Indicates if performance profiling is supported on this architecture.
> > + bool Supported;
> > +
> > + /// The cycle counter at the beginning of the program execution.
> > + llvm::Value *CyclesTotalStartPtr;
> > +
> > + /// The total number of cycles spent within scops.
> > + llvm::Value *CyclesInScopsPtr;
> > +
> > + /// The value of the cycle counter at the beginning of the last scop.
> > + llvm::Value *CyclesInScopStartPtr;
> > +
> > + /// A memory location which serves as argument of the RDTSCP function.
> > + ///
> > + /// The value written to this location is currently not used.
> > + llvm::Value *RDTSCPWriteLocation;
> > +
> > + /// A global variable that keeps track of whether the performance monitor
> > + /// initialization has already been run.
> > + llvm::Value *AlreadyInitializedPtr;
> > +
> > + llvm::Function *insertInitFunction(llvm::Function *FinalReporting);
> > +
> > + /// Add function @p Fn to the list of global constructors.
> > + ///
> > + /// If no global constructors are available in the current module, insert
> > + /// a new list of global constructors containing @p Fn as the only global
> > + /// constructor. Otherwise, append @p Fn to the list of global constructors.
> > + ///
> > + /// All functions listed as global constructors are executed before the
> > + /// main() function is called.
> > + ///
> > + /// @param Fn Function to add to global constructors
> > + void addToGlobalConstructors(llvm::Function *Fn);
> > +
> > + /// Add global variables to module.
> > + ///
> > + /// Insert a set of global variables that are used to track performance
> > + /// into the module (or obtain references to them if they already exist).
> > + void addGlobalVariables();
> > +
> > + /// Get a reference to the intrinsic "i64 @llvm.x86.rdtscp(i8*)".
> > + ///
> > + /// The rdtscp function returns the current value of the processor's
> > + /// time-stamp counter as well as the current CPU identifier. On modern x86
> > + /// systems, the returned value is independent of the dynamic clock frequency
> > + /// and consistent across multiple cores. It can consequently be used to get
> > + /// accurate and low-overhead timing information. Even though the counter
> > + /// wraps around, it can be used reliably even for measuring longer time
> > + /// intervals, as on a 1 GHz processor the counter only wraps roughly every
> > + /// 584 years.
> > + ///
> > + /// The RDTSCP instruction is "pseudo" serializing:
> > + ///
> > + /// "The RDTSCP instruction waits until all previous instructions have been
> > + /// executed before reading the counter. However, subsequent instructions may
> > + /// begin execution before the read operation is performed."
> > + ///
> > + /// To ensure that no later instructions are scheduled before the RDTSCP
> > + /// instruction, it is often recommended to schedule a cpuid call after the
> > + /// RDTSCP instruction. We do not do this yet, trading some timing precision
> > + /// for reduced overhead.
> > + ///
> > + /// @returns A reference to the declaration of @llvm.x86.rdtscp.
> > + llvm::Function *getRDTSCP();
> > +
> > + /// Get a reference to the "int atexit(void (*function)(void))" function.
> > + ///
> > + /// This function registers function pointers that are executed when the
> > + /// program terminates.
> > + ///
> > + /// @returns A reference to @atexit().
> > + llvm::Function *getAtExit();
> > +
> > + /// Create function "__polly_perf_final".
> > + ///
> > + /// This function finalizes the performance measurements and prints the
> > + /// results to stdout. It is expected to be registered with 'atexit()'.
> > + llvm::Function *insertFinalReporting();
> > +};
> > +} // namespace polly
> > +
> > +#endif
> >
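The wrap-around interval mentioned in the getRDTSCP() comment can be checked with a little arithmetic: a 64-bit counter wraps after 2^64 ticks. This sketch assumes the counter advances at a fixed frequency and uses a 365.25-day year; `yearsUntilWrap` is a hypothetical helper.

```cpp
#include <cassert>

// Years until a 64-bit counter that ticks FrequencyHz times per second wraps.
double yearsUntilWrap(double FrequencyHz) {
  assert(FrequencyHz > 0.0 && "frequency must be positive");
  const double Ticks = 18446744073709551616.0;              // 2^64
  const double SecondsPerYear = 365.25 * 24.0 * 60.0 * 60.0; // Julian year
  return Ticks / FrequencyHz / SecondsPerYear;
}
```

`yearsUntilWrap(1e9)` comes to about 584.5 years; at 3 GHz it is still about 195 years.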
> > Modified: polly/trunk/lib/CMakeLists.txt
> > URL: http://llvm.org/viewvc/llvm-project/polly/trunk/lib/CMakeLists.txt?rev=299359&r1=299358&r2=299359&view=diff
> > ==============================================================================
> > --- polly/trunk/lib/CMakeLists.txt (original)
> > +++ polly/trunk/lib/CMakeLists.txt Mon Apr 3 09:55:37 2017
> > @@ -43,6 +43,7 @@ add_polly_library(Polly
> > CodeGen/Utils.cpp
> > CodeGen/RuntimeDebugBuilder.cpp
> > CodeGen/CodegenCleanup.cpp
> > + CodeGen/PerfMonitor.cpp
> > ${GPGPU_CODEGEN_FILES}
> > Exchange/JSONExporter.cpp
> > Support/GICHelper.cpp
> >
> > Modified: polly/trunk/lib/CodeGen/CodeGeneration.cpp
> > URL: http://llvm.org/viewvc/llvm-project/polly/trunk/lib/CodeGen/CodeGeneration.cpp?rev=299359&r1=299358&r2=299359&view=diff
> > ==============================================================================
> > --- polly/trunk/lib/CodeGen/CodeGeneration.cpp (original)
> > +++ polly/trunk/lib/CodeGen/CodeGeneration.cpp Mon Apr 3 09:55:37 2017
> > @@ -21,6 +21,7 @@
> >
> > #include "polly/CodeGen/IslAst.h"
> > #include "polly/CodeGen/IslNodeBuilder.h"
> > +#include "polly/CodeGen/PerfMonitor.h"
> > #include "polly/CodeGen/Utils.h"
> > #include "polly/DependenceInfo.h"
> > #include "polly/LinkAllPasses.h"
> > @@ -45,6 +46,11 @@ static cl::opt<bool> Verify("polly-codeg
> > cl::Hidden, cl::init(true), cl::ZeroOrMore,
> > cl::cat(PollyCategory));
> >
> > +static cl::opt<bool>
> > + PerfMonitoring("polly-codegen-perf-monitoring",
> > + cl::desc("Add run-time performance monitoring"), cl::Hidden,
> > + cl::init(false), cl::ZeroOrMore, cl::cat(PollyCategory));
> > +
> > namespace {
> > class CodeGeneration : public ScopPass {
> > public:
> > @@ -145,6 +151,18 @@ public:
> > IslNodeBuilder NodeBuilder(Builder, Annotator, this, *DL, *LI, *SE, *DT, S,
> > StartBlock);
> >
> > + if (PerfMonitoring) {
> > + PerfMonitor P(EnteringBB->getParent()->getParent());
> > + P.initialize();
> > + P.insertRegionStart(SplitBlock->getTerminator());
> > +
> > + BasicBlock *MergeBlock = SplitBlock->getTerminator()
> > + ->getSuccessor(0)
> > + ->getUniqueSuccessor()
> > + ->getUniqueSuccessor();
> > + P.insertRegionEnd(MergeBlock->getTerminator());
> > + }
> > +
> > // First generate code for the hoisted invariant loads and transitively the
> > // parameters they reference. Afterwards, for the remaining parameters that
> > // might reference the hoisted loads. Finally, build the runtime check
> >
> > Added: polly/trunk/lib/CodeGen/PerfMonitor.cpp
> > URL: http://llvm.org/viewvc/llvm-project/polly/trunk/lib/CodeGen/PerfMonitor.cpp?rev=299359&view=auto
> > ==============================================================================
> > --- polly/trunk/lib/CodeGen/PerfMonitor.cpp (added)
> > +++ polly/trunk/lib/CodeGen/PerfMonitor.cpp Mon Apr 3 09:55:37 2017
> > @@ -0,0 +1,235 @@
> > +//===------ PerfMonitor.cpp - Generate a run-time performance monitor. -======//
> > +//
> > +// The LLVM Compiler Infrastructure
> > +//
> > +// This file is distributed under the University of Illinois Open Source
> > +// License. See LICENSE.TXT for details.
> > +//
> > +//===----------------------------------------------------------------------===//
> > +//
> > +//===----------------------------------------------------------------------===//
> > +
> > +#include "polly/CodeGen/PerfMonitor.h"
> > +#include "polly/CodeGen/RuntimeDebugBuilder.h"
> > +#include "llvm/ADT/Triple.h"
> > +
> > +using namespace llvm;
> > +using namespace polly;
> > +
> > +Function *PerfMonitor::getAtExit() {
> > + const char *Name = "atexit";
> > + Function *F = M->getFunction(Name);
> > +
> > + if (!F) {
> > + GlobalValue::LinkageTypes Linkage = Function::ExternalLinkage;
> > + FunctionType *Ty = FunctionType::get(Builder.getInt32Ty(),
> > + {Builder.getInt8PtrTy()}, false);
> > + F = Function::Create(Ty, Linkage, Name, M);
> > + }
> > +
> > + return F;
> > +}
> > +
> > +void PerfMonitor::addToGlobalConstructors(Function *Fn) {
> > + const char *Name = "llvm.global_ctors";
> > + GlobalVariable *GV = M->getGlobalVariable(Name);
> > + std::vector<Constant *> V;
> > +
> > + if (GV) {
> > + Constant *Array = GV->getInitializer();
> > + for (Value *X : Array->operand_values())
> > + V.push_back(cast<Constant>(X));
> > + GV->eraseFromParent();
> > + }
> > +
> > + StructType *ST = StructType::get(Builder.getInt32Ty(), Fn->getType(),
> > + Builder.getInt8PtrTy(), nullptr);
> > +
> > + V.push_back(ConstantStruct::get(
> > + ST, Builder.getInt32(10), Fn,
> > + ConstantPointerNull::get(Builder.getInt8PtrTy()), nullptr));
> > + ArrayType *Ty = ArrayType::get(ST, V.size());
> > +
> > + GV = new GlobalVariable(*M, Ty, true, GlobalValue::AppendingLinkage,
> > + ConstantArray::get(Ty, V), Name, nullptr,
> > + GlobalVariable::NotThreadLocal);
> > +}
> > +
> > +Function *PerfMonitor::getRDTSCP() {
> > + const char *Name = "llvm.x86.rdtscp";
> > + Function *F = M->getFunction(Name);
> >
> For intrinsics, we had better use "auto *F = Intrinsic::getDeclaration(M,
> Intrinsic::x86_rdtscp);"
>
>
> > +
> > + if (!F) {
> > + GlobalValue::LinkageTypes Linkage = Function::ExternalLinkage;
> > + FunctionType *Ty = FunctionType::get(Builder.getInt64Ty(),
> > + {Builder.getInt8PtrTy()}, false);
> > + F = Function::Create(Ty, Linkage, Name, M);
> > + }
> > +
> > + return F;
> > +}
> > +
> > +PerfMonitor::PerfMonitor(Module *M) : M(M), Builder(M->getContext()) {
> > + if (Triple(M->getTargetTriple()).getArch() == llvm::Triple::x86_64)
> > + Supported = true;
> > + else
> > + Supported = false;
> > +}
> > +
> > +void PerfMonitor::addGlobalVariables() {
> > + auto TryRegisterGlobal = [=](const char *Name, Constant *InitialValue,
> > + Value **Location) {
> > + *Location = M->getGlobalVariable(Name);
> > +
> > + if (!*Location)
> > + *Location = new GlobalVariable(
> > + *M, InitialValue->getType(), true, GlobalValue::WeakAnyLinkage,
> > + InitialValue, Name, nullptr, GlobalVariable::InitialExecTLSModel);
> > + };
> > +
> > + TryRegisterGlobal("__polly_perf_cycles_total_start", Builder.getInt64(0),
> > + &CyclesTotalStartPtr);
> > +
> > + TryRegisterGlobal("__polly_perf_initialized", Builder.getInt1(0),
> > + &AlreadyInitializedPtr);
> > +
> > + TryRegisterGlobal("__polly_perf_cycles_in_scops", Builder.getInt64(0),
> > + &CyclesInScopsPtr);
> > +
> > + TryRegisterGlobal("__polly_perf_cycles_in_scop_start", Builder.getInt64(0),
> > + &CyclesInScopStartPtr);
> > +
> > + TryRegisterGlobal("__polly_perf_write_loation", Builder.getInt32(0),
> > + &RDTSCPWriteLocation);
> > +}
> > +
> > +static const char *InitFunctionName = "__polly_perf_init";
> > +static const char *FinalReportingFunctionName = "__polly_perf_final";
> > +
> > +Function *PerfMonitor::insertFinalReporting() {
> > + // Create new function.
> > + GlobalValue::LinkageTypes Linkage = Function::WeakODRLinkage;
> > + FunctionType *Ty = FunctionType::get(Builder.getVoidTy(), {}, false);
> > + Function *ExitFn =
> > + Function::Create(Ty, Linkage, FinalReportingFunctionName, M);
> > + BasicBlock *Start = BasicBlock::Create(M->getContext(), "start", ExitFn);
> > + Builder.SetInsertPoint(Start);
> > +
> > + if (!Supported) {
> > + RuntimeDebugBuilder::createCPUPrinter(
> > + Builder, "Polly runtime information generation not supported\n");
> > + Builder.CreateRetVoid();
> > + return ExitFn;
> > + }
> > +
> > + // Measure current cycles and compute final timings.
> > + Function *RDTSCPFn = getRDTSCP();
> > + Value *CurrentCycles = Builder.CreateCall(
> > + RDTSCPFn,
> > + Builder.CreatePointerCast(RDTSCPWriteLocation, Builder.getInt8PtrTy()));
> > + Value *CyclesStart = Builder.CreateLoad(CyclesTotalStartPtr, true);
> > + Value *CyclesTotal = Builder.CreateSub(CurrentCycles, CyclesStart);
> > + Value *CyclesInScops = Builder.CreateLoad(CyclesInScopsPtr, true);
> > +
> > + // Print the runtime information.
> > + RuntimeDebugBuilder::createCPUPrinter(Builder, "Polly runtime information\n");
> > + RuntimeDebugBuilder::createCPUPrinter(Builder, "-------------------------\n");
> > + RuntimeDebugBuilder::createCPUPrinter(Builder, "Total: ", CyclesTotal, "\n");
> > + RuntimeDebugBuilder::createCPUPrinter(Builder, "Scops: ", CyclesInScops,
> > + "\n");
> > +
> > + // Finalize function.
> > + Builder.CreateRetVoid();
> > + return ExitFn;
> > +}
> > +
> > +void PerfMonitor::initialize() {
> > + addGlobalVariables();
> > +
> > + Function *F = M->getFunction(InitFunctionName);
> > + if (F)
> > + return;
> > +
> > + // initialize
> > + Function *FinalReporting = insertFinalReporting();
> > + Function *InitFn = insertInitFunction(FinalReporting);
> > + addToGlobalConstructors(InitFn);
> > +}
> > +
> > +Function *PerfMonitor::insertInitFunction(Function *FinalReporting) {
> > + // Insert function definition and BBs.
> > + GlobalValue::LinkageTypes Linkage = Function::WeakODRLinkage;
> > + FunctionType *Ty = FunctionType::get(Builder.getVoidTy(), {}, false);
> > + Function *InitFn = Function::Create(Ty, Linkage, InitFunctionName, M);
> > + BasicBlock *Start = BasicBlock::Create(M->getContext(), "start", InitFn);
> > + BasicBlock *EarlyReturn =
> > + BasicBlock::Create(M->getContext(), "earlyreturn", InitFn);
> > + BasicBlock *InitBB = BasicBlock::Create(M->getContext(), "initbb", InitFn);
> > +
> > + Builder.SetInsertPoint(Start);
> > +
> > + // Check if this function was already run. If yes, return.
> > + //
> > + // In case profiling has been enabled in multiple translation units, the
> > + // initializer function will be added to the global constructors list of
> > + // each translation unit. When merging translation units, the global
> > + // constructor lists are just appended, such that the initializer will appear
> > + // multiple times. To avoid initializations being run multiple times (and
> > + // especially to avoid registering the final reporting function with atexit()
> > + // more than once), we bail out if the initializer is run more than once.
> > + Value *HasRunBefore = Builder.CreateLoad(AlreadyInitializedPtr);
> > + Builder.CreateCondBr(HasRunBefore, EarlyReturn, InitBB);
> > + Builder.SetInsertPoint(EarlyReturn);
> > + Builder.CreateRetVoid();
> > +
> > + // Keep track that this function has been run once.
> > + Builder.SetInsertPoint(InitBB);
> > + Value *True = Builder.getInt1(true);
> > + Builder.CreateStore(True, AlreadyInitializedPtr);
> > +
> > + // Register the final reporting function with atexit().
> > + Value *FinalReportingPtr =
> > + Builder.CreatePointerCast(FinalReporting, Builder.getInt8PtrTy());
> > + Function *AtExitFn = getAtExit();
> > + Builder.CreateCall(AtExitFn, {FinalReportingPtr});
> > +
> > + if (Supported) {
> > + // Read the current cycle counter and store the result for later.
> > + Function *RDTSCPFn = getRDTSCP();
> > + Value *CurrentCycles = Builder.CreateCall(
> > + RDTSCPFn,
> > + Builder.CreatePointerCast(RDTSCPWriteLocation, Builder.getInt8PtrTy()));
> > + Builder.CreateStore(CurrentCycles, CyclesTotalStartPtr, true);
> > + }
> > + Builder.CreateRetVoid();
> > +
> > + return InitFn;
> > +}
> > +
> > +void PerfMonitor::insertRegionStart(Instruction *InsertBefore) {
> > + if (!Supported)
> > + return;
> > +
> > + Builder.SetInsertPoint(InsertBefore);
> > + Function *RDTSCPFn = getRDTSCP();
> > + Value *CurrentCycles = Builder.CreateCall(
> > + RDTSCPFn,
> > + Builder.CreatePointerCast(RDTSCPWriteLocation, Builder.getInt8PtrTy()));
> > + Builder.CreateStore(CurrentCycles, CyclesInScopStartPtr, true);
> > +}
> > +
> > +void PerfMonitor::insertRegionEnd(Instruction *InsertBefore) {
> > + if (!Supported)
> > + return;
> > +
> > + Builder.SetInsertPoint(InsertBefore);
> > + Function *RDTSCPFn = getRDTSCP();
> > + LoadInst *CyclesStart = Builder.CreateLoad(CyclesInScopStartPtr, true);
> > + Value *CurrentCycles = Builder.CreateCall(
> > + RDTSCPFn,
> > + Builder.CreatePointerCast(RDTSCPWriteLocation, Builder.getInt8PtrTy()));
> > + Value *CyclesInScop = Builder.CreateSub(CurrentCycles, CyclesStart);
> > + Value *CyclesInScops = Builder.CreateLoad(CyclesInScopsPtr, true);
> > + CyclesInScops = Builder.CreateAdd(CyclesInScops, CyclesInScop);
> > + Builder.CreateStore(CyclesInScops, CyclesInScopsPtr, true);
> > +}
> >
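insertRegionEnd() emits a plain `sub i64` followed by an `add i64`. Because LLVM integer arithmetic without `nuw`/`nsw` wraps modulo 2^64, the elapsed count stays correct even if the counter wraps once between region start and end. The same reasoning in C++, using unsigned integers and a hypothetical `accumulateScopCycles` helper:

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the IR emitted by insertRegionEnd(): Elapsed = End - Start, then add
// it to the running total. Unsigned arithmetic is modulo 2^64, so a single
// wrap of the counter between Start and End does not corrupt the result.
uint64_t accumulateScopCycles(uint64_t CyclesInScops, uint64_t Start,
                              uint64_t End) {
  return CyclesInScops + (End - Start);
}
```

For example, with `Start = UINT64_MAX - 9` and `End = 5`, the elapsed count is 15 rather than a huge bogus value.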
> > Added: polly/trunk/test/Isl/CodeGen/perf_monitoring.ll
> > URL: http://llvm.org/viewvc/llvm-project/polly/trunk/test/Isl/CodeGen/perf_monitoring.ll?rev=299359&view=auto
> > ==============================================================================
> > --- polly/trunk/test/Isl/CodeGen/perf_monitoring.ll (added)
> > +++ polly/trunk/test/Isl/CodeGen/perf_monitoring.ll Mon Apr 3 09:55:37 2017
> > @@ -0,0 +1,87 @@
> > +; RUN: opt %loadPolly -polly-codegen -polly-codegen-perf-monitoring \
> > +; RUN: -S < %s | FileCheck %s
> > +
> > +; void f(long A[], long N) {
> > +; long i;
> > +; if (true)
> > +; for (i = 0; i < N; ++i)
> > +; A[i] = i;
> > +; }
> > +
> > +target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128"
> > +target triple = "x86_64-unknown-linux-gnu"
> > +
> > +define void @f(i64* %A, i64 %N) nounwind {
> > +entry:
> > + fence seq_cst
> > + br label %next
> > +
> > +next:
> > + br i1 true, label %for.i, label %return
> > +
> > +for.i:
> > + %indvar = phi i64 [ 0, %next], [ %indvar.next, %for.i ]
> > + %scevgep = getelementptr i64, i64* %A, i64 %indvar
> > + store i64 %indvar, i64* %scevgep
> > + %indvar.next = add nsw i64 %indvar, 1
> > + %exitcond = icmp eq i64 %indvar.next, %N
> > + br i1 %exitcond, label %return, label %for.i
> > +
> > +return:
> > + fence seq_cst
> > + ret void
> > +}
> > +
> > +; CHECK: @__polly_perf_cycles_total_start = weak thread_local(initialexec) constant i64 0
> > +; CHECK-NEXT: @__polly_perf_initialized = weak thread_local(initialexec) constant i1 false
> > +; CHECK-NEXT: @__polly_perf_cycles_in_scops = weak thread_local(initialexec) constant i64 0
> > +; CHECK-NEXT: @__polly_perf_cycles_in_scop_start = weak thread_local(initialexec) constant i64 0
> > +; CHECK-NEXT: @__polly_perf_write_loation = weak thread_local(initialexec) constant i32 0
> > +
> > +; CHECK: polly.split_new_and_old: ; preds = %entry
> > +; CHECK-NEXT: %0 = call i64 @llvm.x86.rdtscp(i8* bitcast (i32* @__polly_perf_write_loation to i8*))
> > +; CHECK-NEXT: store volatile i64 %0, i64* @__polly_perf_cycles_in_scop_start
> > +
> > +; CHECK: polly.merge_new_and_old: ; preds = %polly.exiting, %return.region_exiting
> > +; CHECK-NEXT: %5 = load volatile i64, i64* @__polly_perf_cycles_in_scop_start
> > +; CHECK-NEXT: %6 = call i64 @llvm.x86.rdtscp(i8* bitcast (i32* @__polly_perf_write_loation to i8*))
> > +; CHECK-NEXT: %7 = sub i64 %6, %5
> > +; CHECK-NEXT: %8 = load volatile i64, i64* @__polly_perf_cycles_in_scops
> > +; CHECK-NEXT: %9 = add i64 %8, %7
> > +; CHECK-NEXT: store volatile i64 %9, i64* @__polly_perf_cycles_in_scops
> > +; CHECK-NEXT: br label %return
> > +
> > +
> > +; CHECK: define weak_odr void @__polly_perf_final() {
> > +; CHECK-NEXT: start:
> > +; CHECK-NEXT: %0 = call i64 @llvm.x86.rdtscp(i8* bitcast (i32* @__polly_perf_write_loation to i8*))
> > +; CHECK-NEXT: %1 = load volatile i64, i64* @__polly_perf_cycles_total_start
> > +; CHECK-NEXT: %2 = sub i64 %0, %1
> > +; CHECK-NEXT: %3 = load volatile i64, i64* @__polly_perf_cycles_in_scops
> > +; CHECK-NEXT: %4 = call i32 (...) @printf(i8* getelementptr inbounds ([3 x i8], [3 x i8]* @1, i32 0, i32 0), i8 addrspace(4)* getelementptr inbounds ([27 x i8], [27 x i8] addrspace(4)* @0, i32 0, i32 0))
> > +; CHECK-NEXT: %5 = call i32 @fflush(i8* null)
> > +; CHECK-NEXT: %6 = call i32 (...) @printf(i8* getelementptr inbounds ([3 x i8], [3 x i8]* @3, i32 0, i32 0), i8 addrspace(4)* getelementptr inbounds ([27 x i8], [27 x i8] addrspace(4)* @2, i32 0, i32 0))
> > +; CHECK-NEXT: %7 = call i32 @fflush(i8* null)
> > +; CHECK-NEXT: %8 = call i32 (...) @printf(i8* getelementptr inbounds ([8 x i8], [8 x i8]* @6, i32 0, i32 0), i8 addrspace(4)* getelementptr inbounds ([8 x i8], [8 x i8] addrspace(4)* @4, i32 0, i32 0), i64 %2, i8 addrspace(4)* getelementptr inbounds ([2 x i8], [2 x i8] addrspace(4)* @5, i32 0, i32 0))
> > +; CHECK-NEXT: %9 = call i32 @fflush(i8* null)
> > +; CHECK-NEXT: %10 = call i32 (...) @printf(i8* getelementptr inbounds ([8 x i8], [8 x i8]* @9, i32 0, i32 0), i8 addrspace(4)* getelementptr inbounds ([8 x i8], [8 x i8] addrspace(4)* @7, i32 0, i32 0), i64 %3, i8 addrspace(4)* getelementptr inbounds ([2 x i8], [2 x i8] addrspace(4)* @8, i32 0, i32 0))
> > +; CHECK-NEXT: %11 = call i32 @fflush(i8* null)
> > +; CHECK-NEXT: ret void
> > +; CHECK-NEXT: }
> > +
> > +
> > +; CHECK: define weak_odr void @__polly_perf_init() {
> > +; CHECK-NEXT: start:
> > +; CHECK-NEXT: %0 = load i1, i1* @__polly_perf_initialized
> > +; CHECK-NEXT: br i1 %0, label %earlyreturn, label %initbb
> > +
> > +; CHECK: earlyreturn: ; preds = %start
> > +; CHECK-NEXT: ret void
> >
> > +; CHECK: initbb: ; preds = %start
> > +; CHECK-NEXT: store i1 true, i1* @__polly_perf_initialized
> > +; CHECK-NEXT: %1 = call i32 @atexit(i8* bitcast (void ()* @__polly_perf_final to i8*))
> > +; CHECK-NEXT: %2 = call i64 @llvm.x86.rdtscp(i8* bitcast (i32* @__polly_perf_write_loation to i8*))
> > +; CHECK-NEXT: store volatile i64 %2, i64* @__polly_perf_cycles_total_start
> > +; CHECK-NEXT: ret void
> > +; CHECK-NEXT: }
> >
> >
> > _______________________________________________
> > llvm-commits mailing list
> > llvm-commits at lists.llvm.org
> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
> >