[PATCH] Speed up creation of live ranges for physical registers by using a segment set

Vaidas Gasiunas vaidas.gasiunas at sap.com
Tue Nov 4 06:49:25 PST 2014


Hi Eric and Quentin,

Thanks for the comments! First, here is some information on performance.

To measure compilation time I used an example generated by the script from http://llvm.org/bugs/show_bug.cgi?id=18580 with the parameter 10000. This generates a C program with 10000 string objects and 10000 small if statements using these variables. The C program is compiled to IR by clang, and I then use this IR to measure the compilation time of llc. I run llc with -O1 to skip the optimization passes.

So with this example, the original llc runs for about 19s and the llc with the patch runs for about 10s.

Here is the result of profiling the llc without the patch (exported as CSV from vtune):
----------------------------------------------------------------------
"Function Stack","CPU Time: Total by Utilization","CPU Time: Self by Utilization","Overhead and Spin Time: Total","Overhead and Spin Time: Self","Module","Function (Full)","Source File","Start Address"
"      llvm::FPPassManager::runOnFunction","18.55","0","0","0","llc-no","llvm::FPPassManager::runOnFunction(llvm::Function&)","","0x1366d50"
"       llvm::LiveIntervals::runOnMachineFunction","9.07199","0","0","0","llc-no","llvm::LiveIntervals::runOnMachineFunction(llvm::MachineFunction&)","","0xe8cb00"
"        llvm::LiveIntervals::computeLiveInRegUnits","8.95999","0","0","0","llc-no","llvm::LiveIntervals::computeLiveInRegUnits(void)","","0xe8a1a0"
"         llvm::LiveIntervals::computeRegUnitRange","8.95999","0","0","0","llc-no","llvm::LiveIntervals::computeRegUnitRange(llvm::LiveRange&, unsigned int)","","0xe89fd0"
"          llvm::LiveRangeCalc::extendToUses","4.77199","0.0180017","0","0","llc-no","llvm::LiveRangeCalc::extendToUses(llvm::LiveRange&, unsigned int)","","0xe94640"
"           llvm::LiveRangeCalc::extend","4.75399","0.00799807","0","0","llc-no","llvm::LiveRangeCalc::extend(llvm::LiveRange&, llvm::SlotIndex, unsigned int)","","0xe94420"
"            llvm::LiveRange::extendInBlock","4.74599","0.0199952","0","0","llc-no","llvm::LiveRange::extendInBlock(llvm::SlotIndex, llvm::SlotIndex)","","0xe85f60"
"             llvm::LiveRange::extendSegmentEndTo","4.726","4.726","0","0","llc-no","llvm::LiveRange::extendSegmentEndTo(llvm::LiveRange::Segment*, llvm::SlotIndex)","","0xe84a30"
"          llvm::LiveRangeCalc::createDeadDefs","4.18799","0.0279962","0","0","llc-no","llvm::LiveRangeCalc::createDeadDefs(llvm::LiveRange&, unsigned int)","","0xe91650"
"           llvm::LiveRange::createDeadDef","4.16","4.128","0","0","llc-no","llvm::LiveRange::createDeadDef(llvm::SlotIndex, llvm::BumpPtrAllocatorImpl<llvm::MallocAllocator, (unsigned long)4096, (unsigned long)4096>&)","","0xe880a0"
"            llvm::LiveRange::find","0.0319956","0.0319956","0","0","llc-no","llvm::LiveRange::find(llvm::SlotIndex)","","0xe84530"
"        llvm::LiveIntervals::computeVirtRegs","0.0920017","0","0","0","llc-no","llvm::LiveIntervals::computeVirtRegs(void)","","0xe8c9e0"
"         llvm::LiveIntervals::computeVirtRegInterval","0.0840081","0","0","0","llc-no","llvm::LiveIntervals::computeVirtRegInterval(llvm::LiveInterval&)","","0xe8a6a0"
"          llvm::LiveRangeCalc::reset","0.0440009","0.0440009","0","0","llc-no","llvm::LiveRangeCalc::reset(llvm::MachineFunction const*, llvm::SlotIndexes*, llvm::MachineDominatorTree*, llvm::BumpPtrAllocatorImpl<llvm::MallocAllocator, (unsigned long)4096, (unsigned long)4096>*)","","0xe94ba0"
"          llvm::LiveRangeCalc::extendToUses","0.0320052","0.0320052","0","0","llc-no","llvm::LiveRangeCalc::extendToUses(llvm::LiveRange&, unsigned int)","","0xe94640"
"          llvm::LiveRangeCalc::createDeadDefs","0.00800203","0","0","0","llc-no","llvm::LiveRangeCalc::createDeadDefs(llvm::LiveRange&, unsigned int)","","0xe91650"
"         llvm::LiveIntervals::createInterval","0.00799364","0.00799364","0","0","llc-no","llvm::LiveIntervals::createInterval(unsigned int)","","0xe89f40"
"        llvm::LiveIntervals::computeRegMasks","0.02","0.02","0","0","llc-no","llvm::LiveIntervals::computeRegMasks(void)","","0xe8b720"
"       llvm::SelectionDAGISel::runOnMachineFunction","4.08","0.0160085","0","0","llc-no","llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&)","","0xd60d50"
"       (anonymous namespace)::MachineBlockPlacement::runOnMachineFunction","1.77799","0","0","0","llc-no","(anonymous namespace)::MachineBlockPlacement::runOnMachineFunction(llvm::MachineFunction&)","","0xebb050"
"       llvm::LiveVariables::runOnMachineFunction","0.370002","0","0","0","llc-no","llvm::LiveVariables::runOnMachineFunction(llvm::MachineFunction&)","","0xea1610"
----------------------------------------------------------------------

Here is the profiler result of the llc with the patch:
----------------------------------------------------------------------
"Function Stack","CPU Time: Total by Utilization","CPU Time: Self by Utilization","Overhead and Spin Time: Total","Overhead and Spin Time: Self","Module","Function (Full)","Source File","Start Address"
"      llvm::FPPassManager::runOnFunction","9.72","0","0","0","llc-pr","llvm::FPPassManager::runOnFunction(llvm::Function&)","","0x1367190"
"       llvm::SelectionDAGISel::runOnMachineFunction","4.08","0.0320187","0","0","llc-pr","llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&)","","0xd60e80"
"       (anonymous namespace)::MachineBlockPlacement::runOnMachineFunction","1.57","0","0","0","llc-pr","(anonymous namespace)::MachineBlockPlacement::runOnMachineFunction(llvm::MachineFunction&)","","0xebb1e0"
"       llvm::LiveVariables::runOnMachineFunction","0.378","0.0360253","0","0","llc-pr","llvm::LiveVariables::runOnMachineFunction(llvm::MachineFunction&)","","0xea17a0"
"       (anonymous namespace)::VerifierLegacyPass::runOnFunction","0.29999","0.0160001","0","0","llc-pr","(anonymous namespace)::VerifierLegacyPass::runOnFunction(llvm::Function&)","","0x1392df0"
"       llvm::LiveIntervals::runOnMachineFunction","0.267978","0","0","0","llc-pr","llvm::LiveIntervals::runOnMachineFunction(llvm::MachineFunction&)","","0xe8cc70"
"        llvm::LiveIntervals::computeLiveInRegUnits","0.159983","0","0","0","llc-pr","llvm::LiveIntervals::computeLiveInRegUnits(void)","","0xe8a2e0"
"        llvm::LiveIntervals::computeVirtRegs","0.0879867","0.0120113","0","0","llc-pr","llvm::LiveIntervals::computeVirtRegs(void)","","0xe8cb50"
"        llvm::LiveIntervals::computeRegMasks","0.0200084","0.0200084","0","0","llc-pr","llvm::LiveIntervals::computeRegMasks(void)","","0xe8b890"
"       (anonymous namespace)::MachineScheduler::runOnMachineFunction","0.25997","0","0","0","llc-pr","(anonymous namespace)::MachineScheduler::runOnMachineFunction(llvm::MachineFunction&)","","0xf074f0"
"       llvm::MachineDominatorTree::runOnMachineFunction","0.255892","0","0","0","llc-pr","llvm::MachineDominatorTree::runOnMachineFunction(llvm::MachineFunction&)","","0xed3ca0"
"       llvm::X86AsmPrinter::runOnMachineFunction","0.239961","0","0","0","llc-pr","llvm::X86AsmPrinter::runOnMachineFunction(llvm::MachineFunction&)","","0xb43390"
"       (anonymous namespace)::MachineCSE::PerformCSE","0.219981","0.0440057","0","0","llc-pr","(anonymous namespace)::MachineCSE::PerformCSE(llvm::DomTreeNodeBase<llvm::MachineBasicBlock>*)","","0xec0620"
"       (anonymous namespace)::RAGreedy::runOnMachineFunction","0.181984","0","0","0","llc-pr","(anonymous namespace)::RAGreedy::runOnMachineFunction(llvm::MachineFunction&)","","0xf4c440"
"       llvm::SlotIndexes::runOnMachineFunction","0.144008","0.144008","0","0","llc-pr","llvm::SlotIndexes::runOnMachineFunction(llvm::MachineFunction&)","","0xf7f3f0"
"       (anonymous namespace)::RegisterCoalescer::runOnMachineFunction","0.130013","0.0520054","0","0","llc-pr","(anonymous namespace)::RegisterCoalescer::runOnMachineFunction(llvm::MachineFunction&)","","0xf60390"
"       (anonymous namespace)::StackColoring::runOnMachineFunction","0.128002","0.00800791","0","0","llc-pr","(anonymous namespace)::StackColoring::runOnMachineFunction(llvm::MachineFunction&)","","0xf914b0"
----------------------------------------------------------------------

As you can see, the optimization reduces the time of the LiveIntervals pass from about 9s to 0.27s.

If I also activate the segment set in computeVirtRegInterval, the time of the LiveIntervals pass goes up slightly, to 0.32s. This difference is of course insignificant compared to the total time, but since this optimization is optional anyway, I don't see a reason why we should activate it in computeVirtRegInterval.

Here is the profiler result for llc with the patch plus the segment set activated in computeVirtRegInterval:
----------------------------------------------------------------------
"Function Stack","CPU Time: Total by Utilization","CPU Time: Self by Utilization","Overhead and Spin Time: Total","Overhead and Spin Time: Self","Module","Function (Full)","Source File","Start Address"
"       llvm::LiveIntervals::runOnMachineFunction","0.319991","0","0","0","llc-vr","llvm::LiveIntervals::runOnMachineFunction(llvm::MachineFunction&)","","0xe8cc80"
"        llvm::LiveIntervals::computeLiveInRegUnits","0.167986","0","0","0","llc-vr","llvm::LiveIntervals::computeLiveInRegUnits(void)","","0xe8a2e0"
"         llvm::LiveIntervals::computeRegUnitRange","0.147984","0","0","0","llc-vr","llvm::LiveIntervals::computeRegUnitRange(llvm::LiveRange&, unsigned int)","","0xe8a110"
"         llvm::LiveRange::flushSegmentSet","0.0120047","0.0120047","0","0","llc-vr","llvm::LiveRange::flushSegmentSet(void)","","0xe86810"
"         llvm::LiveRange::createDeadDef","0.00799748","0","0","0","llc-vr","llvm::LiveRange::createDeadDef(llvm::SlotIndex, llvm::BumpPtrAllocatorImpl<llvm::MallocAllocator, (unsigned long)4096, (unsigned long)4096>&)","","0xe881e0"
"        llvm::LiveIntervals::computeVirtRegs","0.131992","0.0119954","0","0","llc-vr","llvm::LiveIntervals::computeVirtRegs(void)","","0xe8cb60"
"         llvm::LiveIntervals::computeVirtRegInterval","0.0920111","0","0","0","llc-vr","llvm::LiveIntervals::computeVirtRegInterval(llvm::LiveInterval&)","","0xe8a810"
"          llvm::LiveRangeCalc::createDeadDefs","0.0560275","0.0560275","0","0","llc-vr","llvm::LiveRangeCalc::createDeadDefs(llvm::LiveRange&, unsigned int)","","0xe917f0"
"          llvm::LiveRangeCalc::extendToUses","0.0199875","0.00799175","0","0","llc-vr","llvm::LiveRangeCalc::extendToUses(llvm::LiveRange&, unsigned int)","","0xe947e0"
"          llvm::LiveRangeCalc::reset","0.015996","0.015996","0","0","llc-vr","llvm::LiveRangeCalc::reset(llvm::MachineFunction const*, llvm::SlotIndexes*, llvm::MachineDominatorTree*, llvm::BumpPtrAllocatorImpl<llvm::MallocAllocator, (unsigned long)4096, (unsigned long)4096>*)","","0xe94d40"
"         llvm::LiveRange::flushSegmentSet","0.0199879","0.0199879","0","0","llc-vr","llvm::LiveRange::flushSegmentSet(void)","","0xe86810"
"         llvm::LiveIntervals::createInterval","0.00799796","0.00799796","0","0","llc-vr","llvm::LiveIntervals::createInterval(unsigned int)","","0xe8a080"
"        llvm::LiveIntervals::computeRegMasks","0.0200124","0.0200124","0","0","llc-vr","llvm::LiveIntervals::computeRegMasks(void)","","0xe8b8a0"
----------------------------------------------------------------------

My explanation of the performance differences is that the optimization is beneficial for very large live ranges with thousands of segments, because inserting new segments into the middle of the array is then very expensive. For small live ranges, using an array is slightly more efficient than using a set. However, in these cases the LiveIntervals pass does not take much time anyway.
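
To illustrate the cost difference, here is a minimal standalone sketch. It is not the code from the patch: Segment, insertSorted, buildWithVector and buildWithSet are hypothetical names made up for this example, and the segments are plain unsigned pairs rather than SlotIndex ranges. It counts how many elements a sorted array has to shift when segments arrive out of order, and shows the alternative of collecting them in a set first and copying them into the array once at the end (similar in spirit to the flushSegmentSet step visible in the profile above).

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical stand-in for a live range segment: the interval [start, end).
struct Segment {
  unsigned start;
  unsigned end;
  bool operator<(const Segment &other) const { return start < other.start; }
};

// Insert one segment into a vector kept sorted by start.
// Returns how many existing elements had to be shifted to make room.
std::size_t insertSorted(std::vector<Segment> &segments, Segment s) {
  auto pos = std::lower_bound(segments.begin(), segments.end(), s);
  std::size_t shifted = static_cast<std::size_t>(segments.end() - pos);
  segments.insert(pos, s);
  return shifted;
}

// Array-only variant: segments arrive in descending order, so every
// insertion lands at the front and shifts everything already stored.
// Returns the total number of shifted elements, n * (n - 1) / 2.
std::size_t buildWithVector(unsigned n, std::vector<Segment> &out) {
  std::size_t totalShifted = 0;
  for (unsigned i = n; i-- > 0;)
    totalShifted += insertSorted(out, Segment{2 * i, 2 * i + 1});
  return totalShifted;
}

// Set variant: each insertion is logarithmic, and the sorted result is
// copied into the array in a single linear pass at the end.
void buildWithSet(unsigned n, std::vector<Segment> &out) {
  std::set<Segment> segmentSet;
  for (unsigned i = n; i-- > 0;)
    segmentSet.insert(Segment{2 * i, 2 * i + 1});
  out.assign(segmentSet.begin(), segmentSet.end());
}
```

Calling buildWithVector(1000, v) shifts 499500 elements in total, while buildWithSet(1000, v) produces the same sorted array without any shifting. For a handful of segments the set's per-node allocations cost more than the few shifts they avoid, which matches the small slowdown measured for computeVirtRegInterval.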

Vaidas

http://reviews.llvm.org/D6013





