[LLVMdev] MCJIT generating loads of just-stored constants
martin krastev
blu.dark at gmail.com
Thu Feb 26 00:54:32 PST 2015
Hello,
I end up with the following IR, which exhibits an apparent missed
optimisation opportunity, namely the reloading of constants that were just stored:
...
%5 = getelementptr inbounds %class.A* %self, i32 0, i32 9, i32 0
store i32 1, i32* %5, align 4
%6 = getelementptr inbounds %class.A* %self, i32 0, i32 9, i32 1
store i32 1, i32* %6, align 4
%7 = getelementptr inbounds %class.A* %self, i32 0, i32 9, i32 2
store i32 0, i32* %7, align 4
%8 = getelementptr inbounds %class.A* %self, i32 0, i32 9, i32 6
store i32 2, i32* %8, align 4
%9 = getelementptr inbounds %class.A* %self, i32 0, i32 9, i32 8
store i32 2, i32* %9, align 4
%10 = getelementptr inbounds %class.A* %self, i32 0, i32 9, i32 10
store i32 16, i32* %10, align 4
%11 = getelementptr inbounds %class.A* %self, i32 0, i32 9, i32 11
store i32 16, i32* %11, align 4
%12 = getelementptr inbounds %class.A* %self, i32 0, i32 9, i32 12
store i32 0, i32* %12, align 4
%13 = getelementptr inbounds %class.A* %self, i32 0, i32 9, i32 13
store i32 0, i32* %13, align 4
%14 = getelementptr inbounds %class.A* %self, i32 0, i32 9, i32 15
store i32 8, i32* %14, align 4
%15 = getelementptr inbounds %class.A* %self, i32 0, i32 9, i32 17
store i32 0, i32* %15, align 4
%16 = getelementptr inbounds %class.A* %self, i64 0, i32 9, i32 0
%17 = load i32* %16, align 4
%18 = getelementptr inbounds %class.A* %self, i64 0, i32 9, i32 3
%19 = load float* %18, align 4
%20 = getelementptr inbounds %class.A* %self, i64 0, i32 9, i32 4
%21 = load float* %20, align 4
%22 = getelementptr inbounds %class.A* %self, i64 0, i32 9, i32 5
%23 = load float* %22, align 4
%24 = getelementptr inbounds %class.A* %self, i64 0, i32 9, i32 6
%25 = load i32* %24, align 4
%26 = getelementptr inbounds %class.A* %self, i64 0, i32 9, i32 7
%27 = load float* %26, align 4
%28 = getelementptr inbounds %class.A* %self, i64 0, i32 9, i32 8
%29 = load i32* %28, align 4
%30 = getelementptr inbounds %class.A* %self, i64 0, i32 9, i32 9
%31 = load float* %30, align 4
%32 = getelementptr inbounds %class.A* %self, i64 0, i32 9, i32 10
%33 = load i32* %32, align 4
%34 = getelementptr inbounds %class.A* %self, i64 0, i32 9, i32 11
%35 = load i32* %34, align 4
%36 = getelementptr inbounds %class.A* %self, i64 0, i32 9, i32 13
%37 = load i32* %36, align 4
%38 = getelementptr inbounds %class.A* %self, i64 0, i32 9, i32 14
%39 = load float* %38, align 4
%40 = getelementptr inbounds %class.A* %self, i64 0, i32 9, i32 15
%41 = load i32* %40, align 4
%42 = getelementptr inbounds %class.A* %self, i64 0, i32 9, i32 16
%43 = load float* %42, align 4
...
The above happens after a callee gets inlined: all the stores come from
the caller, and the loads come from the inlined callee. Please note the
partial overlap between the stored and the loaded fields.
The general steps leading to the above are:
1. Load a module containing a function A::foo(), which starts by
reading fields from an object of class A.
2. Add to the module a wrapper function bar() that takes an object of
class A as an argument, stores literals into most of the object's
fields, then calls A::foo() with the same object.
3. Add the AlwaysInline attribute to the original A::foo().
4. Pass the module to MCJIT from clang 3.4.2, set up as follows:
...
llvm::PassRegistry &registry =
    *llvm::PassRegistry::getPassRegistry();
llvm::initializeCore(registry);
llvm::initializeScalarOpts(registry);
llvm::initializeObjCARCOpts(registry);
llvm::initializeVectorization(registry);
llvm::initializeIPO(registry);
llvm::initializeAnalysis(registry);
llvm::initializeIPA(registry);
llvm::initializeTransformUtils(registry);
llvm::initializeInstCombine(registry);
llvm::initializeTarget(registry);
llvm::initializeCodeGen(registry);
llvm::initializeLoopStrengthReducePass(registry);
llvm::initializeLowerIntrinsicsPass(registry);
llvm::initializeUnreachableBlockElimPass(registry);
llvm::TargetOptions opt;
opt.PositionIndependentExecutable = false;
const std::string& triple = llvm::sys::getProcessTriple();
const std::string& hostcpu = llvm::sys::getHostCPUName();
const std::string& features = "";
std::string error;
// lookupTarget() returns null and fills `error` if the triple is unsupported
const llvm::Target *const target =
    llvm::TargetRegistry::lookupTarget(triple, error);
llvm::TargetMachine *const tm = target->createTargetMachine(
    triple, hostcpu, features, opt,
    llvm::Reloc::Default,
    llvm::CodeModel::JITDefault,
    llvm::CodeGenOpt::Aggressive);
// Set up IR pass management
llvm::FunctionPassManager fpm(module);
llvm::PassManager pm;
tm->addAnalysisPasses(pm);
tm->addAnalysisPasses(fpm);
// Use a pass manager builder for C-style optimisations
llvm::PassManagerBuilder passBuilder;
passBuilder.OptLevel = 3;
passBuilder.SizeLevel = 0;
passBuilder.Inliner =
    llvm::createAlwaysInlinerPass(false); // suppress llvm.lifetime.* intrinsics
passBuilder.BBVectorize = true;
passBuilder.SLPVectorize = true;
passBuilder.LoopVectorize = true;
passBuilder.LateVectorize = true;
passBuilder.populateFunctionPassManager(fpm);
passBuilder.populateModulePassManager(pm);
fpm.doInitialization();
for (llvm::Module::iterator it = module->begin(),
         endit = module->end(); it != endit; ++it) {
    fpm.run(*it);
}
fpm.doFinalization();
pm.run(*module);
execEngine = llvm::EngineBuilder(module)
                 .setEngineKind(llvm::EngineKind::JIT)
                 .setUseMCJIT(true)
                 .create(tm);
execEngine->finalizeObject();
...
I guess there's something obvious I'm missing from the MCJIT setup
that leads to these results. Any hints are greatly appreciated.
Regards,
Martin